1 Introduction

Coronary artery disease (CAD) is one of the most critical diseases; it can cause severe heart attacks. Patients and cardiologists should therefore focus on preventing sudden, life-threatening events. A huge quantity of clinical data must be managed so that cardiovascular risk can be estimated more reliably, patients can be stratified, and early intervention can be fostered [1, 2]. According to [3], CAD occurs when atherosclerotic plaque (hardening of the arteries) builds up in the walls of the arteries that supply the heart; this plaque consists primarily of cholesterol. Plaque accumulation can be accelerated by smoking, high blood pressure, cholesterol levels elevated above the normal level, and diabetes. Patients are also at higher risk of plaque development if they are older (over 45 years for men and 55 years for women) or have a positive family history of early heart artery disease. When a blood clot forms on top of this plaque, the artery becomes blocked, causing a heart attack [3]. According to the American Heart Association, the total direct and indirect cost of cardiovascular disease (CVD) and stroke in the USA in 2009 was estimated at $312.6 billion, compared with $228 billion in 2008 for all cancers and benign neoplasms [4]. CVD costs more than any other diagnostic group. It is important to detect cardiovascular symptoms precisely, because by the time heart problems are detected, their underlying cause (atherosclerosis) is usually quite advanced, having progressed for decades.

One possible way to minimize heart problems is to make people aware of their respective CAD risk in advance and guide them to take preventive action accordingly [5]. As in many complex medical problems, early and accurate detection can lead to better decision making.

The gender of the CAD patient is an important factor to consider when building a CAD diagnosis model. Building separate CAD diagnosis models for male and female patients speeds up detection, decision making, and the design of the treatment program. Gender-based analysis affects the development of the diagnosis model in all of its stages [6].

This paper studies the effect of patient gender on building the CAD diagnosis model, which speeds up discovery and diagnosis and supports surgeons in making the right decision at the right time.

The proposed approach aims at developing a CAD diagnosis model with the following characteristics:

  • A small, simple model with a minimum number of features and high performance (around 97%), providing faster warning based on fewer measurements, so there is no need to wait for all measurements to be taken.

  • Homogeneity of the dataset in each model, achieved by building two separate models, one for male patients and an independent one for female patients. The training model for each gender is simple due to this homogeneity and small due to the reduced feature set, so testing is faster; the time for learning and testing each model was less than 0.015 s.

The rest of this paper is organized as follows: Sect. 2 reviews the literature. Section 3 introduces the proposed approach. Section 4 describes the experimental design. Section 5 is reserved for the analysis and discussion of the obtained results. Finally, Sect. 6 concludes the paper and proposes future extensions.

2 Literature review

The importance of developing CAD diagnosis and decision support systems has been demonstrated in many research works. CAD diagnosis and decision support systems, designed to assist physicians and other healthcare professionals with decision-making tasks, are called clinical decision support systems (CDSS). These systems link two main components: knowledge and experts. The knowledge component may come from experts (via expert systems) and/or be extracted from historical data sources via data mining. Data mining offers tools that consider decision making for a particular individual patient rather than a population of patients, with an emphasis on discovering potentially useful, valid, novel, and easily comprehended knowledge from data [7]. According to [3], CAD diagnosis methods include recording the electrical activity of the heart and nuclear imaging of the blood flow to different regions of the heart using an external camera; ultrasound imaging of the heart muscle with exercise stress testing (stress echocardiography) is also a very accurate technique for detecting CAD. The American Heart Association has identified several risk factors; some of them can be modified, treated, or controlled, and some cannot. The more risk factors a patient has, the greater the chance of developing coronary heart disease [8].

Several studies have applied different techniques to collect datasets from patients and used various data mining algorithms to develop diagnosis models. The work presented in [9] used a feature creation method to enrich the dataset and applied information gain and confidence measures to determine the effectiveness of features on CAD. It reported that chest pain, region RWMA2, and age were the most effective features, with high accuracy. A particle swarm optimization (PSO)-based fuzzy expert system for the diagnosis of coronary artery disease was proposed in [10]. It applied a decision tree (DT) classifier, converted its output into crisp if–then rules, and transformed these into a fuzzy rule base; PSO was employed to tune the fuzzy membership functions (MFs). A framework for intelligent medical diagnosis using rough sets with formal concept analysis and fuzzy sets was developed in [11]. A fuzzy expert system for coronary heart disease diagnosis in Jordan was developed in [12]. In 2013, a new dataset for heart disease, named the Z-Alizadehsani dataset, was published [13]. The collected databank comprises the information of 303 patients, has 55 independent parameters, and classifies a person into a normal or CAD class. The authors utilized several machine learning methods such as SMO, ANN, NB, and bagging algorithms to study the Z-Alizadehsani dataset; moreover, they used feature creation and feature selection methods and concluded that performance can be improved by utilizing these techniques.

A computer-aided decision-making system for heart disease detection using a hybrid neural network–genetic algorithm was implemented in [14]. Decision tree algorithms and data mining approaches were used in [15] to show that HS-CRP is strongly associated with coronary heart disease (CHD). The use of computational intelligence methods to detect CAD was introduced in [16]. A benchmarking of feature selection techniques for CAD diagnosis was proposed in [17]. A cardiovascular risk prediction method based on a text analysis and data mining ensemble system was developed in [18]. A CAD diagnosis model using supervised fuzzy c-means with a differential search algorithm was presented in [19]. Fuzzy rule generation for the diagnosis of coronary heart disease risk using a subtractive clustering method was presented in [20]. The use of fuzzy c-means clustering for predicting heart disease symptoms was presented in [21]. The use of fuzzy classification to obtain an enhanced risk prediction system for cardiovascular disease in India was introduced in [22]. Data mining with decision trees for assessing the risk factors of coronary heart events was developed in [23].

Alizadehsani et al. [24] divided the dataset into training (90%) and test (10%) sets. They used information gain and SVM methods for feature analysis and feature selection, and employed the SVM methodology in combination with RBF, sigmoid, linear, and polynomial kernels to build the classifier.

A model of association rule discovery with fuzzy decreasing support for syndrome differentiation and medication in coronary heart disease was presented in [24]. The use of cost-sensitive algorithms for the diagnosis of coronary artery disease was implemented in [25], together with a comparative study of medical data classification methods based on decision tree and bagging algorithms. According to medical experts, early detection may prevent death due to CAD if the proper medication is given thereafter, as presented in [26]. Considering the gender of the patient when analyzing and diagnosing patient data, and when studying the behavior, progress, and causes of CAD, affects the diagnosis and treatment program, as stated in [6]. Gender differences in patients include biological, environmental, behavioral, and psychological risk factors. A methodology for the automatic detection of normal and coronary artery disease conditions using heart rate signals is presented in [27], based on principal component analysis (PCA) and different analysis algorithms. The work presented in [28] selected the classifier with the highest performance among different types of classifiers such as logistic regression (LR), classification and regression tree (CART), Multilayer Perceptron (MLP), radial basis function (RBF), and self-organizing feature maps (SOFM); eight predefined attributes were used (age, sex, family history of CAD, smoking status, diabetes mellitus, systemic hypertension, hypercholesterolemia, and body mass index (BMI)), and the major drawback of this work is its low performance. The work presented in [29] applied traditional machine learning algorithms with three types of SVM, enhanced performance through data preprocessing by attribute normalization, utilized the genetic algorithm and particle swarm optimization for classifier parameter selection, and also introduced a new genetic training scheme, which provided an accuracy of 93.08%.

Ghiasi et al. [30] introduced a tree-based classifier for building a CAD diagnosis approach employing the Z-Alizadehsani dataset. Another work utilized ANN and GA techniques: Arabasadi et al. [15] classified the Z-Alizadehsani dataset in 2017. The obtained sensitivity, specificity, and accuracy of the GA-ANN were 97%, 92%, and 93.85%, respectively, whereas the ANN model classified the dataset with 84.62% accuracy, 86% sensitivity, and 83% specificity; the classification capability of the hybrid GA-ANN model was considerably higher than that of the ANN model. The accuracy of CAD classification systems built using Naïve Bayes, SVM, SMO, C4.5, and KNN techniques was also evaluated in a work introduced by Alizadehsani et al. [31]. Alizadehsani et al. [32] proposed a data mining model for CAD classification to detect disease of the left circumflex, left anterior descending, and right coronary arteries, which improved the classification accuracy. Acharya et al. [33] introduced a comparative study of the performance of three techniques, namely discrete wavelet transform (DWT), empirical mode decomposition (EMD), and discrete cosine transform (DCT), in the detection of CAD. Alkeshoush et al. [34] studied the importance of the particle swarm optimization (PSO) algorithm in the diagnosis of heart disease. Steele et al. [35] found that machine learning techniques applied to electronic health records outperform conventional survival methods for predicting patient mortality in CAD. Johnson et al. [36] employed machine learning techniques for scoring CAD characteristics on coronary CT angiograms. In 2019, Alizadehsani et al. [37] conducted a review of machine learning techniques for CAD prediction.

3 Proposed approach

The work in this paper studies the effect of separating the dataset of CAD patients into two independent datasets for male and female patients and builds two separate diagnosis models for CAD patients based on their gender. The workflow of the proposed approach comprises four phases, as shown in Fig. 1.

Fig. 1
figure 1

Overall structure of the proposed approach for building separate diagnosis models for male and female

The first phase comprises data segmentation into male and female datasets (segments). The second phase includes preprocessing for missing values, feature discretization, binarization, and feature selection to extract the most important features in each dataset (using the feature rank voting algorithm, FRV [38]). The third phase builds 38 different classifiers from seven classifier categories for the female and male datasets separately, to select the most suitable classifier model for each gender. The last phase applies the classification via clustering technique to increase accuracy, sensitivity, and specificity. The overall structure of the proposed approach for building gender-based diagnosis models for males and females is illustrated in Fig. 1.

More than 50 different data mining methods were applied in implementing the proposed diagnosis system. The following steps represent the different phases and the structure of the two proposed diagnosis models:

  1. The preprocessing methods, which solve the problem of missing values, and the feature evaluation algorithms (CfsSubsetEval, GainRatioAttributeEval, CorrelationAttributeEval, InfoGainAttributeEval, OneRAttributeEval, PrincipalComponents, SymmetricalUncertAttributeEval),

  2. The feature selection algorithm FRV (Feature Rank Voting algorithm) and the feature binarization and discretization processes (NominalToBinary conversion, attribute discretization, and ClusterMembership methods),

  3. The classification methods, which are grouped into seven categories:

     • Bayes-based classifiers,

     • Functions-based classifiers,

     • Misc classifiers,

     • Lazy classifiers,

     • Meta-based classifiers (optimization techniques),

     • Rule-based classifiers,

     • Tree-based classifiers.

  4. The classification via clustering to increase the accuracy of the diagnosis model.

Some of the above stages, such as the feature evaluation methods and the classifiers, used the Weka software [39]; the remaining stages were implemented as custom software programs.

4 Experimental design

4.1 Dataset description and preprocessing

The proposed approach in this paper utilized the datasets described in [40] and on the UCI machine learning repository, namely the Heart Disease dataset and the Z-Alizadehsani dataset. The Heart Disease dataset comprises 270 patients' records, each with 75 attributes.

This dataset is divided into two segments based on the gender of the patient: a male data segment (183 records with 74 attributes instead of 75, after excluding the gender feature) and a female data segment (87 instances with 74 attributes instead of 75, after excluding the gender feature). The Z-Alizadehsani dataset comprises 303 CAD patients' records, each with 56 attributes, categorized into four categories: a demographic feature set (seven attributes), a symptoms and examination feature set (16 attributes), an ECG feature set (14 attributes), and a laboratory feature set (18 attributes), as shown in Tables 1, 2, 3, and 4, respectively. The total number of attributes is 56, including the class attribute.

Table 1 Demographic features
Table 2 Symptoms and examination features
Table 3 ECG features
Table 4 Laboratory and echo features

The first step in the preprocessing phase is the segmentation of the dataset into two segments based on the gender of the patient: a male data segment (176 records with 55 attributes) and a female data segment (127 instances with 55 attributes). The second step in the preprocessing phase is the elimination of the effect of missing values.
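To make this step concrete, a minimal preprocessing sketch is shown below, assuming the Z-Alizadehsani data are available as a CSV file with a Sex column; the file name, column labels, and the median/mode imputation are illustrative assumptions, since the paper does not specify the missing-value treatment.

```python
# Hypothetical preprocessing sketch: split by gender, drop the Sex attribute,
# and impute missing values (the paper does not specify the imputation method).
import pandas as pd

df = pd.read_csv("z_alizadeh_sani.csv")              # assumed file name

segments = {}
for gender, part in df.groupby("Sex"):               # "Male" / "Female" segments
    part = part.drop(columns=["Sex"])                # 56 -> 55 attributes
    for col in part.columns:                         # simple per-column imputation
        if part[col].isna().any():
            fill = (part[col].median()
                    if pd.api.types.is_numeric_dtype(part[col])
                    else part[col].mode()[0])
            part[col] = part[col].fillna(fill)
    segments[gender] = part

print({g: s.shape for g, s in segments.items()})     # e.g. (176, 55) and (127, 55)
```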

4.2 Feature selection

The feature selection process affects the structure and the performance of the data mining models of medical diagnosis systems. Selecting the most important set of attributes and removing noisy or redundant attributes enhances system performance, simplifies the data mining model structure, and reduces errors. We applied nine different feature selection algorithms to select the best set of features, as shown in Table 7. The outputs of these nine algorithms are the inputs of the FRV algorithm. The feature selection/ranking algorithms were applied in the following two steps.
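The FRV algorithm itself is defined in [38]; the sketch below only illustrates the general idea of rank voting, combining the orderings of several feature-scoring functions by summing per-feature ranks. The scikit-learn scorers and the Borda-style vote are assumptions standing in for the nine Weka evaluators.

```python
# Illustrative rank voting over several feature-scoring functions (not the exact FRV of [38]).
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import mutual_info_classif, f_classif, chi2

def rank_vote(X, y, k=14):
    """Return indices of the k features with the best (lowest) summed rank."""
    scores = [
        mutual_info_classif(X, y, random_state=0),  # information-gain-like score
        f_classif(X, y)[0],                         # ANOVA F-score
        chi2(X, y)[0],                              # chi-squared score (X must be non-negative)
    ]
    # rank 1 = best, so rank the negated scores and sum the ranks per feature
    summed_ranks = sum(rankdata(-s) for s in scores)
    return np.argsort(summed_ranks)[:k]
```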

4.2.1 Using five feature ranking algorithms

The first set of experiments was carried out to test the performance of the FRV algorithm for each data segment (male and female) using five feature ranking algorithms. As shown in Table 5A–C, each feature ranking algorithm produces different ranks for each feature in a data segment, which means that the behavior of each ranking algorithm is affected by the gender of the patient.

Table 5 Attribute ranking using five ranking algorithms

The different feature ranking levels obtained from the InfoGain algorithm for the total dataset, the male dataset, and the female dataset are presented in Table 6 and represented graphically by the Venn diagram shown in Fig. 2, which illustrates the common and differing feature ranking levels as follows:

Table 6 Ranking order for each segment
Fig. 2
figure 2

Behavior of information gain ranking algorithm for female and male segments

  • The features common to the male and female segments with the same ranking levels are f24 and f28, at rank orders 1 and 6, respectively; these differ completely from the ranking of the total dataset (second column of Table 6) when compared with the male dataset (third column) and the female dataset (fourth column).

  • The common features with different ranking levels are f1, f53, f27, f6, f52, f17.

  • The selected attributes for males only are f34, f44, f5, f33, f46.

  • The selected features for females only are f40, f38, f7, f36, f31.

The rest of the results shown in Tables 7 and 8 can be interpreted in the same way. According to these results, we conclude that separating the female and male diagnosis models is crucial and necessary.
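The Venn-style comparison of Fig. 2 reduces to simple set operations once each segment's top-ranked feature list is available; the sketch below reconstructs the male and female sets from the bullets above and is purely illustrative.

```python
# Hypothetical comparison of the top-ranked features per segment (reconstructed from the bullets above).
top_male   = {"f24", "f28", "f1", "f53", "f27", "f6", "f52", "f17",
              "f34", "f44", "f5", "f33", "f46"}
top_female = {"f24", "f28", "f1", "f53", "f27", "f6", "f52", "f17",
              "f40", "f38", "f7", "f36", "f31"}

common      = top_male & top_female    # features selected for both genders
male_only   = top_male - top_female    # f34, f44, f5, f33, f46
female_only = top_female - top_male    # f40, f38, f7, f36, f31
print(common, male_only, female_only, sep="\n")
```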

Table 7 Performance comparison for the nine different feature selection algorithms using different datasets
Table 8 Performance comparison using FRV and CfsSubsetEval feature selection algorithms
  a. The partitioning of the dataset into female and male data segments divides the problem into two smaller subproblems and removes the attribute named sex, thus reducing the number of attributes from 56 to 55.

  b. An important question is: "What is the optimal number of attributes to be selected in the CAD diagnosis system?" Several research works have agreed that 14 attributes are sufficient for CAD, so fourteen or fewer attributes are considered in this work.

  c. The experiments in this section are conducted using:

     • The total dataset (303 instances and 56 attributes),

     • The female data segment (127 instances and 55 attributes), and

     • The male data segment (176 instances and 55 attributes).

4.2.2 Using nine feature ranking algorithms

The second set of experiments in the feature selection process used nine different feature ranking algorithms, applied to the total dataset before segmentation into male and female partitions, to select the best set of features. The goal of this step is to illustrate the importance of building separate gender-based diagnosis models.

The obtained results shown in Table 7 can be interpreted as follows. Each column represents the results obtained by applying one of the feature selection algorithms to the total dataset (before partitioning). For example, the first column represents the 14 features with the highest ranks obtained using the CfsSubsetEval algorithm (age, DM, HTN, BP, typical chest pain, atypical, nonanginal, Q wave, T inversion, ESR, EF-TTE, region RWMA, and Cath). The accuracy of the classifier using this set was 84.5%. The highest performance was 85.8%, obtained using the SymmetricalUncertAttributeEval algorithm, which is still a relatively low performance.

Table 8 shows the results of applying the FRV algorithm to the total dataset, the male dataset, and the female dataset. The features with the highest ranks were fed to the backpropagation neural network classifier, and the different performance measures were recorded. The obtained results in Table 8 can be interpreted as follows: each column represents the results obtained by one of the feature selection algorithms on a specific dataset, and the following examples explain the obtained results:

  a. The first column represents the results obtained from the CfsSubsetEval algorithm using the total dataset; the accuracy was 84.5%, and the best features were age, DM, HTN, BP, typical chest pain, atypical, nonanginal, Q wave, T inversion, ESR, EF-TTE, region RWMA, and Cath.

  b. The best performance for the female segment was 90.55%, obtained with 13 attributes using the FRV algorithm.

The best performance for the male segment was 88.3%, obtained with 14 attributes using the FRV algorithm. The results obtained for the female and male segments confirm that the segmentation is necessary. We therefore proceed with building the CAD model based on the concept of gender difference, and the FRV algorithm provides the best results for both the male and female data segments.
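As a concrete illustration of this evaluation step, a selected feature subset can be scored with a backpropagation network under tenfold cross-validation as sketched below; MLPClassifier stands in for the backpropagation neural network used here, and the data frame, encoding, and feature names are assumptions.

```python
# Hypothetical evaluation of an FRV-selected subset with a backpropagation network.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative predictor names for the female FRV subset (Cath is the class attribute).
frv_female = ["Age", "BMI", "HTN", "Current Smoker", "BP", "Typical Chest Pain",
              "Atypical", "FBS", "TG", "EF-TTE", "Region RWMA", "VHD"]

def evaluate_subset(segment_df, features, target="Cath"):
    X = segment_df[features].apply(lambda c: c.astype("category").cat.codes
                                   if c.dtype == object else c)   # encode categoricals
    y = segment_df[target]
    model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0))
    return cross_val_score(model, X, y, cv=10).mean()             # tenfold CV accuracy

# e.g. print(evaluate_subset(female_segment, frv_female))
```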

4.3 Classifier selection step and results interpretation

This section presents the selection of the best classifier for each dataset. Thirty-eight classifiers were utilized to select the classifier(s) with the highest performance for the female and male models. These classifiers are categorized into seven categories as follows: Bayes (four classifiers), function (four classifiers), lazy (three classifiers), meta (15 classifiers), misc (one classifier), rules (five classifiers), and trees (seven classifiers). The best model for each gender is selected through a series of experiments labeled A, B, C, D, and E.

  A. Male model (176 instances) with different sets of features:

     a. Using all 55 attributes without applying feature selection algorithms.

     b. Using the 14 attributes selected by the FRV algorithm, as obtained in Table 8 (age, HTN, FH, obesity, typical chest pain, atypical, nonanginal, ST elevation, T inversion, HDL, EF-TTE, region RWMA, VHD, and Cath).

     c. Using the attributes selected by the CfsSubsetEval algorithm, as obtained in Table 8 (age, DM, HTN, typical chest pain, function class, nonanginal, T inversion, ESR, EF-TTE, region RWMA, and Cath).

The results of these experiments are shown in Table 9 and can be interpreted as follows. Using the 38 different classifiers with all 55 attributes, the highest performance was 85.8%, achieved by the Multilayer Perceptron classifier. Using the 14 attributes selected by FRV, the highest performance was 88.6%, again achieved by the Multilayer Perceptron. Using the attributes selected by CfsSubsetEval, the highest performance was 85.8%, achieved by the Meta Classifier Attribute Select.

Table 9 Performance of 38 different classifiers using male, female, and total datasets
  B. Female model (127 instances) with different sets of features:

     d. Using all 55 attributes.

     e. Using the attributes selected by the CfsSubsetEval algorithm, as obtained in Table 8 (age, DM, HTN, current smoker, BP, diastolic murmur, typical chest pain, atypical, FBS, TG, EF-TTE, region RWMA, and Cath).

     f. Using the 13 attributes selected by the FRV algorithm, as obtained in Table 8 (age, BMI, HTN, current smoker, BP, typical chest pain, atypical, FBS, TG, EF-TTE, region RWMA, VHD, and Cath).

The results in Table 9 show that, using the 38 different classifiers with all 55 attributes, the highest performance was 89.8%, achieved by the BayesNet classifier. Using the attributes selected by FRV, the highest performance was 91%, also achieved by BayesNet. Using the attributes selected by CfsSubsetEval, the highest performance was 89%, achieved by the Meta Classifier Attribute Select.

  C. Model for male and female patients using the whole dataset without partitioning

This experiment used the whole dataset, including both female and male instances before segmentation (303 instances with 56 attributes). The results are presented in Table 9: among the 38 different classifiers applied to the total data (303 instances) with all 56 attributes, the highest performance was 88.1%, achieved by the SMO classifier.

  D. Average performance of the female and male models using CfsSubsetEval

This step aims at finding the classifier that gives the best accuracy for both the male and female segments using the 14 attributes selected by CfsSubsetEval. Table 9 shows that the highest average performance was 86.65%, achieved by the Meta Random Committee classifier.

  E. Average performance of the female and male models using FRV

This step aims at finding the classifier that gives the best accuracy for both the male and female segments using the best attributes selected by FRV. According to Table 9, the highest average performance was 89.5%, achieved by the Multilayer Perceptron classifier.

From these sets of experiments (A through E) shown in Table 9, the highest performance was obtained using the features selected by FRV for the whole dataset and for the male and female segments. The results obtained on the female and male segments were better than those obtained on the total dataset. Accordingly, we selected the following classifiers with the highest performance for the next experiments: Naïve Bayes, Naïve Bayes Updatable, Multilayer Perceptron, SMO, Bagging Multilayer, Bagging SMO, LMT Tree, Random Forest Tree, and REP Tree.
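The classifier-selection experiments can be approximated by looping a set of candidate models over each data segment under tenfold cross-validation, as in the sketch below; the candidates are scikit-learn stand-ins for a few of the 38 Weka classifiers and are assumptions, not the exact experimental setup.

```python
# Hypothetical per-segment classifier comparison (scikit-learn stand-ins for Weka models).
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

candidates = {
    "NaiveBayes": GaussianNB(),
    "MultilayerPerceptron": MLPClassifier(max_iter=2000, random_state=0),
    "SMO-like SVM": SVC(kernel="poly", degree=1),            # rough analogue of Weka's SMO
    "Bagging(MLP)": BaggingClassifier(MLPClassifier(max_iter=2000, random_state=0),
                                      n_estimators=10, random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}

def best_classifier(X, y):
    """Return (name, mean tenfold-CV accuracy) of the best candidate for one segment."""
    results = {name: cross_val_score(clf, X, y, cv=10).mean()
               for name, clf in candidates.items()}
    return max(results.items(), key=lambda kv: kv[1])

# e.g. best_classifier(X_male, y_male); best_classifier(X_female, y_female)
```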

4.4 Classification via clustering

This set of experiments aims at improving the performance of the classifiers by clustering each segmented dataset into a set of probabilistic clusters using the EM probabilistic clustering algorithm implemented in open-source software.

The first stage is the preprocessing stage, which includes the following steps:

  a. Converting nominal attributes to binary attributes.

  b. Attribute discretization.

  c. Cluster membership: finding the membership value of each attribute in a set of predefined clusters by applying EM clustering on the datasets, producing two main clusters with different subclusters within each.

Converting the nominal attributes to binary ones, except the class attribute, is carried out using the unsupervised filter named NominalToBinary. For example, the VHD attribute has four nominal values (mild, moderate, N, and severe); these four values are converted into four binary attributes, as shown in Tables 10 and 11. Table 10 shows the VHD attribute in the male segment: before binarization there is one attribute, but after binarization there are four attributes, each with minimum, maximum, median, and standard deviation values, as shown in Table 11. Binarization increased the number of attributes in the male segment from 14 to 17 and in the female segment from 13 to 16.
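The NominalToBinary filter corresponds to standard one-hot encoding; a small pandas sketch is shown below, with the VHD values taken from the example in the text and the data frame contents invented for illustration.

```python
# Hypothetical binarization of the nominal VHD attribute (mild, Moderate, N, Severe)
# into four 0/1 attributes, leaving the class attribute Cath untouched.
import pandas as pd

male_segment = pd.DataFrame({"VHD": ["mild", "N", "Severe", "Moderate", "N"],
                             "Cath": ["Cad", "Normal", "Cad", "Cad", "Normal"]})

binarized = pd.get_dummies(male_segment, columns=["VHD"], dtype=int)
print(binarized.columns.tolist())
# ['Cath', 'VHD_Moderate', 'VHD_N', 'VHD_Severe', 'VHD_mild']
```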

Table 10 VHD attribute before binarization
Table 11 VHD attribute after binarization

Because the attributes have scattered values, attribute discretization defines a set of intervals and groups the values according to those intervals; four intervals are specified for each attribute. For example, the binary version of the VHD attribute became four attributes, VHD = mild, VHD = Moderate, VHD = N, and VHD = Severe, and their discretized versions are shown in Tables 12, 13, 14, and 15. Another example shows how numeric attributes are converted to discretized attributes; for instance, the age attribute has minimum = 30, maximum = 86, mean = 58.494, and standard deviation = 10.771.

Table 12 VHD = mild attribute after discretization
Table 13 VHD = mild attribute after discretization
Table 14 VHD = Severe attribute after discretization
Table 15 VHD = Severe attribute after discretization

The discretized age attribute is shown in Table 16: 14 instances fall in the first interval, from −inf. to 44; the second interval, from 44 to 58, contains 80 instances; the third interval, from 58 to 72, contains 59 instances; and the fourth interval, from 72 to inf., contains 23 instances. Here, −inf. and inf. denote the minimum and maximum values, respectively. The discretized attributes are then submitted to the EM clustering algorithm (the expectation–maximization algorithm for clustering multidimensional numerical data); more details can be found in [41]. The following example illustrates how it works. The inputs of this step are the discretized attributes, and the outputs are the clusters, subclusters, and the membership value of each attribute in these clusters. In the female segment, we applied ClusterMembership once, producing two clusters, pCluster0 and pCluster1, each representing one class value. Applying ClusterMembership again partitions each cluster into subclusters: pCluster0 is partitioned into three subclusters (pCluster_0_0, pCluster_0_1, and pCluster_0_2), and pCluster1 is partitioned into four subclusters (pCluster_1_0, pCluster_1_1, pCluster_1_2, and pCluster_1_3).
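The quoted cut points (44, 58, 72) appear to correspond to four equal-width bins over the 30–86 age range; a pandas sketch of this discretization is given below, with the Series content invented for illustration.

```python
# Hypothetical equal-width discretization of the numeric age attribute into the
# four intervals reported in Table 16: (-inf, 44], (44, 58], (58, 72], (72, inf).
import numpy as np
import pandas as pd

age = pd.Series([30, 43, 51, 58, 60, 71, 75, 86])   # illustrative values
bins = [-np.inf, 44, 58, 72, np.inf]
age_binned = pd.cut(age, bins=bins,
                    labels=["(-inf,44]", "(44,58]", "(58,72]", "(72,inf)"])
print(age_binned.value_counts())                     # instances per interval
```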

Table 16 Age attribute after discretization

The final result of these three steps (NominalToBinary, Discretization, and ClusterMembership) for the female data segment is as follows: Instances: 127; Attributes: 8 (pCluster_0_0, pCluster_0_1, pCluster_0_2, pCluster_1_0, pCluster_1_1, pCluster_1_2, pCluster_1_3, and Cath). These attributes are the input to the Multilayer Perceptron (backpropagation) classifier. The male data segment is processed in the same way.
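A rough scikit-learn analogue of this EM/ClusterMembership step is to fit Gaussian mixtures by expectation–maximization and use the posterior membership probabilities as the new input attributes for the Multilayer Perceptron; the sketch below, including GaussianMixture and the two-clusters-then-subclusters structure, is an assumption approximating the Weka filters.

```python
# Hypothetical classification via clustering: EM cluster memberships become the features.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def cluster_membership_features(X, n_top=2, n_sub=(3, 4)):
    """Replace X by posterior membership probabilities of EM clusters and subclusters."""
    top = GaussianMixture(n_components=n_top, random_state=0).fit(X)
    labels = top.predict(X)
    features = []
    for k, n_k in enumerate(n_sub):        # e.g. 3 subclusters of cluster 0, 4 of cluster 1
        sub = GaussianMixture(n_components=n_k, random_state=0).fit(X[labels == k])
        features.append(sub.predict_proba(X))   # membership of every instance in each subcluster
    return np.hstack(features)                  # 7 membership attributes, as in the female segment

# e.g. M = cluster_membership_features(X_female)   # shape (127, 7)
#      cross_val_score(MLPClassifier(max_iter=2000, random_state=0), M, y_female, cv=10).mean()
```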

5 Results analysis and discussion

5.1 Performance metrics

Using the tenfold cross-validation technique, the sensitivity, specificity, and accuracy evaluation metrics [9] are used for the performance analysis of the male and female diagnosis models, where:

  • Accuracy = (TN + TP)/(TN + TP + FN + FP)

  • Sensitivity = TP/(TP + FN) (the percentage of actual positives that are correctly identified)

  • Specificity = TN/(TN + FP) (the percentage of actual negatives that are correctly identified).

Table 17 describes these terms, where TP is true positive, TN true negative, FN false negative, and FP false positive. These terms are obtained from the classification confusion matrix in Table 17, where positive = the sum of the positive column, negative = the sum of the negative column, and N = the total number of instances.

Table 17 Description of TP, FP, FN, TN

Table 18 shows the calculation of the different performance metrics. For example, for the Naïve Bayes classifier on the male data segment with the 14 attributes selected by FRV: TP = 119, FN = 11, FP = 13, and TN = 33; thus, positive = 130 (the total number of CAD instances in the male data segment), negative = 46 (the total number of normal instances in the male data segment), accuracy = 0.866, error = 0.14, sensitivity = 0.915, and specificity = 0.75.
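These definitions map directly to a small helper function, sketched below; it is illustrative rather than the authors' implementation, and the commented usage line reuses the confusion-matrix counts quoted above.

```python
# Helper computing the performance metrics defined in Sect. 5.1 from raw counts.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)     # fraction of actual positives correctly identified
    specificity = tn / (tn + fp)     # fraction of actual negatives correctly identified
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "sensitivity": sensitivity, "specificity": specificity}

# Usage with counts taken from a tenfold cross-validation confusion matrix, e.g.:
# print(classification_metrics(tp=119, tn=33, fp=13, fn=11))
```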

Table 18 Performance analysis for the best seven male and female diagnosis models

5.2 Performance analysis

The Multilayer Perceptron is the best classifier for the male diagnosis model, with accuracy = 0.95, sensitivity = 0.94, specificity = 1, and error = 0.05. For the female diagnosis model, the accuracy = 0.96, sensitivity = 0.97, specificity = 0.95, and error = 0.04. The results obtained for the female data segment were better than those obtained for the male one.

To emphasize the role of separating the male and female diagnosis models, Table 19 reports the results of developing the female and male diagnosis models using all 55 attributes. The best classifier for the male diagnosis model was the LMT Tree, with accuracy = 0.88, sensitivity = 0.90, specificity = 0.8, and error = 0.12. For the female diagnosis model, the best classifier was the Bagging Multilayer, with accuracy = 0.90, sensitivity = 0.92, specificity = 0.85, and error = 0.10. Considering the average performance of classification via clustering using all 55 attributes, the best model for the male diagnosis model was achieved by Naïve Bayes, with accuracy = 0.88, sensitivity = 0.92, specificity = 0.77, and error = 0.12.

Table 20 shows the performance of the male and female diagnosis models using the attributes selected by FRV and classification via clustering with nine different classifiers. For the male diagnosis model, the best classifier was the Multilayer Perceptron, with accuracy = 0.95, sensitivity = 0.94, specificity = 1, and error = 0.05, compared with accuracy = 0.89, sensitivity = 0.94, specificity = 0.76, and error = 0.11 for the corresponding classifier without clustering. The best classifier for the female diagnosis model was the Bagging Multilayer, with accuracy = 0.96, sensitivity = 0.97, specificity = 0.95, and error = 0.04, compared with accuracy = 0.90, sensitivity = 0.92, specificity = 0.85, and error = 0.10 without clustering.

The performance analysis of the results obtained from the proposed approach on the two different datasets (the Heart Disease dataset and the Z-Alizadehsani dataset) is shown in Table 21. The results illustrate the acceptable behavior of the proposed approach, which discriminates between the female and male datasets to reduce the size of the diagnosis model and enhance its performance. The performance comparison between the classical approach and the proposed approach using the total, female, and male datasets is shown in Fig. 3a–f.

Table 19 Performance of classifiers via clustering using 55 attributes
Table 20 Performance of classifiers and the classifiers via clustering using selected attributes
Table 21 Performance analysis for the two different datasets
Fig. 3
figure 3

Performance comparison between the classical approach and the proposed approach

6 Conclusion

The paper proposed an approach for developing diagnosis models for CAD patients based on their gender. The development of the diagnosis models passes through the following stages: data partitioning, data preprocessing (missing values, binarization, and discretization), feature ranking, and feature selection. We built 38 different classifiers to select the most suitable classifier for each data segment. The application of classification via clustering greatly improved the performance of the diagnosis models. The proposed models provide data mining models with better accuracy and simpler structures than current diagnosis models. The partitioning of the CAD model into separate male and female diagnosis models, each with its own characteristics, proved the importance of developing different diagnosis models for patients based on their gender. The FRV feature selection algorithm was found to be the best among the different feature selection algorithms. The best classifiers for the two models were based on a Multilayer Perceptron with different functions. The results of the paper were approved by cardiologists for implementation in the real world. Future investigations will include the development of an interactive system to simplify the diagnosis process for patients and cardiologists, as well as the implementation of a warning system that alerts patients according to their status. The proposed approach will be utilized in governmental hospitals to help surgeons and cardiologists make the right decisions to save patients' lives.