1 Introduction

Many hospitals and health care facilities have sprung up as a result of increased healthcare awareness and technological advancements. However, providing high-quality health care at a reasonable cost remains a challenge [1]. Chronic diseases are among the most serious public health issues in the world [2], accounting for more than half of all deaths globally; they also carry the highest mortality rate among non-infectious illnesses and impose high prevention and treatment costs [3]. Heart disorders largely afflict those aged 65 and over and have surpassed infectious diseases as the leading cause of mortality worldwide [4]. The considerable rise in these illnesses, their complications, and their high costs have a negative impact on society and place major financial and physical burdens on the global community. Employing appropriate preventive measures is therefore essential.

Numerous factors are involved in diagnosing heart disease, which complicates a physician's task. To help physicians make quick decisions and minimize diagnostic errors, classification systems enable them to examine medical data rapidly and in considerable detail [1]. These systems are built by developing a model that can classify new records from sample data. Various classification algorithms have been developed and used as classifiers to assist doctors in diagnosing heart disease patients [5].

It is therefore vital to identify the root causes of heart disease so that remedies can be planned using suitable methods. Hence, there is an urgent need to identify the factors responsible for heart disease and to develop an effective system for its diagnosis [6]. Traditional methods are ineffective at diagnosing such a disease, making it necessary to build a medical diagnostic system based on feature selection approaches to predict and analyze the disease [7].

Feature selection or extraction is an essential part of pattern recognition and machine learning (ML). Feature selection methods reduce computation cost and can also increase classification performance. Finding a suitable representation of the data is an important problem in machine learning and data mining, because not all of the original features are useful for classification or regression tasks: some are irrelevant, redundant, or simply noise within the dataset's distribution, and such features can degrade classification performance. To increase classification performance while reducing the classifier's computation cost, feature selection should be applied in classification and regression problems [8].

This article proposes an efficient and accurate system, based on machine learning techniques, to diagnose heart disease. The system was developed around classification algorithms. Feature selection algorithms select the most prominent features to increase classification accuracy and reduce the execution time of the classification system. Cross-validation, a resampling procedure, is used to evaluate the machine learning models and to tune hyper-parameters, and standard performance metrics are used to assess the classifiers on the selected features.
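As a concrete illustration, the following minimal sketch shows how such a cross-validation loop can be set up; the synthetic data and the random-forest classifier are placeholders, not the study's actual configuration.

# Minimal sketch: 10-fold cross-validation for model assessment.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data with the same shape as the heart dataset (270 x 13).
X, y = make_classification(n_samples=270, n_features=13, random_state=0)
clf = RandomForestClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")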

The heart disease data set available in the UCI repository was used in this study; it includes 13 useful features and 270 records, listed with their abbreviations in Table 3. Various machine learning algorithms combined with a genetic algorithm were used as classifiers for heart disease prediction. The performance of all algorithms is evaluated on numerous metrics: precision, accuracy, sensitivity, specificity, recall, and F1 score. The experimental results show that the proposed feature selection algorithm (ensemble learning) combined with a genetic algorithm is feasible for designing a high-level intelligent system to identify heart disease. The objectives of this paper include the following:

  • Significantly increase prediction accuracy

  • Achieve a high level of classification reliability

  • Improve accuracy and reduce error compared with single (non-ensemble) models

  • Demonstrate high-performing machine learning models for heart disease prediction

The article is organized as follows: Sect. 2 reviews related work, Sect. 3 presents the proposed method, and Sect. 4 discusses the experimental results with evaluation and performance comparison. The last section presents the paper's conclusion and future work.

2 Related work

Due to advances in various measurement techniques, medical data are likely to contain relevant as well as irrelevant and redundant features. Irrelevant features adversely affect the description of the target class, while redundant features contribute nothing but noise to it. Owing to this noise, identifying the target class becomes a non-trivial task, and extracting valuable information from such datasets requires an exhaustive search of the sample space [9].

The heart is responsible for blood circulation throughout the body and acts as the body's engine, so heart disease can be fatal. The World Health Organization considers heart disease one of the most important causes of death worldwide [10]. According to surveys, 56 million people died in 2012, and the most important cause of mortality was heart disease, which can be controlled through early detection [11].

A large number of studies are being carried out to find efficient methods of medical diagnosis for various diseases. This study uses classification to predict diagnosis efficiently with fewer factors (i.e., attributes) that contribute more to cardiac disease. Chen et al. developed a breast cancer diagnosis model using the support vector machine (SVM) and a rough set-based feature selection approach [12]. Wang et al. [13] used linear kernel SVM classifiers for heart disease detection and obtained an accuracy of 83.37%. A hybrid neural network method was proposed in [14], with a reported accuracy of 86.8%. In [15], the separability split value, k-nearest neighbor (KNN), and feature space mapping algorithms were used for heart disease detection, and KNN obtained the highest classification accuracy (85.6%).

In [16], a method was presented to diagnose heart disease using particle swarm optimization and a feed-forward back-propagation neural network. In [17], a decision tree was used for data mining in heart disease. Researchers have also attempted to use data mining methods to diagnose heart diseases [18]: different classification methods, such as neural networks and decision trees, were utilized to predict heart disease and identify its most important factors, and the authors tried to reduce the difficulties by using combined methods.

From these results, it can be seen that feature selection methods can effectively increase the performance of individual classification algorithms in diagnosing heart disease. Noisy features and dependency relationships in the heart disease dataset can influence the diagnostic process. Typically, the original datasets contain numerous records of accompanying syndromes and a large number of redundant symptoms. Consequently, it is necessary to reduce the dimensionality of the original feature set with a feature selection method that removes irrelevant and redundant features.

Over the past few years, many studies have evaluated the classification prediction accuracy of the various clustering and classification algorithms applied to the heart disease data [19] available in the UCI repository. Driven by the need for effective analytical techniques for predicting chronic heart disease, many efforts have been made to improve the quality of evidence-based decisions and recommendations in the information environment. One of the most vital functions in health systems is correct medical recommendation based on predicting the risk of short-term illness; notably, a collection of disease risk prediction models already exists in the medical literature [20].

Researchers have endeavored to find the most accurate ML method for exploring the relationships in heart disease. Given this need, this article aims to create an intelligent system for predicting and correcting heart disease diagnoses, preventing unwanted errors, lowering medical costs, and improving treatment quality [21].

Accordingly, in reference [22], the researchers presented statistical methods for understanding three medical data sets, producing prediction models by extracting appropriate rules to support the diagnosis process; methods such as decision trees, Naive Bayes (NB), SVM, and the Apriori algorithm yielded acceptable results. In reference [23], a fuzzy system built on a genetic algorithm was used to predict the risk of heart disease; the proposed fuzzy decision support system (FDSS) showed high performance in predicting heart disease. In reference [24], a decision network was used to diagnose heart disease in clinical settings.

The accuracy of the artificial neural network (ANN) approach, the classification and regression tree (CART) algorithm, the neural network, and logistic regression reached 97%, 87.6%, 95.6%, and 72%, respectively. In reference [25], an automated method for early detection of class changes in patients with heart failure, using classification algorithms on a data set of 297 patients with validation approaches, reached accuracies of 97.87% and 67% across the two-, three-, and four-class classification problems. Numerous researchers have used this dataset to investigate various classification problems with different classification algorithms.

In 1989, Detrano [26] used an LR algorithm and obtained 77.0% classification accuracy. In addition, Edmonds [27] worked on the Cleveland dataset to examine global evolutionary approaches and observed some improvements in predictive performance when employing a novel method. However, the performance of the suggested method depends on the features extracted by the algorithm.

In 2010, Gudadhe et al. [28] implemented an architecture based on a multilayer perceptron network and support vector machine algorithms; the proposed design obtained an accuracy of 80.41% in classifying the two categories (with or without disease). Doppala et al. [29] achieved an accuracy of 85.40% by combining a genetic algorithm (GA) with a radial basis function (GA-RBF). Experiments performed by other authors show that the Naive Bayes model achieved the most accurate predictions (86.112%), followed by the NN model with 86.12% correct predictions and, in third place, the DT with a score of 84% [30].

Gupta et al. [31] used the MIT-BIH (Massachusetts Institute of Technology-Beth Israel Hospital) Arrhythmia Database. The performance of their proposed method was compared with previous studies based on sensitivity (SE) and detection rate (DR). On the MIT-BIH Arrhythmia Database and a real-time (RT) database, the proposed method achieved an SE of 99.90% with a DR of 99.81%, and an SE of 77.99% with a DR of 99.87%, respectively [31].

In 2020, Verma et al. [32] proposed a hybrid feature selection technique for evaluating the performance of base learners and found that the reduced feature subset performed better than the whole data set. Osubor et al. [33] used an adaptive neuro-fuzzy inference system to predict postpartum depression; thirty-six data samples were used in model training, and the system had a training error of 7.0706e−005 in the first training epoch and an average test error of 3.0185 [33].

2.1 Feature selection

Selecting appropriate features to achieve the best result in a data classification problem has been one of the most challenging topics of recent decades. Although learning theory suggests that using more features increases prediction accuracy, practical evidence indicates this is not always true, because not all features are essential for determining the data's class label; some are irrelevant to it. Feature selection strategies may be divided into three categories: filter, wrapper, and embedded [34,35,36].

2.1.1 Filtering methods

Filter methods measure the usefulness of features for prediction or classification by an indirect criterion, such as a distance criterion indicating how well the classes are separated. They are typically used as a preprocessing step: rather than training a model, features are selected based on their scores in various statistical tests relating them to the outcome variable [37,38,39,40,41,42,43,44, 76].
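A minimal sketch of a filter method, using a generic statistical scorer (mutual information) rather than the specific filters employed later in this paper; the data are synthetic placeholders.

# Filter-style selection: score each feature with a statistical test,
# independently of any classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=270, n_features=13, random_state=0)
selector = SelectKBest(score_func=mutual_info_classif, k=9)
X_reduced = selector.fit_transform(X, y)   # keeps the 9 best-scoring features
print("kept feature indices:", selector.get_support(indices=True))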

2.1.2 Wrapper methods

Wrapper methods use a search procedure together with a learning model to evaluate feature subsets during the search phase. Thanks to the learning model, wrapper methods usually offer better classification performance than filter methods; in exchange, they have several disadvantages, such as high computational overhead and a risk of overfitting [50, 76].
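A minimal wrapper-method sketch, using recursive feature elimination with cross-validation as the search procedure; the logistic-regression learner and synthetic data are illustrative choices.

# Wrapper-style selection: the search is driven by a learning model's
# cross-validated performance.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=270, n_features=13, random_state=0)
search = RFECV(LogisticRegression(max_iter=1000), cv=5, scoring="accuracy")
search.fit(X, y)                     # model performance drives the search
print("selected feature mask:", search.support_)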

2.1.3 Embedded methods

These methods perform feature selection within the learning process itself and are usually specific to a given learner. They also take advantage of the two previous approaches by applying different evaluation criteria at different stages of the search. In effect, embedded methods combine the qualities of filter and wrapper methods: the algorithms carry out feature selection internally [40,41,42,43,44,45,46,47]. The various techniques used in this article are described in Table 1, and a minimal sketch follows it.

Table 1 Different techniques of the methods studied
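A minimal embedded-method sketch: an L1-penalized logistic regression performs selection while training, zeroing out the coefficients of weak features (the penalty strength C is illustrative).

# Embedded selection: the learner itself prunes features during training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=270, n_features=13, random_state=0)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X, y)                      # selection happens during training
kept = np.flatnonzero(model.coef_[0])
print("features with non-zero coefficients:", kept)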

2.1.4 Differences between filter and wrapper methods

The most important differences between wrapper and filter methods for feature selection are:

  • Filter methods are much faster than wrapper methods because they do not involve model training; wrapper methods, by contrast, are computationally expensive.

  • Filtering methods use statistical methods to evaluate a subset of features, while wrapper methods use cross-validation.

  • In many cases, filtering methods may not find the best feature subset, but wrapper methods can always provide the best feature subset.

  • Using a feature subset produced by a wrapper method makes the model more prone to overfitting than using a subset produced by a filter method [37,38,39,40,41,42,43,44].

2.2 Classification models

Machine learning algorithms such as random forest (RF), Gaussian Naive Bayes (GNB), decision tree (DT), support vector machines (SVM), gradient boosting (GB), K-nearest neighbors (KNN), and logistic regression (LR) were used to classify people with heart disease based on the data.

The performance of the algorithms was evaluated with precision, recall, sensitivity, specificity, F1 score, and accuracy. Seven classification algorithms (RF, GNB, DT, SVM, GB, KNN, and LR) were applied to train on and evaluate the training datasets. All programming was done in Python 3.7 in the Jupyter Notebook [45,46,47,48].
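A sketch of this training-and-evaluation loop, reporting only accuracy for brevity; the synthetic data stand in for the real dataset, and hyper-parameters are left at defaults.

# Train the seven classifiers named above and report test accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=270, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "RF": RandomForestClassifier(), "GNB": GaussianNB(),
    "DT": DecisionTreeClassifier(), "SVM": SVC(),
    "GB": GradientBoostingClassifier(), "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    print(name, model.fit(X_tr, y_tr).score(X_te, y_te))  # test accuracy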

Logistic regression (LR) is a machine learning technique for regression and classification problems that assigns observations to a discrete set of categories. Logistic regression is one of the most popular algorithms for binary classification [45,46,47,48].

The Gaussian Naive Bayes classifier (GNB) belongs to a family of simple probabilistic classifiers based on Bayes' theorem under the assumption that the random variables are independent [45,46,47,48].

A decision tree (DT) is a map of the possible outcomes of a series of related choices. It allows an individual or organization to weigh possible actions against their costs, probabilities, and benefits [45,46,47,48].

SVM is classed as a pattern recognition algorithm. It may be used wherever there is a need to identify patterns or classify objects into specific categories [45,46,47,48].

Gradient boosting (GB) classifiers are a group of machine learning algorithms that combine several weak learners into a strong predictive model [45,46,47,48].

Random forest (RF) is an ensemble learning method for classification and regression that constructs a large number of decision trees at training time and outputs the class chosen by most trees (classification) or the mean of the individual trees' predictions (regression) [45,46,47,48].

K-nearest neighbors (KNN): the KNN algorithm is arguably the simplest machine learning algorithm. Building the model consists only of storing the training data set; to make a prediction for a new data point, the algorithm finds the closest data points in the training set, its "nearest neighbors" [45,46,47,48].

Stacked generalization (SG) is a technique for combining several different classifiers, such as decision trees, artificial neural networks, and support vector machines (see Fig. 1). It consists of two stages:

  • Base learners at level zero and a stacking-model learner at level one;

  • At level zero, various models learn from the dataset, and the output of each model is used to create a new dataset for the level-one learner [45,46,47,48] (see the sketch below).
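The following sketch illustrates the two-stage mechanism with two placeholder level-0 models: out-of-fold predictions form the new dataset on which the level-1 learner is trained.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=270, n_features=13, random_state=0)
level0 = [SVC(probability=True), DecisionTreeClassifier(random_state=0)]

# Level 0: out-of-fold class probabilities from each base model become
# the columns of a new dataset (avoids leaking training labels).
meta_X = np.hstack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba") for m in level0
])

# Level 1: the stacking learner is trained on the level-0 outputs.
meta_model = LogisticRegression(max_iter=1000).fit(meta_X, y)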

Fig. 1 Ensemble algorithm

Given the speed at which data are now collected, feature selection has become one of the most important issues in building such ensembles.

2.3 Genetic algorithm

GA is one of the earliest population-based stochastic algorithms. The most common GA operators are selection, crossover, and mutation [56]. These algorithms encode a candidate solution to a given problem in a simple chromosome-like structure and apply recombination operators to these structures so as to preserve critical information. Although GAs are often viewed as function optimizers, the range of problems to which they have been applied is quite broad [57]. Figure 2 shows the working principle of a simple genetic algorithm.

Fig. 2 The working principle of a simple genetic algorithm

Since a single meta-heuristic algorithm is not sufficient for all problems, this study proposes a hybrid feature selection method that combines filter and wrapper selection in an ensemble approach. Accordingly, the aim is to take the algorithms mentioned in Sect. 2.2 and use them in combination with a genetic algorithm to choose the most appropriate prediction method.

To that end, we introduce a novel concept, predominant correlation, and adopt a fast filter method that can identify relevant features, as well as redundancy among them, without performing pairwise correlation analysis. The objectives of this paper include measuring the following:

  • Bring together the best machine learning architectures through voting

  • Diagnose heart disease using a hybrid machine learning model

  • Significantly increase prediction accuracy

  • Combine several models

  • Achieve a high level of classification reliability

  • Achieve greater accuracy and lower error compared with single models

This study has several significant limitations that could be addressed in future research. First, it focused only on heart disease prediction. Other limitations include:

  • Sample size

  • Lack of available and/or reliable data

  • Lack of access to hospital data

Therefore, the main contribution of this article is as follows:

  • Machine learning models for heart disease prediction that demonstrate high performance.

  • A comparison of the proposed method's results with the most relevant prior research.

  • An investigation of the advantages of ensemble learning methods (the proposed model) for diagnosis and prediction.

  • A hybrid Stacked-Genetic approach for the diagnosis of heart disease.

3 Methodology

We now describe the chosen datasets, the algorithms used, and the experimental methodology. Such datasets have served various research purposes in recent years and may contain thousands of instances (records), each represented by hundreds or thousands of features (attributes or variables). Large datasets hold many features, some carrying valuable information and many others irrelevant or redundant, which degrades learning accuracy and computational performance. A preprocessing step named “feature selection” is therefore applied to reduce dimensionality before any information extraction technique, such as classification, association rules, clustering, or regression, is used [49]. To overcome these problems, this study proposes a feature selection technique that picks a subset of relevant, non-redundant features. To this end, the genetic algorithm, together with Relief and FCBF, is employed to pick the valuable features for rapid diagnosis of heart disease. Our work attempts to predict the diagnosis efficiently with a reduced number of features (attributes) that contribute most to cardiac disease detection. The structure is shown in Fig. 3.

Fig. 3 The structure of feature selection

This research uses a feature selection-based ensemble learning approach, a machine learning method, to diagnose heart disease. The structure of the classifier with machine-learning-based genetic algorithms is shown in Fig. 4.

Fig. 4 The structure of the classifier with machine-learning-based genetic algorithms

Considering that using a single algorithm to diagnose and predict the disease has not been effective in many scenarios, an ensemble learning algorithm is used to classify heart disease accurately from the selected features. The purpose of this work is to improve the accuracy and speed of diagnosing chronic diseases in the context of an intelligent network; to that end, we use ensemble learning approaches and a new meta-learner in stacked learning. The structure of the proposed method is shown in Fig. 5.

Fig. 5 The structure of ensemble learning

The proposed method combines statistical analysis, machine learning algorithms, and a genetic algorithm, exploiting the advantages of filter and wrapper methods to select an optimal subset from the full feature space. The genetic algorithm, together with Relief and FCBF, is used to select the valuable features for rapid diagnosis of heart disease. The order of execution is shown in Fig. 6.

Fig. 6 Flow chart of the implemented method

Initially, filter algorithms are used to rank the dataset's features. After ranking, we apply the intersection criterion (∩) to the features each filter selects; the threshold for keeping ranked features in these methods was 30%. The surviving features are then input to the genetic algorithm, which chooses the most valuable features from the rated ones (the k best features, as determined by the GA) for accurate heart disease prediction using ensemble learning.
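The following sketch illustrates this ranking-and-intersection step under the stated 30% threshold; the random scores stand in for the real ReliefF and FCBF outputs, and the exact thresholding rule (keeping the top 30% of each ranking) is our assumption.

import numpy as np

rng = np.random.default_rng(0)
relieff_scores = rng.random(13)    # stand-ins for the real ReliefF scores
fcbf_scores = rng.random(13)       # stand-ins for the real FCBF scores

def top_fraction(scores, frac=0.30):
    """Indices of the features ranked in the top `frac` by score."""
    k = max(1, int(round(len(scores) * frac)))
    return set(np.argsort(scores)[::-1][:k])

# Intersection criterion: keep only features both filters rank highly.
candidate_pool = top_fraction(relieff_scores) & top_fraction(fcbf_scores)
print("feature indices passed to the GA:", sorted(candidate_pool))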

3.1 Proposed method

Our goal is to eliminate low-value features with the filter algorithms and select high-value features with the genetic algorithm for accurate prediction of heart disease using ensemble learning.

Genetic algorithms (GA) belong to the category of evolutionary algorithms (EA), which generate solutions to optimization problems using methods inspired by natural evolution, such as inheritance, mutation, selection, and crossover. GA is one of the most effective methods for resolving problems about which little is known, since it is a general algorithm that works well in any search space. Within the AI field, a GA is a search heuristic that mimics natural selection; this heuristic is often used to generate useful solutions to optimization and search problems.

Initially, 13 attributes were considered for predicting heart disease. Accordingly, this study employs GA for feature selection: it is an efficient algorithm for solving large-scale problems and can be used to find an optimal feature subset. In a GA, individuals are typically represented by n-bit binary vectors, and in a feature selection problem each individual represents a feature subset. The quality of every candidate solution is assessed with a fitness function; in this study, classification accuracy was taken as the fitness function.

In the first step, we used the FCBF attribute evaluator and the ReliefF attribute evaluator, each with the Ranker search method. Next, we used the genetic algorithm to select the valuable features for rapid diagnosis of heart disease.

Subsequently, seven classifiers (SVM, NB, DTree, MLP, KNN, RFC, and LR) at level zero and the AdaBoost algorithm at level one are used in ensemble learning to predict the diagnosis from the selected features. Ensemble learning offers advantages such as better generalization capability and less computational time than traditional machine learning algorithms. Finally, we aggregate the results of the various models. Figure 7 illustrates this multi-model ensemble feature selection method.
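A minimal sketch of this configuration using scikit-learn's stacking API; the synthetic data and default hyper-parameters are placeholders, not the study's tuned values.

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Nine features, matching the reduced Heart-2 subset described later.
X, y = make_classification(n_samples=270, n_features=9, random_state=0)
level0 = [
    ("svm", SVC()), ("nb", GaussianNB()), ("dtree", DecisionTreeClassifier()),
    ("mlp", MLPClassifier(max_iter=1000)), ("knn", KNeighborsClassifier()),
    ("rfc", RandomForestClassifier()), ("lr", LogisticRegression(max_iter=1000)),
]
# AdaBoost acts as the level-one (meta) learner over the base predictions.
stack = StackingClassifier(estimators=level0,
                           final_estimator=AdaBoostClassifier(), cv=5)
stack.fit(X, y)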

Fig. 7 The flow chart of the proposed method

3.2 GA-stacking

The standard GA was used. The GA method consists of four major steps. (1) The features were coded as binary genomes, i.e., "1" means selected and "0" means not selected; phenotypes labeled "0" represent the removed features ("diminished features"), while those labeled "1" represent the chosen features ("highly significant features"). On the basis of these phenotypes, a group of subsets is formed from each genotype, and these subsets are used as training sets for the proposed framework. The population of chromosomes was randomly generated. (2) Each chromosome was evaluated using the fitness function, and the best individuals were selected. (3) The chromosomes were modified using crossover and mutation to create a new generation of the population. (4) The new generation returned to step (2) until the stopping criteria were met. A flow chart of the GA method is shown in Fig. 8.

Fig. 8 Overview of the genetic algorithm with iterative steps

The population size was set to 50. Ranking was used as the selection strategy: individuals in the best 50% by fitness were selected to produce offspring. A single-point crossover with a rate of 0.6 and a single-point mutation with a rate of 0.033 were applied to develop the next population. Elitism was set to 2, i.e., the two fittest individuals of the current generation were carried into the next population. The run stopped once the best fitness value had remained identical for twenty generations, with the total number of generations capped at 100. The number of GA runs with different initial conditions (different data splits) was set to 100.
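A hand-rolled sketch of a GA loop under these settings; the twenty-generation early-stopping rule is omitted for brevity, truncation selection approximates the ranking strategy, and `fitness` is the function sketched in the next subsection.

import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, POP, GENS = 13, 50, 100
CX_RATE, MUT_RATE, N_ELITE = 0.6, 0.033, 2

def evolve(fitness):
    pop = rng.integers(0, 2, size=(POP, N_FEATURES))   # random genomes
    for _ in range(GENS):
        scores = np.array([fitness(g) for g in pop])
        order = np.argsort(scores)[::-1]               # best first
        parents = pop[order[: POP // 2]]               # top 50% reproduce
        children = list(pop[order[:N_ELITE]])          # elitism: keep best two
        while len(children) < POP:
            a, b = parents[rng.integers(len(parents), size=2)]
            if rng.random() < CX_RATE:                 # single-point crossover
                cut = rng.integers(1, N_FEATURES)
                a = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(N_FEATURES) < MUT_RATE   # bit-flip mutation
            children.append(np.where(flip, 1 - a, a))
        pop = np.array(children)
    return pop[np.argmax([fitness(g) for g in pop])]   # best genome found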

3.2.1 Fitness function

In this study, the linear ranking value was used to evaluate the classifier's fitness; the linear ranking is a statistical measure of agreement between predicted and actual values. The performance of the classifier depends on how the training set is specified. Because of the limited size of the data, a repeated random sub-sampling strategy was used for cross-validation: before each GA analysis, the data were randomly split ten times into 70% training and 30% validation sets, the performance on each split was recorded as a linear ranking value, and the median of these ten values was taken as the fitness of the genome.
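A sketch of this fitness evaluation, scoring each genome by validation accuracy over ten random 70/30 splits and taking the median as its fitness; the KNN classifier and synthetic data are illustrative stand-ins for the classifiers actually wrapped by the GA.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=270, n_features=13, random_state=0)

def fitness(genome):
    mask = np.asarray(genome, dtype=bool)
    if not mask.any():                 # empty subsets are worthless
        return 0.0
    accs = []
    for seed in range(10):             # ten random 70/30 splits
        X_tr, X_va, y_tr, y_va = train_test_split(
            X[:, mask], y, test_size=0.3, random_state=seed)
        accs.append(KNeighborsClassifier().fit(X_tr, y_tr).score(X_va, y_va))
    return float(np.median(accs))      # median score becomes the fitness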

3.2.2 Crossover (recombination)

In crossover, two parent solutions are combined to produce a child; after the selection (reproduction) process, better individuals are developed. In this paper, one-point crossover is used to mate two chromosomes: each chromosome is cut at a single point, and the segments beyond the cut are exchanged between the two chromosomes.

3.2.3 Mutation

Once the crossover process is finished, the strings are subjected to mutation. Bit mutation consists of flipping a bit from 0 to 1 or from 1 to 0.

3.2.4 Setting the GA parameters

GAs have further parameters that must be set: structural parameters such as the population size, and execution parameters such as the mutation and elite rates. We used the most common values for these parameters, which yielded good results. Table 2 shows the GA parameters used in this experiment.

Table 2 Genetic algorithms parameters

4 Experimental design

4.1 Dataset

In this study, the heart disease data set available at http://archive.ics.uci.edu/ml/datasets/statlog+(heart) was used; it has 13 useful variables and 270 records. These variables and their abbreviations are listed in Table 3. 75% of the data was used for training and 25% for testing.
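A loading sketch, assuming the Statlog file has been saved locally as heart.dat; the UCI file is space-separated with the class label last, and the column names below are the conventional Cleveland-style names, adopted here as an assumption.

import pandas as pd
from sklearn.model_selection import train_test_split

cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
        "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv("heart.dat", sep=r"\s+", names=cols)   # 270 rows expected
X, y = df.drop(columns="target"), df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)   # 75/25 split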

Table 3 Features of the heart-type dataset

The dataset at the University of California, Irvine (UCI) was collected by David Aha [12]. The aim of the database is to classify the presence or absence of heart disease given the results of various medical tests on a patient. The database includes 13 attributes that indicate whether or not the patient has heart disease. It is a benchmark dataset because it contains actual patient data and has been used extensively to test many data processing methods [68,69,70,71,72].

4.2 Data preprocessing

Data preprocessing is vital to arrange the heart data so that a machine learning model can accept it. Separating the training and testing sets ensures that the model learns only from the training data and is tested on unseen data; the dataset was divided accordingly. The raw data contained irrelevant, unexplained, null, or repeated features. The most important cleansing steps were to assign missing values to a single category called "null value" and to define rules that enforce data consistency. The input data pass through these steps to enhance the system's performance, and the process can be automated using mathematical modeling and statistical knowledge [45,46,47,48].

4.3 Evaluation of result

For this study, a Jupyter notebook was used for implementation, with the Python programming language for coding. The accuracy, specificity, and sensitivity criteria were used to compare classification efficiency.

According to Table 4, FN and FP stand for the numbers of false-negative and false-positive samples, while TN and TP represent the numbers of true-negative and true-positive samples. Sensitivity measures the proportion of positives that are correctly identified, and specificity measures the proportion of negatives that are correctly identified. AUC is an index that measures the efficiency of the classifier, and the F1 score measures the accuracy of a binary model; efficiency was additionally estimated with the F-measure (F1) to check the similarity and variety of performance [45, 46].

Table 4 Confusion matrix

This article was implemented using Python 3.7 in the Anaconda environment on the Jupyter Notebook platform. The implementation details are shown in Table 5.

Table 5 Software requirements

Evaluating the accuracy of multi-class algorithms requires careful methodology because of the number of datasets tested, the variety of techniques used, and the characteristics of the data, which include both balanced and unbalanced classes. The performance of these algorithms is measured through parameters such as accuracy, sensitivity, and precision [45,46,47,48]. Understanding these metrics allows users to judge how well a developed classification model analyzes the data. For multi-class problems, traditionally only the accuracy obtained from the classification is reported as the primary criterion of evaluation and generality [45,46,47,48]; the metrics are defined as follows.

Accuracy: the number of correct predictions made divided by the total number of predictions made for the same class:

$$ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FP} + \text{TN} + \text{FN}} $$
(1)

Sensitivity (true positive rate): of the cases that are actually positive, the percentage that the model predicts as positive, calculated with the following formula:

$$ \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} $$
(2)

Specificity: of the cases that are actually negative, the percentage that the model predicts as negative, calculated with the following formula [45, 46]:

$$ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} $$
(3)
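These three metrics can be computed directly from the confusion-matrix counts, as in the following sketch with toy labels:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + tn + fn)     # Eq. (1)
sensitivity = tp / (tp + fn)                   # Eq. (2)
specificity = tn / (tn + fp)                   # Eq. (3)
print(accuracy, sensitivity, specificity)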

5 Results and discussion

Feature extraction is one of the most important methods for reducing dimensionality in data preprocessing, because databases regularly contain redundant and irrelevant attributes that negatively affect the efficiency and complexity of classification algorithms. Feature extraction has two main goals: to decrease the number of attributes and to improve classification efficiency [50].

In the first step, we used ReliefF with the Ranker search method to score the features, leaving the method's parameters at their defaults, which compare every instance with its five nearest neighbors. The highest-scoring features under ReliefF were thal, sex, and cp, at 0.0821, 0.0793, and 0.0790. Age, slope, depression, and ca scored significantly lower than the other features: 0.0188, 0.0157, 0.0118, and 0.0114. We removed these four features and saved the remaining nine into a dataset referred to as the Heart-2 dataset.

We also used the FCBF attribute evaluator with the Ranker search method. The most significant features according to FCBF were cp, heart rate, and thal, with scores of 0.1728, 0.1701, and 0.1598. Again, the lowest-ranked features were age, slope, depression, and ca, at 0.0341, 0.0228, 0.0116, and 0.0088; we removed these four features and kept the remaining nine, yielding the same Heart-2 dataset.
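One way to reproduce such a ReliefF ranking is shown below, assuming the third-party skrebate package is installed; the synthetic data and the resulting ranking are illustrative only.

import numpy as np
from skrebate import ReliefF
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=270, n_features=13, random_state=0)
fs = ReliefF(n_neighbors=5)              # five nearest neighbors, as above
fs.fit(X, y)
ranking = np.argsort(fs.feature_importances_)[::-1]
print("features ranked best to worst:", ranking)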

As Table 6 shows, filter strategies perform feature selection as a preprocessing step, with no induction algorithm: the general characteristics of the training data are used to choose features. The results obtained for the filters studied (Relief, FCBF, and ND) are compared and discussed below. The ultimate aim of this step is to pick a filter with which to construct the hybrid feature selection technique.

Table 6 Result of filter method

The next step was to find the best feature subset using a GA wrapped around seven different classifiers on the Heart-2 dataset. Clear and intuitive clinical interpretation is required to support clinicians' decision-making; the GA was therefore chosen for its ability to explore the full feature space and produce the best feature subset. SVM, NB, DTree, MLP, KNN, RFC, and LR served as classifiers for the GA, and each classifier selected 7-9 features. This feature set is used in our recognition system and its classification results. The classifiers in Table 7 and Fig. 9 were adopted to compare the performance of the selection approaches, using accuracy, sensitivity, and specificity as metrics.

Table 7 Results of different wrapper methods
Fig. 9 Results of different wrapper methods

The results confirm the utility of feature selection for classification and the superiority of wrapper strategies. That said, studies report that filter methods incur little computational cost on large data sets, in contrast to wrapper methods, which makes filter methods a reasonable option in such settings.

Finally, we analyze and compare the accuracy of the various classification schemes using the ensemble learning method to predict heart disease. SVM, NB, DTree, MLP, KNN, RFC, and LR were used at level zero and AdaBoost at level one. The analyses indicate that the Stacked-Genetic algorithm with the AdaBoost technique outperforms the other methods mentioned above in diagnosing heart disease. The improvement across the performance metrics also gives a better picture of the ensemble models' accuracy, reliability, and usefulness for heart disease prediction. The results are shown in Table 8 and Fig. 10.

Table 8 Comparison of accuracy rates in holdout and cross-validation approach
Table 9 Comparison of our results with those of other studies
Fig. 10 Comparison of accuracy rates in the cross-validation and holdout approaches; heart disease detection rates were compared among the various algorithms

One of the advantages of the filter component is that the calculations are transparent, overfitting is avoided, and it suits specific datasets. However, the method also has disadvantages, including the risk that a desirable feature is discarded along with a removed subset. In general, the biggest distinction between the filter and wrapper strategies is that the former performs a single, non-repeated computation over the data, whereas the latter adapts itself to the machine learning algorithm it is paired with. It can be concluded that wrapper results will be better than those of the filter method, but at a high computational cost [51,52,53,54].

Feature selection for the classification model attempts to select a minimally sized subset according to the following criteria: (1) the classification accuracy should increase, and (2) the value distribution of the selected features should be as close as possible to the original class distribution. Figure 9 shows the eight classification models' accuracy, sensitivity, and specificity when applied to the dataset. As Fig. 9 shows, applying feature selection and extraction to the heart disease dataset produced varying results, depending heavily on which classification algorithm was paired with it to construct a model. The model built on the Heart-2 dataset by the proposed algorithm had the highest precision of any model we created.

Continuing advances in computer and electronic technology have given scientists the opportunity to collect and study data on various phenomena, and data mining and machine learning are vital in analyzing them and constructing data-driven diagnostic models [55]. The performance of different ML algorithms, such as LR, RF, and SVM, is compared with the accuracy achieved by the evolutionary approach. The models' predictions showed that ensemble learning based on a genetic algorithm performed best, reaching 97.57% accuracy, 96% sensitivity, and 97% specificity.

Figure 10 shows the accuracy of the eight classification models when applied to the dataset. Many of the models built on the Heart-2 dataset performed well. However, the classification model built on this feature set with the proposed method achieved the highest recall and the highest accuracy of any model applied. The experimental results obtained on the Heart-2 dataset thus show that the proposed method outperformed the other algorithms on all metrics. We believe that our framework will assist doctors in predicting heart disease with high accuracy.

6 Conclusion and future work

As mentioned, heart disease is one of the most common chronic diseases and causes of adult death worldwide [50]. According to an announcement by the Ministry of Health and Medical Education, 33% to 38% of deaths in the country are due to cardiovascular disease, and Iran has the highest rate of cardiac death in the world. Evidence on lifestyle changes shows that the prevalence of cardiovascular disease in Iran is rising as people's lifestyles change.

It is estimated that by 2020 the mortality caused by these diseases will have increased to 25 million. Artificial intelligence has many applications in medicine, including forecasting the spread of COVID-19 [73, 75], diabetes [45], cancer [46], and heart disease [74]. This article used an ensemble learning method based on a genetic algorithm to select valuable features for the immediate diagnosis of heart disease. The results show the high efficiency of the suggested approach, diagnosing heart disease with 97.57% accuracy.

As a suggestion for future research, the Naive Bayes method, decision tree, or support vector regression could be used as the classifier model, and its combination with the hybrid feature selection approach developed in this study could be applied to diagnose and predict metastatic breast cancer, lung cancer, COVID-19, and various other diseases.