1 Introduction

Heart disease (HD) is a major public health concern that affects millions of people worldwide. HD is characterized by common symptoms such as shortness of breath, physical weakness, and swollen feet. Researchers are attempting to develop an efficient technique for the detection of heart disease, because current diagnostic techniques are ineffective at early detection for a variety of reasons, including accuracy and execution time. When modern technology and medical experts are unavailable, diagnosing and treating heart disease is extremely difficult. If the classifiers present in a deep convolutional neural network (DCNN) ensemble are of the same type, the ensemble is homogeneous; if they are of different types, it is called heterogeneous [7, 25]. Each DCNN is trained on its own feature space, and in some cases the features may contain noise in the form of duplicated and irrelevant data. In such cases, the training time is longer and the false-positive rate is also higher.

To overcome this problem, feature selection is used. Feature selection is mainly applied in classification ensembles; when used with a DCNN, it yields better results through the optimized features [25]. To obtain the optimized feature subset, swarm-based and other algorithms are used. For optimizing the features, algorithms such as particle swarm optimization (PSO), genetic algorithms, support vector machines and other machine-learning algorithms are widely used. An artificial crow colony algorithm was developed in [6] to overcome feature selection problems, and since its proposal it has been used in various domains.

The GCSA algorithm provides better accuracy when compared with the genetic algorithm and ant colony optimization. To improve classification efficiency, many algorithms have been proposed: pattern-based classification algorithms, different kinds of feature selection algorithms, and various ensemble classifiers aimed at a better accuracy rate [11, 13]. Their main disadvantage is that these algorithms do not provide good results across different kinds of datasets. The main goal of this work is to provide optimized feature selection with classifier ensembles for a better accuracy rate. In this work, the optimized feature subset is obtained using the GCSA. The GCSA is used to select the optimized features, and the efficiency is calculated using a DCNN ensemble made up of a support vector machine, naïve Bayes, random forest and decision tree [12]. This paper contains six sections: Sect. 1 is this introduction, Sect. 2 covers related works, Sect. 3 describes feature selection, Sect. 4 presents the proposed GCSA with DCNN, Sect. 5 reports the performance evaluation, and Sect. 6 concludes the paper.

2 Related works

Feature selection finds the best feature subset; it can also be viewed as a search in feature space. Feature selection methods fall into two main categories: (i) filter methods and (ii) wrapper methods. In the filter approach, features are filtered before classification, so the method does not depend on the classification algorithm; a weight value is calculated for every feature in the dataset and compared with the original dataset [7]. The wrapper method, in contrast, produces candidate feature subsets by adding and deleting features, and the accuracy of each subset is calculated to measure its quality. Results obtained by various researchers show that the wrapper method provides better results than the filter method.

Many algorithms, such as information gain, filtered attribute selection, ant colony optimization and the bat algorithm, are used for feature selection. These evolutionary feature selection methods are used by many researchers, and the use of swarm optimization has increased in the last few years [20]. A model has been proposed that combines rough sets with ant colony optimization for reducing dimensionality in medical fields such as dermatology. An ant colony algorithm with neural networks has also been used for feature selection. The particle swarm optimization algorithm can be used in both filter and wrapper settings. A wrapper-based feature selection method has been developed that combines the bat algorithm with an OPF classifier [18, 28].

A newer method shows that the ACO algorithm can also be used for feature selection, including image feature selection. The ABC algorithm is an intelligent feature selection algorithm used to solve optimization problems with higher accuracy [15]. A model using ant colony optimization has been proposed that discusses the possibility of using a meta-heuristic algorithm to select particular features that help obtain higher accuracy. A subspace clustering concept based on a learning mechanism showed that the clusters in a dataset are not all of the same dimension, so a weight has been assigned to each cluster; this proves that feature selection is also required for clustering [3]. In the medical field, there are huge volumes of heterogeneous data such as medical records, test records, prescriptions, and scan reports that can be used in later stages of treatment [24]. The patterns discovered from various tests provide medical knowledge; for example, by finding the relevant features, a disease can be identified. Researchers have found that before applying data mining algorithms to healthcare data, pre-processing has to be done [6, 10].

Unwanted and repeated data can be removed from the dataset using pre-processing techniques, which map high-dimensional data to a lower-dimensional space to reduce the time and space required. Pre-processing can be done by two methods: (i) feature selection and (ii) feature extraction. Feature extraction reduces the total number of features of the original dataset by combining them into a new subset, whereas feature selection selects the features required for a particular task. Feature selection works better with medical data because the original meaning of each feature remains unchanged, so it is easy for a domain expert to interpret the selected features. A wide range of feature selection methods is available for finding the required features accurately [5, 14].

Fig. 1
figure 1

Feature selection techniques: a filter method, b wrapper method

A model based on nine decision trees has been developed, and it provides better results than a single decision tree and the bagging algorithm. The main focus is on the algorithms used for classification and on measuring the performance of the DT and NB algorithms using accuracy prediction; both algorithms were shown to provide good results after testing, and, in addition, the decision tree was found to be more cost effective than naïve Bayes when the dataset has few attributes and instances [17, 22].

The authors in [19] presented a novel time series-based approach for the early prediction of an increase in hypertension by analyzing electrocardiograph Holter signals. The authors in [21] proposed a feature selection algorithm for selecting suitable features from the available dataset; their genetic algorithm-based recursive feature elimination technique showed better outcomes. The authors in [9] built a health monitoring system based on the Internet of Things (IoT) and analyzed Lamb waves to determine the health of concrete structures. Coronato and Cuzzocrea [2] proposed dynamic probabilistic risk assessment of medical information systems to improve medical device post-market surveillance, which is currently implemented as a wait-for-incident activity.

An active feature selection model has been proposed in which instances of the features are actively selected. The authors state that each feature selection method has its own advantages and disadvantages, and that algorithm performance degrades on large datasets [1, 8]. Another method uses a decision tree with bagging and a backward elimination strategy to find the relationship between chemometrics and the related pharmaceutical industry. A further model discusses the requirement for feature selection in both supervised and unsupervised learning [23, 26, 27].

3 Feature selection

Feature selection is an important pre-processing step and is used in different data mining tasks, for example pattern classification. When the feature space is large, feature selection keeps only the necessary features by removing the unwanted ones; otherwise, the unwanted data increases computation time and complexity by adding redundant and repeated data to the process. In feature selection, features are selected according to their importance, so classification can be done easily without changing the original subset. Many researchers have shown that classification using features obtained by feature selection achieves a higher accuracy rate than classification without feature selection (Fig. 1).

Many algorithms have been developed for selecting features. As in pattern classification, feature selection methods fall under two approaches: the filter technique and the wrapper technique [4, 16]. If the feature selection process is independent of the classifier, it is called a filter-based technique and depends only on the characteristics of the data. If a classifier is used, it is called a wrapper approach; the features obtained from the wrapper method depend on the classification algorithm used, so two different classifiers produce two different feature subsets. The feature subset obtained from the wrapper method is more effective than that of the filter method, but the wrapper approach is time consuming. A minimal sketch contrasting the two approaches follows.
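The following Python sketch illustrates the contrast described above: a filter selector that scores features independently of any classifier versus a simple forward-selection wrapper scored by the classifier itself. The dataset, the chi-square scorer and the decision tree are illustrative assumptions, not the method of this paper.

```python
# Sketch contrasting filter and wrapper feature selection on a generic dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Filter method: score each feature independently of any classifier.
filter_mask = SelectKBest(chi2, k=8).fit(X, y).get_support()

# Wrapper method: greedily grow a subset, scoring each candidate
# with the classifier's own 10-fold cross-validated accuracy.
def wrapper_forward_select(X, y, k=8):
    selected = []
    while len(selected) < k:
        best_f, best_score = None, -np.inf
        for f in range(X.shape[1]):
            if f in selected:
                continue
            score = cross_val_score(DecisionTreeClassifier(random_state=0),
                                    X[:, selected + [f]], y, cv=10).mean()
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
    return selected

print("filter picks:", np.flatnonzero(filter_mask))
print("wrapper picks:", wrapper_forward_select(X, y))
```

As the sketch makes concrete, the wrapper re-trains the classifier for every candidate feature, which is exactly why it is more accurate but far more time consuming than the filter.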

4 Proposed GCSA with DCNN

The GCSA is combined with the DCNN to provide a better solution to the optimization problem. The DCNN is a combination of four algorithms, namely the DT, SVM, RF and NB algorithms. In the proposed GCSA model, the crow colony acts as the feature and feature-subset generator, and the DCNN is used to evaluate each feature subset it proposes. The crow colony thus helps the DCNN construct the subset of best features, and the combination enhances the performance of both algorithms (Fig. 2).

Fig. 2
figure 2

Flowchart process of proposed algorithm

4.1 GCSA-based feature selector

The GCSA is an intelligent algorithm used for feature optimization; the crow colony combined with the genetic algorithm increases the accuracy of the ensemble. The DCNN is a combination of four algorithms, namely support vector machine, naïve Bayes, decision tree and random forest, and with these algorithms the quality of each feature \(F_i\) in the subset can be calculated. A 10-fold cross-validation technique is used to find the accuracy of each available feature in the subset. Each employed crow is represented as a binary string of 0s and 1s. The length of the string equals the total number of features in the dataset, and the string represents the feature selection made by that crow: a 1 means the feature is selected and a 0 means it is not. The number of onlooker crows and search crows equals the number of features in the dataset, as sketched below (Fig. 3).
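A minimal sketch of this binary encoding and its 10-fold evaluation, assuming a scikit-learn classifier stands in for one ensemble member:

```python
# Each crow is a 0/1 string whose length equals the number of features;
# 1 means "this feature is selected". The classifier choice is illustrative.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

def random_crow(n_features):
    """Random binary feature mask; re-draw if no feature is selected."""
    crow = rng.integers(0, 2, size=n_features)
    return crow if crow.any() else random_crow(n_features)

def subset_accuracy(crow, X, y, clf=None):
    """10-fold CV accuracy of a classifier on the features the crow selected."""
    clf = clf or RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X[:, crow.astype(bool)], y, cv=10).mean()
```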

Fig. 3
figure 3

Proposed model for heart disease prediction

figure a

After the genetic process is completed for the initial population, the crow search is applied. The crows are initialized from the features extracted from the dataset. Initially, \(C_i\) denotes the crow-based features that are arbitrarily located in the search space, as given in the following equation:

$$\begin{aligned} C_i=C_1, C_2,\ldots , C_n,\quad \text {where}\ i=1,2,3,\ldots , n. \end{aligned}$$
(1)

A few initial solutions are used at the starting stage of meta-heuristic optimization models and are improved by simultaneously monitoring the opposite solutions. The opposition point is defined by the following equation:

$$\begin{aligned} \hat{c_i}=g_j+h_j-c_i. \end{aligned}$$
(2)
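A small sketch of this opposition-based initialization per Eq. (2); the population size and bounds are assumptions for illustration:

```python
# For each candidate c_i in [g_j, h_j], the opposite point is g_j + h_j - c_i.
# Variable names mirror Eq. (2).
import numpy as np

rng = np.random.default_rng(0)

def opposition_init(n_crows, g, h):
    """Return a random initial population and its opposition-based mirror."""
    g, h = np.asarray(g, float), np.asarray(h, float)
    C = rng.uniform(g, h, size=(n_crows, g.size))  # random crows in [g, h]
    C_hat = g + h - C                              # Eq. (2), element-wise
    return C, C_hat

# One common choice is to keep the fitter half of {C, C_hat} as the
# starting population before the search begins.
```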

The fitness function for the OCS can be calculated as per the following equation:

$$\begin{aligned} \text {OC}_i=\text {MAX} (\text {Accuracy}). \end{aligned}$$
(3)

One of the flock's crows is randomly chosen to form a new position, and the new position of the crow is obtained using the following equation:

$$\begin{aligned} \text {PC}^{i,{\mathrm{iter}}+1}={\left\{ \begin{array}{ll} \text {PC}^{i,{\mathrm{iter}}}+a_i\times fl^{i, {\mathrm{iter}}} \\ \quad \times \left( M^{j,{\mathrm{iter}}}-\text {PC}^{i,{\mathrm{iter}}}\right) &{}\quad \text {if }a_j\ge \text {APC}^{j,{\mathrm{iter}}}\\ \text {random position}&{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(4)

The current position and memory of the upgraded crow is processed based on the following equation:

$$\begin{aligned} M^{i,{\mathrm{iter}}+1}={\left\{ \begin{array}{ll} \text {PC}^{i,{\mathrm{iter}}+1}&{}\quad \text {if } c(\text {PC}^{i,{\mathrm{iter}}+1})>c(M^{i,{\mathrm{iter}}})\\ M^{i,{\mathrm{iter}}} &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(5)
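A sketch of one iteration of the position update of Eq. (4) and the memory update of Eq. (5); the flight length fl, the awareness probability AP, the bounds and the fitness callable are assumed parameters:

```python
# One crow-search iteration: each crow i follows a random crow j toward j's
# memorized best position, or relocates randomly if j is "aware".
import numpy as np

rng = np.random.default_rng(1)

def crow_step(PC, M, fitness, fl=2.0, AP=0.1, low=0.0, high=1.0):
    n, d = PC.shape
    for i in range(n):
        j = rng.integers(n)                       # crow i picks a random crow j
        if rng.random() >= AP:                    # j unaware: move toward M[j]
            PC[i] = PC[i] + rng.random() * fl * (M[j] - PC[i])  # Eq. (4), case 1
        else:                                     # j aware: random relocation
            PC[i] = rng.uniform(low, high, size=d)              # Eq. (4), case 2
        if fitness(PC[i]) > fitness(M[i]):        # Eq. (5): keep the better memory
            M[i] = PC[i].copy()
    return PC, M
```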

The fitness value of the crow's new position is compared with that of the stored position, and the crow updates its memory with the new location whenever it is better. After several iterations, the best memory location with respect to the objective is returned as the best feature subset solution. The fitness of each crow is assessed through the following functions:

$$\begin{aligned} \text {fitness}_1(S)= & {} \frac{\sum _{j=1}^{m}\text {accuracy}(S)}{m} \end{aligned}$$
(6)
$$\begin{aligned} \text {fitness}_2(S)= & {} \text {consensus}(S) \end{aligned}$$
(7)
$$\begin{aligned} \text {fitness}= & {} \frac{\text {fitness}_1(S)+\text {fitness}_2(S)}{2}. \end{aligned}$$
(8)
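Anticipating the definitions given in the next paragraph, the following sketch implements the fitness of Eqs. (6)-(8); the consensus term is implemented here as the fraction of samples on which all ensemble members agree, which is one plausible reading of Eq. (7):

```python
# Fitness = average of (mean 10-fold accuracy of the ensemble members) and
# (agreement of the members on the feature subset S).
import numpy as np
from sklearn.model_selection import cross_val_score, cross_val_predict

def fitness(S_mask, X, y, members):
    Xs = X[:, S_mask.astype(bool)]
    accs = [cross_val_score(m, Xs, y, cv=10).mean() for m in members]
    fitness1 = np.mean(accs)                                  # Eq. (6)
    preds = np.array([cross_val_predict(m, Xs, y, cv=10) for m in members])
    fitness2 = np.mean((preds == preds[0]).all(axis=0))       # Eq. (7), assumed form
    return (fitness1 + fitness2) / 2.0                        # Eq. (8)
```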

Here, accuracy(S) is the predicted accuracy of the ensemble classifier and consensus(S) represents the agreement of the classifiers on the feature subset S. The fitness is evaluated using the mean accuracy and the consensus: the mean accuracy checks whether the features have the power for accurate classification while the DCNN performs feature optimization, and it helps increase the generalization ability of the feature subset. The second component of the fitness evaluation is the consensus; it checks whether the feature subset is optimal for classification by producing a high-consensus classification. The search crow passes its information to the onlooker crow, which checks the likelihood of feature selection using Eq. (9). The new solution given by the onlooker crow is represented as \(V_i\); using the mean accuracy and consensus value of the feature, the search crow points out the feature selected by the onlooker crow. If the newly obtained value \(V_i\) is larger than \(X_i\), the search crow keeps both the previously selected feature and the new one in the feature subset. If \(V_i\) is less than \(X_i\), the previously selected feature is used for further processing and the newly selected feature is omitted. \(V_i\) is obtained using the following formulas:

$$\begin{aligned} P_i= & {} \frac{{\mathrm{fitness}}_i}{\sum _{j=1}^{m}{\mathrm{fitness}}_j} \end{aligned}$$
(9)
$$\begin{aligned} V_i= & {} X_i+\mu _i\left( X_i-X_j\right) \end{aligned}$$
(10)

where \(X_i\) is the accuracy of the selected feature, \(X_j\) is the accuracy of the feature selected by the onlooker crow, and \(\mu _i\) is a randomly generated number in the range (0, 1).
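A sketch of this onlooker phase per Eqs. (9) and (10); the random-partner choice and the greedy keep-the-better rule follow the description above, while the exact pairing is an assumption:

```python
# Onlooker phase: features are chosen with probability proportional to fitness
# (Eq. 9) and a neighbour solution V_i is produced from X_i (Eq. 10).
import numpy as np

rng = np.random.default_rng(2)

def onlooker_update(X_acc, fit):
    """X_acc: per-feature accuracies; fit: per-feature fitness values."""
    P = fit / fit.sum()                                   # Eq. (9)
    V = X_acc.copy()
    for i in range(len(X_acc)):
        if rng.random() < P[i]:                           # selected by an onlooker
            j = rng.integers(len(X_acc))                  # random partner feature
            mu = rng.random()                             # mu_i in (0, 1)
            V[i] = X_acc[i] + mu * (X_acc[i] - X_acc[j])  # Eq. (10)
    return V  # as described above, keep V[i] only where it beats X_acc[i]
```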

Therefore, when a search crow is allocated a new feature, the onlooker crows make full use of it and a new configuration of the subset is produced; after this process, all the features are used to form a new feature subset, and the candidate features move toward a better subset configuration. If no improvement is made by a search crow, the employed crow becomes a scout crow. A new feature subset is then assigned to the scout crow, as represented below:

$$\begin{aligned} X_{ij}=X_j^{{\mathrm{min}}}+\text {rand}(0,1)\left( X_j^{{\mathrm{max}}}-X_j^{{\mathrm{min}}}\right) \end{aligned}$$
(11)

where \(X_j^{{\mathrm{max}}}\) is the upper boundary value and \(X_j^{{\mathrm{min}}}\) is the lower boundary value.

The same process is carried out within the upper and lower boundaries until the stopping criteria are achieved, yielding the best features.
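A sketch of this scout replacement per Eq. (11); the trial-limit threshold that triggers abandonment is an assumed parameter:

```python
# Scout phase: any solution whose improvement counter exceeded the limit is
# re-seeded with a fresh random point inside the feature bounds (Eq. 11).
import numpy as np

rng = np.random.default_rng(3)

def scout_replace(X, trials, limit, x_min, x_max):
    """Replace abandoned solutions and reset their trial counters."""
    for i in np.flatnonzero(trials > limit):
        X[i] = x_min + rng.random(X.shape[1]) * (x_max - x_min)  # Eq. (11)
        trials[i] = 0
    return X, trials
```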

The obtained optimal solution is applied as input to the genetic algorithm, which evaluates the fitness of each feature; based on the chromosomes, children are selected as features, the mutation process is applied to each child chromosome, and this is repeated for the given number of iterations. In this way, the GCSA selects features based on their ranking, so the important features are chosen from the feature subset and the time consumed by noisy and unwanted features is reduced. On large datasets, classifier performance normally degrades because of the huge number of features handled; in the GCSA, the features are selected by importance, so the classifier's computation speed is enhanced. A sketch of this genetic refinement step follows.
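The following sketch shows how the crow-search solutions could seed a GA that ranks, recombines and mutates binary feature masks; the selection scheme, crossover point and mutation rate are illustrative assumptions:

```python
# Genetic refinement of binary feature masks: rank by fitness, keep the top
# half as parents, create children by one-point crossover and bit-flip mutation.
import numpy as np

rng = np.random.default_rng(4)

def ga_refine(population, fitness_fn, generations=20, p_mut=0.02):
    pop = population.copy()                       # array of 0/1 integer rows
    for _ in range(generations):
        scores = np.array([fitness_fn(ind) for ind in pop])
        order = np.argsort(scores)[::-1]          # best first
        parents = pop[order[: len(pop) // 2]]     # selection: keep the top half
        children = []
        for _ in range(len(pop) - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, a.size)         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(child.size) < p_mut # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.vstack([parents, children])
    return pop
```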

Table 1 Attributes of the dataset
Table 2 Attributes of the dataset
Table 3 Features selected and accuracy of GCSA

4.2 DCNN

In this paper, the classification algorithms DT, SVM, RF and NB are combined to form an ensemble classifier. The crows search for features, and the features selected by each crow become the input to the classifier. The features are evaluated one by one, each feature separately. The subset is used to train the four classifier algorithms, which then classify the test subset. After classification, the GCSA calculates the mean accuracy and consensus of the ensemble using Eqs. (6) and (7); the average of the mean accuracy and consensus gives the fitness of the features, and this fitness function is used to select the best features. One way to assemble the four base learners is sketched below.
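The paper does not specify hyperparameters, so the sketch below wires the four named base learners into a single voting ensemble with scikit-learn defaults; the soft-voting choice is an assumption:

```python
# DT + SVM + RF + NB combined into one voting ensemble.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average predicted probabilities across the four members
)
# Usage: ensemble.fit(X_train[:, mask], y_train)
#        ensemble.score(X_test[:, mask], y_test)
```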

5 Results and discussion

The datasets used in the implementation of the proposed work and the results obtained in comparison with other classifier algorithms are discussed in this section.

5.1 Dataset

The performance of the proposed GCSA with DCNN is implemented and tested using ten different medical datasets: the dermatology, heart-C, lung cancer, Pima Indians, hepatitis, Iris, Wisconsin cancer, Lymphography, diabetes and Statlog heart disease datasets. The attributes of the datasets are given in Table 1. These datasets have been used for DCNN and feature selection by many researchers, so we use them to evaluate the proposed algorithms; they contain many attributes and instances, so the accuracy of the proposed algorithm can be measured easily.

Fig. 4
figure 4

Ranking features by importance [24]

Fig. 5
figure 5

Feature selection using GCSA

Fig. 6
figure 6

Accuracy of proposed GCSA DCNN [22]

Fig. 7
figure 7

Comparison of various algorithms

Table 4 Features selected and accuracy of GCSA
Fig. 8
figure 8

Comparative analysis of different algorithms based on accuracy

The attributes of the ten medical datasets used to test the proposed GCSA DCNN are summarized in Table 2.

The parameters for selecting the best feature subset are set, and the best subset is obtained after a predetermined number of cycles. The employed crows pass the selected features to the DCNN after every iteration. The mean accuracy and consensus of the ensemble classifiers are calculated using Eqs. (6) and (7), and the fitness of the feature subset is the average of the accuracy and consensus. The onlookers select features with a probability based on fitness. The number of selected features and the classification accuracy are given in Table 3 (Figs. 4, 5, 6).

Automatic feature selection models can also be used to select features. The caret package provides an automatic model called Recursive Feature Elimination (RFE). The RFE model is applied to the Pima Indians diabetes dataset, with a random forest used at each iteration to evaluate the model. Figure 4 plots the features ranked by importance, which gives a good picture of the feature selection.
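The text refers to caret's RFE in R; an analogous sketch in Python using scikit-learn's RFE with a random forest estimator is shown below. The CSV path and the assumption that the class label is the last column are hypothetical:

```python
# Recursive Feature Elimination with a random forest, mirroring the caret setup.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

df = pd.read_csv("pima-indians-diabetes.csv")      # hypothetical local copy
X, y = df.iloc[:, :-1].values, df.iloc[:, -1].values

rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=8).fit(X, y)
ranking = sorted(zip(rfe.ranking_, df.columns[:-1]))  # rank 1 = most important
print(ranking)
```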

The DCNN of four classifiers is developed using the optimized feature subset obtained by the proposed GCSA algorithm. The classification accuracy of the four classifiers is shown in Table 4. The performance of the GCSA is compared with ant colony optimization, C4.5 bagging and C4.5 boosting. A 10-fold cross-validation procedure is used to evaluate the accuracy of the constructed classifiers. The classification accuracy on the ten datasets increases markedly with the proposed GCSA DCNN algorithm.

The performance of the proposed GCSA with DCNN is compared with many other algorithms, such as decision tree, support vector machine, genetic algorithm, gain ratio, one-attribute-based and filtered-attribute-based selection; the Statlog heart disease dataset is used for the comparison. A graph is drawn to show the good performance of the proposed GCSA–DCNN. The proposed algorithm selects 8 features; the numbers of features selected by the different feature selection algorithms are given in the table. The GCSA with DCNN performs better than all the bagging and boosting approaches reported earlier (Fig. 8).

6 Conclusion

In this research, a DCNN-based optimization function is introduced to enhance the performance of the GCSA technique. The GCSA with DCNN-based feature selection identifies eight features. The resultant features are passed to the DCNN to determine the accuracy. The GCSA model obtained an accuracy of 88.78% with all original features and 95.34% with the selected features. The accuracy of the proposed algorithm is high compared to other feature selection methods such as decision tree, support vector machine, and the artificial crow colony.