Abstract
Heart disease is a leading cause of death worldwide. Early prediction is challenging, yet it helps prevent severe conditions such as heart attacks and coronary artery disease. Various traditional methods are used for early prediction of heart disease, but they are expensive and time consuming. To overcome these issues, a novel Chaos Game Optimization based Recurrent Neural Network (CGO-RNN) is proposed to predict heart disease early and accurately. Kernel Principal Component Analysis (KPCA) is used for dimensionality reduction, which lowers the computational load of the proposed method, and the extracted features are used to categorize the heart samples. The experimental results show improved performance, with scores of 98.99%, 98.97%, 98.95%, 98.56%, and 98.54% across the evaluation metrics. This indicates that the proposed method improves efficiency and makes reliable predictions of heart disease.
1 Introduction
Healthcare systems are widely used worldwide to predict disease [1]. People all over the world suffer from various types of deadly diseases [2]. Heart disease is considered the most serious of these, with the highest death rate worldwide [3]. Data-driven prediction methods play an important role in the healthcare domain, where they are used to detect heart disease at an early stage [4]. According to data provided by the World Health Organization (WHO), nearly 31% of global deaths are caused by heart-related diseases. Therefore, the prognosis of heart disease is very important in the field of medicine [5]. Furthermore, it is estimated that heart disease-related deaths will rise to 22 million by 2030 if crucial actions are not taken. People at risk of cardiovascular disease may show raised blood pressure, glucose, and cholesterol levels, as well as depression. These parameters can be readily monitored at home or at primary health facilities [6].
The major challenge is to predict the disease in time and with high accuracy so that the mortality rate can be reduced through effective drugs and other countermeasures. Patients with heart disease usually exhibit symptoms such as dyspnea, body weakness, and painful swelling of the legs. Early detection is difficult but can notably improve patient survival rates [7]. In recent years, several openly available heart disease datasets and various predictive models have been developed [8, 9]. Several methods exist, but they were unable to predict heart disease accurately [10]. Comparative experiments on heart disease datasets are carried out to demonstrate the detection performance of the proposed method. The proposed method detects heart disease accurately by extracting features after the pre-processing stage. The contributions of the paper are as follows.
- A novel Chaos Game Optimization based Recurrent Neural Network (CGO-RNN) is proposed to perform early diagnosis of heart disease.
- KPCA, a non-parametric approach, is employed for dimensionality reduction to extract the significant features. It also reduces the computational load.
- The CGO algorithm automatically obtains the optimal hyperparameters of the RNN model to enhance the accuracy.
- The performance of the CGO-based RNN is evaluated on the heart disease prediction dataset and the Cleveland Heart Disease Dataset (CHDD).
The remainder of the paper is organized as follows. Related works are reviewed in Sect. 2. Section 3 explains the proposed methodology. Section 4 discusses the results of the proposed method, and Sect. 5 concludes the article.
2 Related works
Casaña-Eslava et al. [11] focus on extending a recent algorithm for cluster assignment with autonomous hyperparameter selection. To identify clusters based on density, the Schrödinger equation is adopted and its focal length parameters are tuned. Singh and Bose [12] introduced the Fast-Forward Quantum Optimization Algorithm (FFQOA) to process CT scans taken from COVID-19 patients. The algorithm achieves values between 0.90 and 0.91. Adiabatic quantum computers are a promising platform for solving optimization problems. Accordingly, Arthur and Date [13] presented a quadratic unconstrained binary optimization (QUBO) approach to solve the balanced K-Means Clustering (KMC) training problem. Although quantum hardware can handle huge datasets at a high pace, the selected dataset is not computationally intensive.
Khan and Algarni [14] investigated heart disease prediction using Modified Salp Swarm Optimization (MSSO). MSSO is applied within an Internet of Medical Things (IoMT) framework for the diagnosis of heart disease.
Nagarajan and Thirunavukarasu [15] suggest a neuro-fuzzy-based healthcare framework that covers patient records and disease diagnosis. An accuracy of 87.7% is achieved using the neuro-fuzzy techniques. A fuzzy rule base is designed to strengthen the decision-making process. Muthu et al. [16] introduce a neuro-fuzzy-based algorithm for predicting colorectal diseases in a patient. Its difficulties are technical problems, poor data attributes, and weak system design.
Harimoorthy and Thangavelu [17] presented an improved SVM-Radial based kernel method that processes large amounts of data to identify the severity level. A vast volume of patient data must be evaluated and processed in the healthcare industry, and big data is primarily used for diagnostic purposes. Singh et al. [18] investigate a Beetle Swarm Optimization (BSO) based Adaptive Neuro-Fuzzy Inference System (BSO-ANFIS) model for cardiovascular disease diagnostics. The accuracy was 96.08%. However, the processing time needed to predict the condition is long.
Kavitha and Kaulgud [19] applied quantum paradigms to improve unsupervised machine learning algorithms, specifically the KMC technique. Quantum noise and quiet synchronization times are the issues that need to be addressed in this work. Enireddy et al. [20] predicted human diseases using optimized clustering techniques. Heart, liver, and appendicitis datasets are used in this model, with F1-score, accuracy, homogeneity score, and execution time as the evaluation measures. K-Means Clustering (KMC) showed better results than the other algorithms. However, the prediction process takes a long time.
Bhavekar and Goswami [21] predicted heart disease by classifying images with a hybrid Recurrent Neural Network and Long Short Term Memory (RNN-LSTM) approach. The method was used for classification and improved performance. However, the prediction was not accurate because of multiple images. Mishra and Mohapatra [22] discussed the validation of heart disorders using Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR). The methods were used for training and testing classification. The results showed a classification accuracy of 96%. However, it could not.
Mohapatra et al. [23] introduced a Naïve Bayes Classifier (NB-C) for validating records of heart patients and classifying diseased and normal cases. The distribution-preserving train-test splitting outperformed the developed method. But instruments were cost-effective. Sharwardy et al. [24] presented the frequent validation of heart disease in children using a Markov model. The method was used to predict the disorder at an early stage, which limits its severity. The results improved performance but did not accurately validate the diseased cases due to limited space. Sahu et al. [25] elaborated heart disorder detection using Machine Learning (ML) algorithms. Various train-test folds were used to validate the prediction performance, which improved the effectiveness. On the other hand, the decision-making process was not effective.
3 Chaos game optimization based recurrent neural network (CGO-based RNN) architecture
We propose a CGO-based RNN architecture to improve the prognosis accuracy of heart disease in the healthcare environment. Classification is performed by the RNN, and the CGO algorithm optimizes the RNN by tuning its hyperparameters. Finally, each sample in the dataset is classified as heart disease or normal.
3.1 Data preprocessing
Preprocessing adapts the raw data to improve the classification capability of later processing [26]. Real-world data cannot be applied to the diagnosis task directly because it is inconsistent, incomplete, and noisy. Hence, a preprocessing step is used to prepare the heart disease diagnosis data. Data preprocessing also performs normalization, missing-data filtering, aggregation, and integration, and thereby enhances the data quality [27].
Data cleaning
Data cleaning, also called data cleansing, removes anomalies that affect the data and thereby improves data quality. It removes unwanted noise and erroneous entries during pre-processing.
Data transformation
The data is transformed into a format that can be analyzed for decision making and that improves the prediction process. Transformation is used whenever data must be converted to match the format expected by its destination. The result is easy to handle for both humans and computers. Accurate formatting improves data quality and prevents incorrect indexing and incompatible formats.
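As a minimal sketch of two of the preprocessing steps named above (missing-data filtering and normalization), the following Python fragment drops incomplete records and rescales each column to [0, 1]. Min-max scaling, the function names, and the toy rows are assumptions made for illustration; the paper does not state which normalization it uses.

```python
def drop_incomplete(rows):
    """Keep only records in which every attribute is present (not None)."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_normalise(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Toy records (age, resting blood pressure); the values are illustrative only.
raw = [[63.0, 145.0], [None, 130.0], [41.0, 120.0], [52.0, 170.0]]
clean = min_max_normalise(drop_incomplete(raw))
```

Filtering before scaling keeps a missing entry from distorting the column minimum and maximum.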
3.2 Dimensionality reduction
Principal Component Analysis (PCA) has been used to predict heart disease, but the large number of features reduces its prediction efficiency. High-dimensional inputs also degrade performance and prevent accurate classification and prediction of heart disease. KPCA, in contrast, reduces the dimensionality and the computational load while extracting features that support accurate, early prediction of heart disorders. KPCA is used in this paper for feature extraction, which yields a completely new feature space obtained via a functional mapping. The dimensionality reduction retains as much information as possible while transforming the high-dimensional features into a low-dimensional space. This step reduces the computational cost and improves the accuracy of the analysis. The data distributions can be both high-dimensional and sparse or low-dimensional and dense in different subareas [28]. Consider a data set \(Y \in \mathbb{R}^{m \times e}\) that has \(e\) variables and \(m\) observations, with \(Y = [y_{1}^{T}, \ldots, y_{m}^{T}]\) and \(y_{j} \in \mathbb{R}^{e}\). The covariance matrix is constructed by the below equation once the data are mapped to the feature space \(G^{e}\).
The mapping function is represented by \(\Delta(\cdot)\). The kernel PCs (KPCs) are obtained by solving the eigenvalue problem.
Here, the eigenvalues and eigenvectors are denoted by \(\alpha\) and \(u\), and the dot product between \(b\) and \(a\) is represented as \(\left\langle b, a \right\rangle\). The relation \(\alpha u = \Sigma^{G} u\) is equivalent to
There exist coefficients \(b_{j}\) \((j = 1, 2, 3, \ldots, m)\), one per sample, such that
By combining Eqs. 3 and 4, the following Eq. 5 is obtained
Equation (5) is simplified by introducing the kernel matrix \(C\), indexed over \(c = 1, 2, 3, 4, \ldots, m\).
Here, \(\lambda = [b_{1}, b_{2}, \ldots, b_{m}]^{T}\). The kernel PCs for new data are then calculated with the following equation.
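For reference, the standard KPCA derivation that these steps follow can be written in the notation of this section as below. This is a sketch under assumptions rather than a transcription of the authors' equations: it assumes the mapped samples \(\Delta(y_j)\) are centered in the feature space, the numbering is chosen to match the in-text references to Eqs. 3 to 5, and the entries \(C_{ij}\) and the superscript \(k\) on \(b_{j}^{k}\) (the coefficient of sample \(j\) for the \(k\)-th component) are notation introduced here.

```latex
\begin{align}
\Sigma^{G} &= \frac{1}{m}\sum_{j=1}^{m} \Delta(y_{j})\,\Delta(y_{j})^{T} \tag{1}\\
\alpha u &= \Sigma^{G} u \tag{2}\\
\alpha \left\langle \Delta(y_{k}), u \right\rangle
  &= \left\langle \Delta(y_{k}), \Sigma^{G} u \right\rangle, \qquad k = 1, \ldots, m \tag{3}\\
u &= \sum_{j=1}^{m} b_{j}\,\Delta(y_{j}) \tag{4}\\
\alpha \sum_{j=1}^{m} b_{j} \left\langle \Delta(y_{k}), \Delta(y_{j}) \right\rangle
  &= \frac{1}{m}\sum_{j=1}^{m} b_{j}
     \left\langle \Delta(y_{k}),
     \sum_{i=1}^{m} \Delta(y_{i}) \left\langle \Delta(y_{i}), \Delta(y_{j}) \right\rangle
     \right\rangle \tag{5}\\
m\,\alpha\,\lambda &= C\lambda, \qquad
  C_{ij} = \left\langle \Delta(y_{i}), \Delta(y_{j}) \right\rangle \tag{6}\\
t_{k} &= \left\langle u_{k}, \Delta(y) \right\rangle
  = \sum_{j=1}^{m} b_{j}^{k} \left\langle \Delta(y_{j}), \Delta(y) \right\rangle \tag{7}
\end{align}
```

Substituting (4) into (3) gives (5), and writing the inner products as entries of \(C\) turns (5) into \(m\,\alpha\,C\lambda = C^{2}\lambda\), which reduces to (6) for nonzero eigenvalues.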
Once the KPCA model is constructed, the \(T^{2}\) and SPE statistical indices can be used to detect errors in new data. The variation within the model is represented by the below equation.
Here, \(\Lambda = \mathrm{diag}(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{q})\) with \(\alpha_{1} > \alpha_{2} > \cdots > \alpha_{q}\). The SPE statistical index tracks the residual subspace of the KPCA model and is measured as follows.
Here, \(\Delta_{q}(Y)\) denotes the reconstruction from the \(q\) PCs in the higher-dimensional feature space. A fault is flagged when the SPE and \(T^{2}\) values exceed their threshold limits.
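The KPCA scores and the two monitoring indices can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the Gaussian kernel, the value of `gamma`, the function names, and the synthetic data are all assumptions. It computes \(T^{2}\) as the sum of squared scores on the first \(q\) components scaled by their eigenvalues, and SPE as the squared score energy on the remaining nonzero components.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_kpca(Y, gamma=0.1):
    """Eigendecompose the centred kernel matrix of the training data Y (m x e)."""
    m = Y.shape[0]
    K = rbf_kernel(Y, Y, gamma)
    ones = np.full((m, m), 1.0 / m)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones  # centring in feature space
    mu, V = np.linalg.eigh(Kc)                      # eigenvalues in ascending order
    mu, V = mu[::-1], V[:, ::-1]                    # reorder to descending
    keep = mu > 1e-10 * mu[0]                       # drop numerically zero directions
    mu, V = mu[keep], V[:, keep]
    return {
        "Y": Y, "K": K, "gamma": gamma,
        "coeffs": V / np.sqrt(mu),  # scaled so feature-space eigenvectors have unit norm
        "alpha": mu / m,            # eigenvalues of the feature-space covariance
    }


def scores(model, Y_new):
    """Kernel PC scores of the rows of Y_new on all retained components."""
    Y, K = model["Y"], model["K"]
    m = Y.shape[0]
    k = rbf_kernel(Y_new, Y, model["gamma"])
    ones_m = np.full((m, m), 1.0 / m)
    ones_t = np.full((Y_new.shape[0], m), 1.0 / m)
    kc = k - ones_t @ K - k @ ones_m + ones_t @ K @ ones_m
    return kc @ model["coeffs"]


def t2_and_spe(model, Y_new, q):
    """T^2 on the first q scores and SPE on the remaining retained scores."""
    t = scores(model, Y_new)
    t2 = ((t[:, :q] ** 2) / model["alpha"][:q]).sum(axis=1)
    spe = (t ** 2).sum(axis=1) - (t[:, :q] ** 2).sum(axis=1)
    return t2, spe


rng = np.random.default_rng(0)
Y_train = rng.normal(size=(40, 13))  # synthetic stand-in, not the heart dataset
model = fit_kpca(Y_train)
t2, spe = t2_and_spe(model, Y_train, q=3)
```

In practice `q` and the thresholds for both indices would be chosen from the training data; the paper does not report the values it used.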
3.3 Classification for heart disease using RNN with CGO architecture
In this subsection, we propose the RNN with CGO architecture to accurately predict heart disease using the heart disease prediction dataset. The architecture outputs a prediction of normal or heart disease.
3.3.1 LSTM–RNN
The LSTM–RNN has a large memory capacity, which helps it produce efficient outputs for disease diagnosis [30].
Assume an input sequence S of length N is given to an RNN architecture with C input neurons, A hidden neurons, and B output neurons [29]. The hidden neurons in the forward pass are described as follows:
where the value of the cth input at time n is indicated as \(s_{c}^{n}\), and the input to and activation of hidden neuron a are represented as \(x_{a}^{n}\) and \(y_{a}^{n}\). The neurons are represented as h. The backward pass is computed as shown below.
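A forward pass of this kind can be sketched in plain Python. The sketch assumes the standard simple-recurrent form, in which the hidden input \(x_{a}^{n}\) is a weighted sum of the current inputs \(s_{c}^{n}\) and the previous hidden activations, and \(y_{a}^{n}\) is obtained by applying an activation function. The tanh activation, the zero initial state, and the weight names `w_in`, `w_rec`, and `w_out` are assumptions; the backward pass and the LSTM gating are not shown.

```python
import math


def rnn_forward(seq, w_in, w_rec, w_out):
    """Run a simple recurrent layer over a sequence.

    seq:   N input vectors, each of length C
    w_in:  C x A input-to-hidden weights
    w_rec: A x A hidden-to-hidden weights
    w_out: A x B hidden-to-output weights
    Returns N output vectors, each of length B.
    """
    n_hidden = len(w_rec)
    n_out = len(w_out[0])
    y_prev = [0.0] * n_hidden  # hidden activations before the first time step
    outputs = []
    for s in seq:
        # x_a^n: contribution of the current inputs plus the previous hidden state
        x = [
            sum(w_in[c][a] * s[c] for c in range(len(s)))
            + sum(w_rec[h][a] * y_prev[h] for h in range(n_hidden))
            for a in range(n_hidden)
        ]
        y = [math.tanh(v) for v in x]  # y_a^n
        outputs.append(
            [sum(w_out[a][b] * y[a] for a in range(n_hidden)) for b in range(n_out)]
        )
        y_prev = y
    return outputs


# One input, one hidden neuron, one output; weights chosen only for illustration.
out = rnn_forward([[0.5], [-0.5]], w_in=[[1.0]], w_rec=[[1.0]], w_out=[[1.0]])
```

The second output depends on the first input through `y_prev`, which is the memory the recurrent connection provides.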
3.3.2 Chaos game optimization
The CGO algorithm is based on several principles of chaos theory, in particular the self-similarity of fractals and the way the chaos game constructs them [31]. Chaos theory studies dynamic systems that are highly sensitive to their initial conditions. Even though such systems appear irregular, they contain fundamental patterns such as repeating shapes, many subsystems, comparable loops, and fractals, which make them self-organizing and self-similar. Because a dynamic system depends on its starting conditions, even small changes in those conditions can have a significant impact on its future states. The chaos game is a mathematical method for generating fractals that uses a polygon and a randomly selected initial point. The basic goal is to connect points in an iterative process to create a pattern that has a similar shape at different scales. The Sierpinski triangle is a simple example of a fractal built with the chaos game.
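The chaos game for the Sierpinski triangle can be sketched in a few lines of Python. This shows only the fractal construction that inspires CGO, not the optimizer itself; the function name, the right-triangle vertices, and the choice to start on a vertex are assumptions made for illustration.

```python
import random


def chaos_game(vertices, n_points, seed=0):
    """Generate points by repeatedly jumping halfway towards a random vertex."""
    rng = random.Random(seed)
    x, y = vertices[0]  # starting on a vertex keeps every point on the fractal
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
points = chaos_game(triangle, 5000)
```

Plotting `points` reveals the self-similar pattern: every point stays inside the triangle, and none falls inside the central triangle that the construction leaves empty.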
3.3.3 Proposed RNN with CGO
Identifying the optimal hyperparameters of an RNN is a major challenge for existing techniques, and selecting them well improves the efficiency of the classifier. The RNN hyperparameters selected are the time step (P1), batch size (P2), number of hidden layer neurons (P3), learning rate (P4), and number of iterations (P5). The CGO algorithm is initialized with a random set of candidate hyperparameter settings, each represented as a vector of size 5. The proposed RNN with CGO is delineated in Fig. 1.
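The encoding of a candidate as a five-element vector (P1 to P5) and the random initialization can be sketched as follows. The search ranges, the population size, and the placeholder fitness function are hypothetical, since the paper does not list them, and the sketch deliberately stops at initialization and selection of the best initial candidate: it does not implement the CGO update rules.

```python
import random

# Hypothetical search ranges for P1..P5; the paper does not list its bounds.
BOUNDS = [
    ("time_step", 1, 20, int),
    ("batch_size", 8, 128, int),
    ("hidden_neurons", 8, 256, int),
    ("learning_rate", 1e-4, 1e-1, float),
    ("iterations", 50, 500, int),
]


def random_candidate(rng):
    """Draw one 5-element candidate vector uniformly within the bounds."""
    return [rng.uniform(low, high) for _, low, high, _ in BOUNDS]


def decode(vector):
    """Clip a candidate to the bounds and convert it to named hyperparameters."""
    params = {}
    for value, (name, low, high, kind) in zip(vector, BOUNDS):
        clipped = min(max(value, low), high)
        params[name] = int(round(clipped)) if kind is int else float(clipped)
    return params


def select_best(population, fitness):
    """Return the candidate whose decoded hyperparameters score highest."""
    return max(population, key=lambda v: fitness(decode(v)))


def placeholder_fitness(params):
    # Stand-in only: a real run would train the RNN with these settings
    # and return its validation accuracy.
    return -abs(params["learning_rate"] - 0.01)


rng = random.Random(42)
population = [random_candidate(rng) for _ in range(10)]
best = decode(select_best(population, placeholder_fitness))
```

Keeping candidates as continuous vectors and rounding only in `decode` lets a continuous optimizer such as CGO move freely while the RNN still receives valid integer settings.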
4 Result and discussion
The simulations are carried out using the MATLAB and Java programming languages on a Windows 10 Pro 64-bit computer with a Core i5 processor and 12 GB of RAM. The performance of the CGO-based RNN is compared with several existing methods, including the FFQOA, SVM-Radial based kernel, MSSO, and BSO-ANFIS methods.
4.1 Heart disease prediction dataset
The dataset is gathered from Kaggle [32]. It contains 303 patient records with 14 attributes: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, and class. The weight of each attribute and the relationships between attributes can be determined through a variety of analyses, which helps detect the disease accurately.
4.1.1 Cleveland heart disease dataset (CHDD)
This dataset is obtained from the UCI-ML repository. It has 76 attributes, but only 14 are considered for predicting the disease, with 303 instances. In the latter case, 16 attributes and 4238 instances are used [33]. A 70–30 training and testing split is used to validate the performance.
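A 70–30 split of 303 records can be sketched in Python as below. Stratifying by class (so that both parts keep the same class ratio), the fixed seed, and the synthetic label counts are assumptions; the paper states only the 70–30 proportion.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split record indices into train and test sets, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Synthetic labels for 303 records; the class counts are illustrative only.
labels = [1] * 165 + [0] * 138
train_idx, test_idx = stratified_split(labels)
```

Fixing the seed makes the split reproducible, which matters when the same partition is reused to compare several models.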
4.2 Comparative analysis
The accuracy and precision of the CGO-based RNN are shown in Figs. 2 and 3. The CGO-based RNN achieves higher accuracy than the compared methods, at 98.99%. Similarly, it attains a higher precision, at 98.97%.
The Matthews Correlation Coefficient (MCC) comparison is shown in Fig. 4. The CGO-based RNN again achieves higher performance than the baseline methods, with an MCC value of 98.54%.
The comparative analysis of the different metrics for the proposed method is given in Table 1.
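For readers who want to reproduce these metrics, accuracy, precision, and MCC can be computed from the confusion-matrix counts with their standard definitions, sketched here in Python. The function names and the toy labels are illustrative assumptions and are unrelated to the results reported above. MCC is returned on its native scale from -1 to 1, whereas the paper reports it as a percentage.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = disease, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, tn, fp, fn):
    return tp / (tp + fp) if tp + fp else 0.0


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
```

Unlike accuracy, MCC uses all four counts symmetrically, so it stays informative when the two classes are imbalanced.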
4.3 Statistical analysis
The efficiency is validated with a statistical significance test that compares the proposed method with the other techniques. If there is no statistically significant difference between the proposed and existing techniques, the null hypothesis (NH) holds. The null hypothesis is defined as NH = 0. The accuracy is computed 10 times for each model, and the paired t-test is applied to the results. Table 2 presents the statistical analysis of the proposed method.
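The paired t-test over 10 runs can be sketched in Python as follows. The accuracy values are made up for illustration and are not results from this study, and the 0.05 significance level is an assumption, since the paper does not state the level it used. With 10 paired runs there are 9 degrees of freedom, for which the two-sided critical value at the 0.05 level is 2.262.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


T_CRIT_DF9 = 2.262  # two-sided critical value, significance level 0.05, 9 degrees of freedom

# Illustrative accuracies from 10 hypothetical runs of two models.
proposed = [0.989, 0.991, 0.988, 0.990, 0.987, 0.992, 0.989, 0.990, 0.988, 0.991]
baseline = [0.958, 0.962, 0.960, 0.955, 0.961, 0.959, 0.963, 0.957, 0.960, 0.962]

t_stat, df = paired_t(proposed, baseline)
reject_null = abs(t_stat) > T_CRIT_DF9
```

The test is paired because each run of the proposed model is matched with the corresponding run of the baseline, so only the per-run differences enter the statistic.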
5 Conclusion
Healthcare monitoring and prediction systems help to save many lives, especially when patients are treated in remote places, which is their most popular application. A novel approach that combines Chaos Game Optimization (CGO) with an RNN architecture is proposed for heart disease diagnosis. The proposed CGO-based RNN technique outperforms the existing FFQOA, SVM-Radial based kernel, MSSO, and BSO–ANFIS methods. Compared to the existing baseline work, the proposed method has a relatively short processing time and a very high classification rate, and it delivers more accurate predictions. The method improves performance by classifying multiple heart disease samples, and the severity level is identified accurately.
Data availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Code availability
Not applicable.
References
Kumar A, Pathak MA (2021) A machine learning model for early prediction of multiple diseases to cure lives. Turkish J Comput Math Educ (TURCOMAT) 12(6):4013–4023
Yuvaraj N, SriPreethaa KR (2019) Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster. Clust Comput 22(Suppl 1):1–9. https://doi.org/10.1007/s10586-017-1532-x
Rahim A, Rasheed Y, Azam F, Anwar MW, Rahim MA, Muzaffar AW (2021) An integrated machine learning framework for effective prediction of cardiovascular diseases. IEEE Access 9:106575–106588
Ripan RC, Sarker IH, Hossain SMM, Anwar MM, Nowrozy R, Hoque MM, Furhad MH (2021) A data-driven heart disease prediction model through K-means clustering-based anomaly detection. SN Computer Science 2:1–12. https://doi.org/10.1007/s42979-021-00518-7
Katarya R, Meena SK (2021) Machine learning techniques for heart disease prediction: a comparative study and analysis. Heal Technol 11:87–97. https://doi.org/10.1007/s12553-020-00505-7
Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2020) HDPM: an effective heart disease prediction model for a clinical decision support system. IEEE Access 8:133034–133050
Li JP, Haq AU, Din SU, Khan J, Khan A, Saboor A (2020) Heart disease identification method using machine learning classification in e-healthcare. IEEE Access 8:107562–107582. https://doi.org/10.1109/ACCESS.2020.3001149
Ran X, Zhou X, Lei M, Tepsan W, Deng W (2021) A novel k-means clustering algorithm with a noise algorithm for capturing urban hotspots. Appl Sci 11(23):11202
Doppala BP, Bhattacharyya D, Chakkravarthy M, Kim TH (2021) A hybrid machine learning approach to identify coronary diseases using feature selection mechanism on heart disease dataset. Distributed and Parallel Databases, 1–20. https://doi.org/10.1007/s10619-021-07329-y
Thangamani M, Vijayalakshmi R, Ganthimathi M, Ranjitha M, Malarkodi P, Nallusamu S (2020) Efficient classification of heart disease using K-Means clustering Algorithm. Int J Eng Trends Technol 68(12):48–53
Casaña-Eslava RV, Lisboa PJ, Ortega-Martorell S, Jarman IH, Martín-Guerrero JD (2020) Probabilistic quantum clustering. Knowl-Based Syst 194:105567. https://doi.org/10.1016/j.knosys.2020.105567
Singh P, Bose SS (2021) A quantum-clustering optimization method for COVID-19 CT scan image segmentation. Expert Syst Appl 185:115637
Arthur D, Date P (2021) Balanced k-means clustering on an adiabatic quantum computer. Quantum Inf Process 20:1–30. https://doi.org/10.1007/s11128-021-03240-8
Khan MA, Algarni F (2020) A healthcare monitoring system for the diagnosis of heart disease in the IoMT cloud environment using MSSO-ANFIS. IEEE Access 8:122259–122269
Nagarajan R, Thirunavukarasu R (2022) A neuro-fuzzy based healthcare framework for disease analysis and prediction. Multimedia Tools Appl 1(8):11737–11753. https://doi.org/10.1007/s11042-022-12369-2
Muthu B, Sivaparthipan CB, Manogaran G, Sundarasekar R, Kadry S, Shanthini A, Dasel A (2020) IOT based wearable sensor for diseases prediction and symptom analysis in healthcare sector. Peer-to-peer Netw Appl 13:2123–2134
Harimoorthy K, Thangavelu M (2021) Multi-disease prediction model using improved SVM-radial bias technique in healthcare monitoring system. J Ambient Intell Humaniz Comput 12:3715–3723
Singh P, Kaur A, Batth RS, Kaur S, Gianini G (2021) Multi-disease big data analysis using beetle swarm optimization and an adaptive neuro-fuzzy inference system. Neural Comput Appl 33(16):10403–10414
Kavitha SS, Kaulgud N (2022) Quantum K-means clustering method for detecting heart disease using quantum circuit approach. Soft Computing, 1–14.
Enireddy V, Anitha R, Vallinayagam S, Maridurai T, Sathish T, Balakrishnan E (2021) Prediction of human diseases using optimized clustering techniques. Mater Today 46:4258–4264. https://doi.org/10.1016/j.matpr.2021.03.068
Bhavekar GS, Goswami AD (2022) A hybrid model for heart disease prediction using recurrent neural network and long short term memory. Int J Inf Technol 14(4):1781–1789
Mishra I, Mohapatra S (2023) An enhanced approach for analyzing the performance of heart stroke prediction with machine learning techniques. Int J Inform Technol pp 1–14. https://doi.org/10.1007/s41870-023-01321-8.
Mohapatra D, Bhoi SK, Mallick C, Jena KK, Mishra S (2022) Distribution preserving train-test split directed ensemble classifier for heart disease prediction. Int J Inf Technol 14(4):1763–1769
Sharwardy SN, Sarwar H, Rahman MZ (2023) The impact of Markov model to predict the status of children with congenital heart disease at post-operative ICU. Int J Inform Technol pp 1–8.
Sahu A, GM H, Gourisaria MK, Rautaray SS, Pandey M (2021) Cardiovascular risk assessment using data mining inferencing and feature engineering techniques. Int J Inform Technol 13:2011–2023. https://doi.org/10.1007/s41870-021-00650-w
Masih N, Naz H, Ahuja S (2021) Multilayer perceptron based deep neural network for early detection of coronary heart disease. Heal Technol 11:127–138
Ali F, El-Sappagh S, Islam SR, Kwak D, Ali A, Imran M, Kwak KS (2020) A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Information Fusion 63:208–222. https://doi.org/10.1016/j.inffus.2020.06.008
Kini KR, Bapat M, Madakyaru M (2021) Kantorovich distance based fault detection scheme for non-linear processes. IEEE Access 10:1051–1067
Wang F, Xuan Z, Zhen Z, Li K, Wang T, Shi M (2020) A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework. Energy Convers Manage 212:112766. https://doi.org/10.1016/j.enconman.2020.112766
Wang J, Cao J, Yuan S (2020) Shear wave velocity prediction based on adaptive particle swarm optimization optimized recurrent neural network. J Petrol Sci Eng 194:107466
Ramadan A, Kamel S, Hussein MM, Hassan MH (2021) A new application of chaos game optimization algorithm for parameters extraction of three diode photovoltaic model. IEEE Access 9:51582–51594. https://doi.org/10.1109/ACCESS.2021.3069939
Google. (n.d.). Google. Retrieved January 31, 2023, from https://datasetsearch.research.google.com/search?src=3&query=heart+disease+dataset&docid=L2cvMTFqXzEwZ2poag%3D%3D
Mienye ID, Sun Y, Wang Z (2020) An improved ensemble learning approach for the prediction of heart disease risk. Informatics in Medicine Unlocked 20:100402. https://doi.org/10.1016/j.imu.2020.100402
Acknowledgements
This work is cited as IU/R&D/2023-MCN0002184 in the Integral University manuscript database.
Funding
Not applicable.
Contributions
All authors agreed on the content of the study. AA and MM collected all the data for analysis. AA agreed on the methodology. AA and MM completed the analysis based on agreed steps. Results and conclusions were discussed and written together. The authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alam, A., Muqeem, M. An optimal heart disease prediction using chaos game optimization-based recurrent neural model. Int. j. inf. tecnol. 16, 3359–3366 (2024). https://doi.org/10.1007/s41870-023-01597-w
DOI: https://doi.org/10.1007/s41870-023-01597-w