RETRACTED ARTICLE: Enhanced decision support system to predict and prevent hypertension using computational intelligence techniques

Ambika, M.; Raghuraman, G.; SaiRamesh, L.

doi:10.1007/s00500-020-04743-9

RETRACTED ARTICLE: Enhanced decision support system to predict and prevent hypertension using computational intelligence techniques

Methodologies and Application
Published: 14 February 2020

Volume 24, pages 13293–13304, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Soft Computing Aims and scope Submit manuscript

RETRACTED ARTICLE: Enhanced decision support system to predict and prevent hypertension using computational intelligence techniques

Download PDF

M. Ambika¹,
G. Raghuraman¹ &
L. SaiRamesh²

774 Accesses
11 Citations
Explore all metrics

This article was retracted on 19 August 2024

This article has been updated

Abstract

Medical decision support systems have been a core of intense research for years. The ongoing study shows that artificial intelligence has been accustomed to probe risk factors for hypertension. Factors, like health-damaging personal behaviors and changes in lifestyle and environment, are major contributors to chronic diseases. The goal of this research was to forecast the risk of developing hypertension by revealing hidden patterns in medical datasets. Quality of the data is the key to enhance the performance of learning model. But most healthcare data suffer from class imbalance problem, which induce the need for an intelligent model which can learn from such grimy data. This paper incorporates a novel approach by combing learning model and rule-based mining to offer decision support. Typically, the proposed work comprises two main implications. First suggests an intelligent learning model using boosting-based support vector machine to diagnose and expose multi-class categories in the imbalanced datasets. Finally, the enhanced predictive model is built upon the classification solution which will portray the innate data similarities. An intelligent fuzzy-based approach was employed to recognize frequent behavioral patterns. Based on these rules, valid decisions could be made to prevent hypertension. The suggested enhanced model is evaluated using a real-time hypertension dataset obtained through primary health centers. With the combination of ensemble strategies, the proposed intelligent learning model attains high classification accuracy for the imbalanced dataset above the traditional model. Thus, the efficient integration of personalized behavior with health data could provide a better understanding regarding patient health. In future this can serve as an eye toward personalized medicine.

A Hybrid Feature Selection Method to Classification and Its Application in Hypertension Diagnosis

Hypertension Type Classification Using Hierarchical Ensemble of One-Class Classifiers for Imbalanced Data

Ensemble method based predictive model for analyzing disease datasets: a predictive analysis approach

Article 05 February 2019

Discover the latest articles, news and stories from top researchers in related subjects.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

As medicine forward, there comprises a need for sophisticated decision support systems to induce real-time predictions. Recent advancement in artificial intelligence (AI) exhibits an effective pertaining in the area of healthcare (Moreira et al. 2019; Pereboom et al. 2019; Krittanawong et al. 2018). It can formalize very intricate decisions and predictive analysis which leads to personalized medicine. Machine learning, a prominent style of AI, solves complicated problems by identifying patterns to make appropriate forecasts (Krittanawong et al. 2017; Bzdok et al. 2018). Such models offer support for decision making in several areas of health care like prediction, diagnosis and hospital management (Horsky et al. 2017).

Hypertension is a major chronic disorder caused due to raised level of blood pressure. It serves the leading risk factor for the global burdens of death and disability (National Institute of Medical Statistics, Indian Council of Medical Research (ICMR) 2009). Typically, the growing pattern of prevalence of hypertension surged from 118.2 million in 2000–213.5 million by 2025 and remains one of the world’s greatest public health problems (Ministry of Home Affairs 2010). Uncontrolled hypertension can lead the peril of health issues like coronary heart disease, renal failure and stroke (Benhar et al. 2019). Health-damaging personal behaviors accrue over time can contribute to the high burden of this chronic disease (Poulter et al. 2015). Hence, prevention and avoidance serve as a vital public health and economics issues in the twenty-first century.

1.1 Motivation

Major clinical guidelines and landmark observation studies like the Framingham Hypertension Study (FHS) discern the common factor or characteristics that contribute to the burden of the disorder (Parikh et al. 2008). These chronic ailments are prone to enhance due to changes in the demographic, lifestyle and the environment, and health-damaging behaviors of the persons (Poulter et al. 2015). Probably the most related investigate in hypertension has recently been focusing on the prediction by analyzing clinical trials. The factors accountable for badly controlled hypertension not merely include clinical trials but need to integrate genetics, behavioral and environmental factors. This model can improve detection and prevention of hypertension.

The raw clinical facts obtained through the laboratory and clinical process experience poor quality due to instrument or human error. The most medical dataset possesses imbalance class due to frequent occurrence of normal cases and rare occurrence of abnormal cases (Haixiang et al. 2017; He and Garcia 2009). To distribute the class properly, techniques like oversampling or undersampling could be applied. Still these methodologies involve some downsides like overfitting of minority classes in case of oversampling and discarding of useful information in case of undersampling. These effects have a great impact on the medical dataset. Hence, there is a need for an alternative approach, which can classify the imbalanced dataset. Furthermore, the problem of hypertensive type diagnosis must classify different classes.

Correspondingly, these consequences impose a need for intelligent learning model which can learn from imbalanced multi-class data to make accurate real-time prediction. The prime objective of the article would be to promote decision making through the diagnosis and predict hypertension to evoke the quality of treatment. This research proposes an enhanced model to predict the likelihood of acquiring hypertension by revealing the hidden patterns in the medical dataset. In this regard, this paper puts forward a unique approach to reinforce the decision support system by combining supervised learning and association rule mining. As the information quality influences the learning model’s performance, appropriate combination of pre-processing strategies has been applied (Benhar et al. 2019; Alexandropoulos et al. 2019).

This article proposes an intelligent learning model using ensemble methodologies to handle the imbalanced dataset. This approach constructs several classifiers and aggregates them to form a strong classifier in order to improve the performance. Most study evidences that support vector machine (SVM) renders high level of accuracy when equated to other techniques (Chen and Xu 2015). The suggested intelligent learning model uses boosting-based SVM to reveal various stages of hypertension. The outcome of this diagnosis model is employed to build prediction model. The premise of fuzzy sets has been greeted as an appropriate tool to model various forms of patterns (Delgado et al. 2003). This paper projected a combined approach of fuzzy-based association rule mining to discover frequent behavioral styles within the medical data. Based on the rules, needed decision can be made to have impact on the BP control. Thus, promote efficient integration of behavioral data with health data to offer better understanding. The enhance model proposed shows eminent performance compared to the traditional methods.

The rest of the paper is organized as follows. Section 2 briefly discusses the background of the study and projected the research gap. Section 3 elaborates the flow of the proposed work and its phases. Section 4 presents the implementation details and evaluation metrics, and Sect. 5 discusses about the evaluation results. Finally, Sect. 6 illustrates the conclusion and gave suggestion for the future direction.

2 Background of the study

2.1 Hypertension and its risk factors

India has been facing a swift transition in health care with rising prevalence of Non-Communicable Diseases (NCDs) for the past 15 years. As indicated by the World Health Organization, NCDs account for 63% of the overall demise rate (Lubet 2014). According to the statistical report for the reasons of fatality in India (2010–2013), non-communicable diseases continue to upsurge in proportion as 49.2% through the year 2010–13, 45.4% in 2004–06 and 42.4% in 2001–03 (Ministry of Health and Family Welfare Government of India 2017; India State-level Disease Burden Initiative Collaborators 2017). Figure 1 portraits the comparative analysis of the death rate in India. Such disorders not merely have a significant impact on individual health, but additionally affect the economical growth (Prakash Upadhyay 2012). Thus, India endures to lose 4.58 trillion dollars before 2030 due to NCDs. High blood pressure is notable among them.

Hypertension a raised level of blood pressure serves a significant risk for cardiovascular disease (CVD), has considerably increased in last two decades. Health-damaging personal behavior is the prime driving force for these chronic diseases. Figure 2 shows various compositions of the chronic disease. The genetic predisposition and social circumstances where people reside and work also play a crucial role. Table 1 shows the leading risk factors for the major NCDs. Risk factors comprise behavior like tobacco utilization, smoking, alcohol consumption, unhealthy diet, obesity, lack of physical activities, stress and environmental factors (National Institute of Medical Statistics, Indian Council of Medical Research (ICMR) 2009). These determinants are usually uncertain and controllable based upon the lifestyle modification (WHO 2004; Non communicable Diseases Progress Monitor 2017; WG3 2017). The basic inspiration of this research is to study individuals with hypertension to improve the adherence. Thus, promote efficient integration of behavioral data with health data to offer better understanding. This article acquaints a way to predict and diagnose hypertension by employing computational intelligence techniques to strengthen the standard of treatment.

Table 1 Risk factors common to major NCDs

Full size table

2.2 Artificial intelligence in hypertension

The component of computer science which is adequately pertained in medical science is artificial intelligence. Machine learning a style of AI presents assorted strategies for resolving complicated problems by identifying interaction patterns (Bzdok et al. 2018). Medical decision support technologies have manifested the ability to improvise patient care across diverse healthcare settings. Numerous studies suggested that clinical decision support interventions are effective in assisting physician and health professionals, with respect to process outcomes such as guideline adherence (National Institute of Medical Statistics, Indian Council of Medical Research (ICMR) 2009). This section exhibits some prevailing works of artificial intelligence associated with hypertension.

Moreira et al. (Horsky et al. 2017) offer a comparative analysis of the most meaningful solutions for intelligent systems in health care. This investigation is imperative for understanding the diverse approaches used in recent studies to support and legitimize the best innovation for the advancement of further research on the subject. Krawezyk and Wozniak (2015) examined a hierarchical one-class ensemble classifier to automatically diagnose the presence of hypertension and hence proposed a medical decision support system for highly imbalanced multi-class classification. Goli et al. (2019) elaborated the state of the art in the applications of data mining in traditional medicine. The study reveals that machine learning techniques like Bayesian networks, support vector machines and artificial neural networks were frequently used in the field of traditional medicine for syndrome differentiation.

LaFreniere et al. (2016) proposed a prediction model for hypertension using neural network by considering factors like medical records and demographic information. This model lacks the usage of behavioural information. George et al. (2019) proposed a case study on the application of artificial intelligence to cardiovascular disease diagnosis and prognosis. Further, this article elaborates the technical and methodological issues related to the implementation of the various techniques. Yan et al. (2019) proposed a multi-period hybrid decision support model for medical diagnosis by combing similarity measurement and three-way decision theory under fuzzy environment. But this approach is not applicable for many cases, as unable to deal with incomplete data and attribute selection.

Chatterjee and Das (2019) projected a methodology for the detection and classification of brain tumor using integrated type-II fuzzy logic and ANFIS. Using an ensemble approach, a novel classifying technique has been developed to classify the detected tumor incorporating the extracted features. Guzman et al. (2017) proposed a neuro-fuzzy hybrid model (NFHM) to categorize blood pressure using fuzzy system. The rules provided are optimized by a genetic algorithm to get the most ideal number of principles for the classifier with the least classification error. Vladimir et al. (2017) studied the effectiveness of various machine learning techniques for investing arterial hypertension by means of the short-term HRV and combining statistical, spectral and nonlinear features.

This model lacks from the usage of behavioral information. Moreira et al. (2019) applied a novel adaptive classification system for the diagnosis of chronic diseases. The proposed model employs a combined approach of PCA and relief method with optimized support vector machine classifier. Krittanawong et al. (2018) developed a ranking method based on shape similarity using fuzzy number to group decision-making problems. Das et al. (2013) build a fuzzy expert system to analyze the threats of hypertension by integrating various factors like set of symptoms and rules. Srivastava et al. (2013) illustrate various soft computing approaches for classification of hypertension. Guzman et al. (2015) designed fuzzy system for diagnosis of hypertension using systolic and diastolic blood pressure as input parameters. Using a set of decision rules, different suggestions are provided for diagnosis.

2.3 Research gap and objectives

Advancement in artificial intelligence shows an immense impact in the field of medical science. A wide range of research studies is underway in medical diagnosis and prediction using machine learning. But only very limited works are available in the analysis of hypertension and its disorders due to the nature and data quality. Awareness, prevention and early diagnosis are essential to lessen the pervasiveness of hypertension. These chronic disorders are prone to enhance due to transitions in the health-damaging behaviors of the individuals. Hence, behavioral and lifestyle improvements may provide a way for prevention. As prevention is the simplest way to evade the incidence of hypertension, induce the need to analyze the personal behavioral patterns from the data. These patterns may be used for possible future forecast.

Most of the review works related to hypertension try to forecast by considering clinical examinations alone and neglects to capture behavioral factors. Furthermore, hypertensive type diagnosis is the multi-class classification problem; hence, it imposes a need to consider factors for multiple categories. The clinical data suffer from poor quality owed to the improper distribution of classes. Most machine learning model works well while using balanced dataset. But in the event with imbalanced distribution, the majority classifier degrades in overall performance. Though sampling techniques can conquer this dispersion, still it can lead to loss of valuable information. These issues impose a necessity for a model to investigate and analyze several parameters, to predict the occurrence and to make earlier decision. Thus, promoting efficient integration of behavioral data with health data offers better understanding for prediction.

3 Proposed methodology

The prime objective of this article is to promote decision making through the diagnosis of hypertension to evoke the quality of treatment. This research proposes an enhanced model to forecast the probability of acquiring hypertension by enlightening the hidden patterns in the medical dataset. Awareness and preventive decision are the substantial way to reduce the event of hypertension. Most studies portrait that lifestyle modification is the most effective aspect in case of preventive measures. Hence, this proposed model will explore on various parameters to perceive the individual at the peril of developing various stages of hypertension. Moreover, an intelligent and accurate diagnostic system is mandatory to foresee the future.

3.1 The proposed enhanced decision support model

The proposed work fuses a consolidated methodology of learning model and rule-based mining. The raw medical information required for analysis is gathered from health centers through various investigations. Such information is converted into the digital version for easy processing. Generally, the raw medical data will be grimy and consequently need preprocessing to wipe out missing values and outliers. The proposed learning model employs these processed data for analysis. This article proposes an intelligent learning model using boosting -based SVM to uncover diverse stages of hypertension. The outcome of this diagnosis model is employed to build prediction model. For every category of the disease, the pattern of behavioral factors is revealed using fuzzy-based association rule mining. Based on the rules, efficient decision can be taken for prevention. Figure 3 demonstrates the proposed enhanced model. The main processes of the model include four phrases: data collection phase, data preparing phase, learning model phase and data exploration phase.

3.2 Data collection phase

The raw dataset was obtained from the primary health centers under the guidance of the specialist and skilled experts. The information is collected through various forms like discussion done with patients, clinical reports and doctor analysis. The dataset embraces the following details:

Patients’ demographic information which includes personal details
Personal behavioral information including eating habits and addiction behaviors
Patients’ brief medical history including their past history and family history of disorders
Anthropometric measurement including height, weight and related details
Blood pressure measurement including systolic and diastolic blood pressure readings
Laboratory investigation including various laboratory test reports
Doctor analysis about the presence of hypertension and its type

The clinical records are converted into Electronic Health Record (EHR) also known as Electronic Medical Record (EMR) which is an electronic version of the patient medical record. It provides a remarkable opportunity to pertain informatics techniques in healthcare (Hayrinen et al. 2008).

3.3 Data preparation phase

Data quality and data preparation are key for the high performance of machine learning. Initially, the normalization procedure is imposed to the data. Generally, the raw clinical facts obtained through various laboratory procedures are of low quality. This might be a result of instrument error while estimating or manual error while enrolling (Ting et al. 2009; Almuhaideb and Menai 2016). Such grimy data degrade the performance of the entire learning model. Hence, the data should be improvised in quality before commencing a learning model to extricate valuable information. Appropriate preprocessing techniques are applied to eradicate the missing values and outliers. Normally, missing values occur due to incomplete records in the dataset.

The proposed work wipes out the missing values by imposing the mean substitution method (Grzymala-Busse and Hu 2001; Kotsiantis et al. 2006). Based on the available cases, the attributes with missing values are substituted by calculating mean and mode values for the numerical and nominal categories, respectively. The occurrence of errors or outliers in the data is reckoned as noise. Outliers are the data values which are extremely deviated from other values in the dataset. Such noisy data should be addressed to enhance the performance of the model. This work employs one of the statistical methods such as interquartile range (IQR) technique for observing the incidence of outliers (Jassim 2013). The interquartile range is calculated by finding the deviation of the first quartile and the third quartile (Gamberger et al. 2000). This range depicts data distribution about the median. Thus, any data values which lie beyond 1.5 IQR below the first quartile and 1.5 IQR above the third quartile are considered as outliers and are eradicated.

3.4 Design and development of learning phase

The problem of hypertensive type diagnosis is the crucial stage as it entails different classes. According to the literature reviews, most related works deal with the binary-class classification problem. The condition is either the patient has the disease or not. Consequently, there is a need for a model which can learn from such data using multi-class learning techniques, to make accurate real-time predictions (Tomar and Agarwal 2015).

Moreover, most healthcare problems suffer due to imbalanced dataset. This is due to frequent occurrence of normal cases and rare occurrence of abnormal cases (Haixiang et al. 2017; He and Garcia 2009). Resampling approaches can be applied to the data still there is a likelihood of overfitting and underfitting the minor and major cases, respectively. Further, the number of attributes and their dependency rely an impact on the accuracy of the learning models. This can be dealt by preferring appropriate features from the basic feature set.

These issues impose a need for intelligent learning model which can learn from imbalanced multi-class data to make accurate real-time prediction. The fundamental motivation of this paper is to assemble a learning model by adopting supervised learning techniques. There has been extensive investigation on the diligence of supervised machine learning algorithms for medical data (Tomar and Agarwal 2015). Most research evidences that support vector machine (SVM) furnishes a high level of accuracy when equated to other techniques (Chen and Xu 2015).

3.4.1 Support vector machine (SVM)

A set of training instances are chosen as support vectors to determine the decision boundary hyperplane of the classifier. SVM uses a kernel to project training data from an input space to a higher-dimensional feature space and finds an optimal separating hyperplane within the feature space and uses a regularization parameter, C, to control its model complexity and training error. But still it is inferred through the distribution of outliers due to imbalance class and increases the order of growth. This situation demands an intelligent classifier that overcomes imbalance distribution and at the same time with reduced order of growth.

3.4.2 Proposed intelligent learning model

This paper put ahead of an intelligent learning model by improving SVM with boosting mechanism. AdaBoost (Adaptive Boosting) an iterative algorithm is the most popular boosting mechanism (Jain and Singh 2019). The incredible accomplishment of AdaBoost can be ascribed to its ability to maximize the margin on a training set, which results in enhancing the performance of the classifier. This model initially generates decision boundary hyperplane which separates the feature vectors by solving linear equation using SVM. For this initial classifier, classification error is calculated. Weights are adjusted to the misclassified features, to minimize the error rate.

The boosting mechanism is applied to create a sequentially weighted set of weak classifiers in order to generate new classifiers (linear discriminator) which are more operational on the training data. Hence, the AdaBoost algorithm multiple classifiers iteratively endorse the classification accuracies of many different data sets compared to the given best individual classifier. Finally cascading operation ranks the efficient discriminator on top ladder so that the feature vector gets classified correctly. The algorithm for proposed intelligent learning model consists of two phases. Table 2 elaborates the stepwise process.

Table 2 Algorithm for the proposed intelligent learning model

Full size table

Training Phase—train the classifier with training dataset by constructing strong linear classifier.
Testing Phase—assign class of the testing dataset by using classification error.

3.5 Design and development of prediction phase

The enhanced decision support system proposed in this article builds a prediction model over the diagnosis phase to predict the menace of accruing hypertension. Health-damaging personal behaviors can contribute to the high burden of this chronic disease. These determinants are modifiable and controllable. Instead of early detection and diagnosis, awareness and prevention confer a way to reduce the inference of hypertension. This phase explores various personal behavior parameters and reveals the hidden patterns to identify the individuals at the threat of raising various levels of hypertension. Based upon these patterns, efficient decision can be taken for prevention.

3.5.1 Association rule mining

Association rule mining (ARM) is a rule-based machine learning technique to discern interesting relations among factors with regard to datasets. The intention of ARM is to determine frequent item sets, and the interestingness of each finding is evaluated and used to reduce the set of rules (Huang et al. 2019). Interestingness measures are estimated using two main factors support count and confidence level. Based on these, decision can be made to determine which findings are the unique, beneficial and well-supported by the data. Support count is the sign of how often the items appear within the dataset. Confidence level denotes the number of occasions the if–then statements are observed true in the dataset. Commonly association rule mining can handle the binary dataset. However, for real-world application with the diverse group of dataset induces the need for a sophisticated algorithm. This proposed work use fuzzy-based ARM to cope with the medical dataset.

3.5.2 Fuzzy-based association rule mining

The enhanced decision support system proposed in this article builds a prediction model using fuzzy-based association rule mining. It is form of computational intelligence technique to discover the hidden patterns of the clinical data that are accrued continuously via health examination and medical treatment. Fuzzy logic is deliberate to solve problems in the similar manner that human thinks (Alayon et al. 2007; Paul et al. 2017). Recent studies show that fuzzy systems are progressively applied in several applications in medicine (LaFreniere et al. 2016). A fuzzy system is a rule-based system where fuzzy logic is imposed for representing varied styles of knowledge about a problem, and also for modeling the relationship and interactions that exist amid the variables (Delgado et al. 2003). A Fuzzy association rule is an implication of the form,

$$ \varvec{if} \left( {\varvec{A},\varvec{X}} \right) \varvec{then} \left( {\varvec{B},\varvec{Y}} \right){\mathbf{Where}}, {\mathbf{A}} {\mathbf{and}} {\mathbf{B}} \to {\mathbf{disjoints}} {\mathbf{item}} {\mathbf{sets}} {\mathbf{and}} {\mathbf{X}} {\mathbf{and}} {\mathbf{Y}} \to {\mathbf{fuzzy}} {\mathbf{sets}}. $$

The support-confidence framework can also be applied to fuzzy-based association rule mining through fuzzy support (significance) values. Fuzzy Support count (FS) is typically calculated as in Eq. (1).

$$ {\text{FS}} \left( A \right) = \frac{{ \mathop \sum \nolimits_{i = 1}^{n } \mathop \prod \nolimits_{{\forall \left[ {i\left[ l \right]} \right] \in A}} t^{{\prime }} \left[ {i\left[ l \right]} \right]}}{n} $$

(1)

where $ {\text{A}} = \left\{ {a_{1} ,a_{2} ,a_{3} , \ldots ,a_{\left| A \right|} } \right\} \to $ set of property attribute-fuzzy set. Record $ t_{i}^{'} $ satisfies A if $A \subseteq t_{i}^{'}$. The individual vote per record is found by multiplying the membership degree with an attribute-fuzzy set pair $ \left[ {i\left[ l \right]} \right] \in A $

$$ {\text{Vote for }}t_{i}\,{\text{satisfying }}A = \mathop \prod \limits_{{\forall \left[ {i\left[ l \right]} \right] \in A}} t^{\prime}\left[ {i\left[ l \right]} \right] $$

(2)

Fuzzy Confidence (FC) is calculated in the same manner that confidence is calculated in classical ARM (Eq. 3).

$$ {\text{FC}}\left( {A \to B} \right) = \frac{{{\text{FS}} \left( {A \cup B} \right)}}{{{\text{FS}} \left( A \right)}} $$

(3)

The algorithm for fuzzy-based association rule comprises of four major steps (Table 3):

Table 3 Algorithm for fuzzy-based Association Rule Mining

Full size table

1.
Convert the medical transactional data set $ \left( T \right) $ into a property data set $ \left( {T^{P} } \right) $.
2.
Convert the property data set $ \left( {T^{p} } \right) $ into a fuzzy data set $ \left( {T^{f} } \right). $
3.
Apply an a priori style fuzzy association rule mining algorithm to $ \left( {T^{f} } \right) $ using fuzzy support, confidence and correlation measures to generate a set of frequent item sets F.
4.
Process F and generate a set of fuzzy association rules R such that $ \forall r \in R $ the certainty factor (either confidence or correlation as desired by the end user) is above some user specified threshold.

4 Implementation

This section elaborates the implementation of the proposed work. The performance of the proposed enhanced model for prediction and prevention is evaluated using real-time medical data. The data samples used for training and testing the proposed system are gathered from the Primary Health Centers with the support from Tamil Nadu Health System Project—Non Communicable Disease wing. The medical dataset comprises 599 samples with 21 attributes, inclusion of the class attribute. The samples are categorized into four classes, namely 238—normal hypertension (Normal), 321—pre-hypertension (preHT), 20—newly diagnosed hypertension (newHT), and 20—known hypertension (KHT). The imbalance ratio (I_R) for the dataset is attained by solving Eq. (4). Based on this ratio value, the proportions of all classes can be determined.

$$ \varvec{I}_{\varvec{R}} = \frac{{\varvec{N}_{\varvec{c}} - 1}}{{\varvec{N}_{\varvec{C}} }}\mathop \sum \limits_{{\varvec{i} = 1}}^{{\varvec{N}_{\varvec{c}} }} \frac{{\varvec{I}_{\varvec{i}} }}{{\varvec{I}_{\varvec{n}} - \varvec{I}_{\varvec{i}} }}\varvec{ } $$

(4)

$ \begin{aligned} \varvec{where\,N}_{\varvec{c}} \to \varvec{No\,of\,classes},\varvec{ I}_{\varvec{i}} \to \varvec{No\,of\,instances\,of\,i}^{{\varvec{th}}} \varvec{ class},\varvec{ I}_{\varvec{n}} \to \varvec{total\,no\,of\,instances}. \hfill \\ \varvec{ I}_{\varvec{R}} \to \varvec{range\,from }\left( {1 \le \varvec{I}_{\varvec{R}} < \infty } \right).\varvec{ If I}_{\varvec{R}} = 1 \to \varvec{completely\,balanced\,dataset} \hfill \\ \end{aligned} $

4.1 Classifier evaluation metrics

The learning model overall performance is generally drawn in step with the confusion matrix related to the each classifier. Then the confusion matrix for one among the categories may have the structure as with Table 4.

Table 4 Confusion matrix for binary classification problems

Full size table

The column in the table specifies the actual class dispersion in the dataset, whereas the row indicates the predicted classes. The TP and TN entries in the matrix furnish the quantity of positive cases correctly identified as positive and the quantity of negative instances classified as negative, respectively. Alternatively, FP and FN imply the number of instances misclassified as negative and positive instances, respectively (Sokolova and Lapalme 2009; Hossin and Sulaiman 2015). Various evaluation metrics like accuracy, precision, recall and F-measure are computed based on the entries in the matrix. Equations (5), (6), (7), and (8) represent the different performance metrics.

$$ {\text{Precision}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}} $$

(5)

$$ {\text{Recall}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} $$

(6)

$$ F - {\text{Measure}} = \frac{{2 \times {\text{precision}} \times {\text{recall}}}}{{{\text{precision}} + {\text{recall}}}} $$

(7)

$$ {\text{Accuracy}} = \frac{{{\text{TN}} + {\text{TP}}}}{{{\text{TN}} + {\text{TP}} + {\text{FN}} + {\text{FP}} }} $$

(8)

4.2 Experimental setup

This segment portrays the illustration of the suggested enhanced model. The dirty data are pre-processed by eliminating missing values and noise. The pre-processed output is given to the learning model. The weights are assigned to the feature vectors, and it is normalized. Initially, classifier is created applying SVM classifier. Based on the classification error, the weights are adjusted to construct new classifier. This approach goes up to four iterations. Ultimately all the classifiers are merged to form a strong classifier. In this process the weights are updated intelligently in order to lessen the classification error. Using cascading operation the best classifier is placed on the top of the ladder so the testing data get classified correctly.

Eventually, the prediction model is build over the learning module. The goal of this analysis is to reveal the concealed patterns by considering the behavioral attributes. The proposed strategy will generate rules based on the frequent item sets. When compared to standard ARM, fuzzy-based association rule mining produces reduced set of rules. These rules more precisely reflect the true patterns in the data set. In this phase health-damaging behavioral attributes are taken into consideration. Table 5 demonstrates the attributes type and its name. The generated rules are accustomed to offer decision support for prevention. The computing device used for experimentation is Lenovo idea pad with 8 GB RAM. Python is used as a programming platform. The system is developed using Spyder Integrated Development Environment. Experimental analysis is also done using Google Colab.

Table 5 Health-damaging behavioral attributes for prediction of hypertension

Full size table

5 Result and discussion

This section portrays the outcome associated with the enhanced decision support model. The effectuation of the suggested intelligent learning model using boosting-based SVM classifier is extensively studied and compared with other learning techniques. The performance of the most learning algorithms relies on the class distribution. This proposed model can classify multi-class imbalanced dataset implicitly, avoidance explicit data sampling process. Overall 599 samples are used for experimentation and out of which 420 instances are taken for training and 179 are used for testing, as per 70% and 30% split.

The performance comparisons of the diverse classifier are furnished in Table 6. Figures 4 and 5 show that compared to other algorithms SVM shows better performance. Then the performance of the recommended model is comparatively studied with traditional SVM classifier for both imbalance and balanced data and are depicted in Table 7. As per Fig. 6, it is interesting to note an enhancement in overall performance is recorded for proposed intelligent learning model.

Table 6 Comparitive performance analysis of learning models with balanced data and imbalanced dataset

Full size table

Table 7 Comparitive analysis of performance of proposed model with SVM classifier

Full size table

The next set of experiment is to investigate the performance of the exploration phase. The class-wise classified dataset is given to the predictive model. This traditional dataset is converted into the fuzzy dataset. A priori style association rule mining is applied to derive the frequent item sets. Based on the intelligent membership function, set of fuzzy rules are generated. By varying the fuzzy membership function, different performance can be achieved as shown in Fig. 7. This proposed model achieves overall accuracy of 86%. Some example fuzzy association rules generated have the form:

IF smoking is Yes AND diet is Nonveg AND physical-activity is Yes THEN hypertension is Normal

IF smoking is Yes AND physical-activity is NO THEN hypertension is preHT

These rules would be useful in analyzing individual behavior patterns and to make need lifestyle modification. Based upon the rules needed, decision can be taken to prevent the incidence of hypertension.

6 Conclusion and future work

The key objective of this study is to promote an enhanced decision support system particularly to handle hypertension. As the nature of data impact, the learning model’s performance, this paper puts forward a combined approach of ensemble classifiers. The proposed intelligent learning model generates a strong classifier which can learn from the improper class dispersion and own high computational efficiency. The comparative analysis report proves an improved accuracy and a better inferring of the learned concept. The experiment outcome demonstrates the proposed strategy affords improved performance in medical diagnosis. The enhanced predictive version was built upon the classification solution using fuzzy systems. This model generates rules by uncovering frequent pattern. The predictive model reveals far better predictive accuracy. Based on these rules, legitimate decisions could be made to prevent hypertension. Moreover, this research considers diverse peril factors like personal behavioral information and past medical history for prediction. Thus, precise decision could be made at the proper time as prevention plays a vital role to reduce the incidence of hypertension. As this system deals with medical data, still it needs enhancement in handling classification errors. As a future work, this learning model could be streamlined by focusing misclassification patterns employing optimization techniques. Further, this model can be extended as personalized decision support system and serve as an eye toward personalized medicine.

Change history

19 August 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s00500-024-10067-9

References

Alayon S, Robertson R, Warfield SK, Ruiz-Alzola J (2007) A fuzzy system for helping medical diagnosis of malformations of cortical development. J Biomed Inform 40(3):221–235. https://doi.org/10.1016/j.jbi.2006.11.002
Article Google Scholar
Alexandropoulos SN, Kotsiantis SB, Vrahatis MN (2019) Data preprocessing in predictive data mining. Knowl Eng Rev 34:1–33. https://doi.org/10.1017/s026988891800036x
Article Google Scholar
Almuhaideb S, Menai MEB (2016) Impact of preprocessing on medical data classification. Front Comput Sci 10:1082–1102. https://doi.org/10.1007/s11704-016-5203-5
Article Google Scholar
Benhar H, Idri A, Fernandez-Aleman JL (2019) A systematic mapping study of data preparation in heart disease knowledge discovery. J Med Syst. https://doi.org/10.1007/s10916-018-1134-z
Article Google Scholar
Bloom DE, Cafiero-Fonseca ET, Candeias V, Adashi E, Bloom L, Gurfein L, Jané-Llopis E, Lubet, A., Mitgang E, Carroll O’Brien J, Saxena A (2014) Economics of non-communicable diseases in india: the costs and returns on investment of interventions to promote healthy living and prevent, treat, and manage NCDs. In: World economic forum, Harvard School of Public Health. http://www3.weforum.org/docs/WEF_EconomicNonCommunicableDiseasesIndia_Report_2014.pdf
Bzdok D, Altman N, Krzywinski M (2018) Point of significance: statistics versus machine learning. Nat Methods 15:233–234
Article Google Scholar
Chatterjee S, Das A (2019) A novel systematic approach to diagnose brain tumor using integrated type-II fuzzy logic and ANFIS (adaptive neuro-fuzzy inference system) model. Soft Comput. https://doi.org/10.1007/s00500-019-04635-7
Article Google Scholar
Chen S, Xu J (2015) Least squares twin support vector machine for multi-class classification. Int J Database Theory Appl 8(5):65–76. https://doi.org/10.14257/ijdta.2015.8.5.06
Article Google Scholar
Das S, Ghosh PK, Kar S (2013) Hypertension diagnosis: a comparative study using fuzzy expert system and neuro fuzzy system. In: 2013 IEEE International conference on fuzzy systems (FUZZ), pp 1–7
Delgado M, Marín N, Sánchez D, Vila MA (2003) Fuzzy association rules: general model and applications. IEEE Trans Fuzzy Syst 11:2
Article Google Scholar
Gamberger D, Lavrac N, Dzeroski S (2000) Noise detection and elimination in data preprocessing: experiments in medical domains. Appl Artif Intell Int J 14(2):205–223. https://doi.org/10.1080/088395100117124
Article Google Scholar
Georga EI et al (2019) Artificial intelligence and data mining methods for cardiovascular risk prediction. In: Golemati S, Nikita K (eds) Cardiovascular computing—methodologies and clinical applications. Series in bioengineering. Springer, Singapore
Google Scholar
Goli A, Safdari R, Rezaeizadeh H, Abbassian A, Mokhtaran M, Hossein Ayati M (2019) A systematic literature review and classification of knowledge discovery in traditional medicine. Comput Methods Prog Biomed 168:39–57. https://doi.org/10.1016/j.cmpb.2018.10.017
Article Google Scholar
Grzymala-Busse JW, Hu M (2001) A comparison of several approaches to missing attribute values in data mining. Lect Notes Artif Intell 2005:378–385. https://doi.org/10.1007/3-540-45554-x_46
Article Google Scholar
Guzman JC, Melin P, Prado-Arechiga G (2017) Design of an optimized fuzzy classifier for the diagnosis of blood pressure with a new computational method for expert rule optimization. Algorithms 10:79. https://doi.org/10.3390/a10030079
Article Google Scholar
Guzmán JC, Melin P, Prado-Arechiga G (2015) Design of a fuzzy system for diagnosis of hypertension. In: Melin P, Castillo O, Kacprzyk J (eds) Design of intelligent systems based on fuzzy logic, neural networks and nature-inspired optimization studies in computational intelligence, vol 601. Springer, Cham
Google Scholar
Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239. https://doi.org/10.1016/j.eswa.2016.12.035
Article Google Scholar
Hayrinen K, Saranto K, Nykanen P (2008) Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 77:291–304
Article Google Scholar
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21:1263–1284. https://doi.org/10.1109/TKDE.2008.239
Article Google Scholar
Horsky J, Aarts J, Verheul L, Seger DL, Van Der Sijs H, Bates DW (2017) Clinical reasoning in the context of active decision support during medication prescribing. Int J Med Inform 97:1–11. https://doi.org/10.1016/j.ijmedinf.2016.09.004
Article Google Scholar
Hossin M, Sulaiman MN (2015) A review on evaluation metrics for data classification evaluations. Int J Data Min Knowl Manage Process. https://doi.org/10.5121/ijdkp.2015.5201
Article Google Scholar
Huang M, Sung H, Hsieh T, Wu M, Chung S (2019) Applying data-mining techniques for discovering association rules. Soft Comput. https://doi.org/10.1007/s00500-019-04163-4
Article Google Scholar
India State-level Disease Burden Initiative Collaborators (2017) Nations within a nation: variations in epidemiological transition across the states of India from 1990 to 2016 in the Global Burden of Disease Study. Lancet
Jain D, Singh V (2019) A two-phase hybrid approach using feature selection and adaptive SVM for chronic disease classification. Int J Comput Appl. https://doi.org/10.1080/1206212x.2019.1577534
Article Google Scholar
Jassim FA (2013) Image denoising using interquartile range filter with local averaging. Int J Soft Comput Eng 2(6):424–428
Google Scholar
Kotsiantis SB, Kanellopoulos D, Pintelas PE (2006) Data preprocessing for supervised leaning. Int J Comput Sci 1(2):111–117
Google Scholar
Krawezyk B, Wozniak M (2015) Hypertension type classification using hierarchical ensemble of one-class classifiers for imbalanced data. Adv Intell Syst Comput 311:341–349. https://doi.org/10.1007/978-3-319-09879-1_34
Article Google Scholar
Krittanawong C, Zhang H, Wang Z, Aydar M, Kitai T (2017) Artificial intelligence in precision cardiovascular medicine. J Am Coll Cardiol 69:2657–2664
Article Google Scholar
Krittanawong C, Andrew S, Baber U, Bangalore S, Franz H (2018) Future direction for using artificial intelligence to predict and manage hypertension. Curr Hypertens Rep 20:75
Article Google Scholar
Kublanov VS, Dolganov AY, Belo D, Gamboa H (2017) Comparison of machine learning methods for the arterial hypertension diagnostics. Appl Bionics Biomech 13, Article ID 5985479. https://doi.org/10.1155/2017/5985479
LaFreniere D, Zulkernine F, Barber D, Martin K (2016) Using machine learning to predict hypertension from a clinical dataset. IEEE Symp Ser Comput Intell 1:7. https://doi.org/10.1109/ssci.2016.7849886
Article Google Scholar
Ministry of Health and Family Welfare Government of India (2017) National multisectoral action plan for prevention and control of common non-communicable diseases (2017–2022). Supported by WHO
Ministry of Home Affairs (2010) Report on causes of deaths in India 2001–2003, Office of the Registrar General of India, Govt. of India. http://www.cghr.org/wordpress/wp-content/uploads/Causes_of_death_2001-03.pdf
Moreira MWL, Rodrigues JJPC, Korotaev V, Al-Muhtadi J, Kumar N (2019) A comprehensive review on smart decision support systems for health care. IEEE Syst J. https://doi.org/10.1109/JSYST.2018.289012
Article Google Scholar
National Institute of Medical Statistics, Indian Council of Medical Research (ICMR) (2009) IDSP Non-communicable disease risk factors survey, phase-I states of India, 2007–08. National Institute of Medical Statistics and Division of Non-Communicable Diseases, Indian Council of Medical Research, New Delhi, India, 2009. http://www.icmr.nic.in/final/IDSP-NCD%20Reports/Phase-1%20States%20of%20India.pdf
Non communicable Diseases Progress Monitor (2017) Geneva: World Health Organization. Licence: CC BY-NC-SA 3.0 IGO
Parikh NI, Pencina MJ, Wang TJ, Benjamin EJ, Lanier KJ, Levy D, D’Agostino RB, Kannel WB, Vasan RS (2008) A risk score for predicting near-term incidence of hypertension: the framingham heart Study. Ann Internal Med 148(2):102–110
Article Google Scholar
Paul AK, Shill PC, Rabin MRI, Murase K (2017) Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease. Appl Intell 1:1. https://doi.org/10.1007/s10489-017-1037-6
Article Google Scholar
Pereboom M, Mulder IJ, Verweij SL, van der Hoeven RTM, Becker ML (2019) A clinical decision support system to improve adequate dosing of gentamicin and vancomycin. Int J Med Inform. https://doi.org/10.1016/j.ijmedinf.2019.01.002
Article Google Scholar
Poulter NR, Prabhakaran D, Caulfield M (2015) Hypertension. Lancet (London) 386:801–812
Article Google Scholar
Prakash Upadhyay R (2012) An overview of the burden of non- communicable diseases in India. Iranian J Public Health 41(3):1–8
Google Scholar
Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manage 45:427–437. https://doi.org/10.1016/j.ipm.2009.03.002
Article Google Scholar
Srivastava P, Srivastava A, Burande A, Khandelwal A (2013) A note on hypertension classification scheme and soft computing decision making system. ISRN Biomath 2013:11
Article Google Scholar
Ting SL, Shum CC, Kwok SK, Tsang AHC, Lee WB (2009) Data mining in biomedicine: current applications and further directions for research. J Softw Eng Appl 2:150–159. https://doi.org/10.4236/jsea.2009.23022
Article Google Scholar
Tomar D, Agarwal S (2015) A comparison on multi-class classification methods based on least squares twin support vector machine. Knowl-Based Syst 81:131–147. https://doi.org/10.1016/j.knosys.2015.02.009
Article Google Scholar
WG3 (2017) (2): Non communicable diseases, Report of the Working Group on Disease Burden for 12th Five Year Plan (2012–2017). Ministry of Health & Family Welfare, Government of India, May 2011
WHO (2004) Mortality and burden of diseases estimates for who member states In 2004
Yan Y, Junhua H, Yongmei L, Xiaohong C (2019) A multiperiod hybrid decision support model for medical diagnosis and treatment based on similarities and three-way decision theory. Expert Syst. https://doi.org/10.1111/exsy.12377
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, SSN College of Engineering, Kalavakkam, Chennai, Tamil Nadu, 603110, India
M. Ambika & G. Raghuraman
Department of Information Science and Technology, CEG Campus, Anna University, Chennai, Tamil Nadu, 600025, India
L. SaiRamesh

Authors

M. Ambika
View author publications
You can also search for this author in PubMed Google Scholar
G. Raghuraman
View author publications
You can also search for this author in PubMed Google Scholar
L. SaiRamesh
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to M. Ambika.

Ethics declarations

Conflict of interest

The authors declares that they have no conflict of interest.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s00500-024-10067-9"

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ambika, M., Raghuraman, G. & SaiRamesh, L. RETRACTED ARTICLE: Enhanced decision support system to predict and prevent hypertension using computational intelligence techniques. Soft Comput 24, 13293–13304 (2020). https://doi.org/10.1007/s00500-020-04743-9

Download citation

Published: 14 February 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s00500-020-04743-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

RETRACTED ARTICLE: Enhanced decision support system to predict and prevent hypertension using computational intelligence techniques

Abstract

Similar content being viewed by others

A Hybrid Feature Selection Method to Classification and Its Application in Hypertension Diagnosis

Hypertension Type Classification Using Hierarchical Ensemble of One-Class Classifiers for Imbalanced Data

Ensemble method based predictive model for analyzing disease datasets: a predictive analysis approach

Explore related subjects

1 Introduction

1.1 Motivation

2 Background of the study

2.1 Hypertension and its risk factors

2.2 Artificial intelligence in hypertension

2.3 Research gap and objectives

3 Proposed methodology

3.1 The proposed enhanced decision support model

3.2 Data collection phase

3.3 Data preparation phase

3.4 Design and development of learning phase

3.4.1 Support vector machine (SVM)

3.4.2 Proposed intelligent learning model

3.5 Design and development of prediction phase

3.5.1 Association rule mining

3.5.2 Fuzzy-based association rule mining

4 Implementation

4.1 Classifier evaluation metrics

4.2 Experimental setup

5 Result and discussion

6 Conclusion and future work

Change history

19 August 2024

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation