
1 Introduction

One of the major causes of death in the world is cardiovascular disease [1]. According to the WHO, an estimated 17.9 million people die each year worldwide from cardiovascular diseases, and 85% of these deaths are due to heart attack and stroke. Both Indian men and women have a high mortality rate due to heart disease. In its coronary form, cardiovascular disease (CVD) manifests as atherosclerosis with more than 50% diametric narrowing of a major coronary vessel [2]. The pattern of involvement can be localized or diffuse. Regardless of the degree of severity, endothelial dysfunction is frequently present, and even when the angiographic findings are localized, additional plaques are often present; on the angiogram, these appear as wall irregularities.

In principle, the diagnostic procedure is the same as for nondiabetic patients. Stress ECG, stress echocardiography, and SPECT myocardial scintigraphy are available as noninvasive diagnostic procedures [3]. Electron beam computed tomography (EBCT), magnetic resonance imaging (MRI), and multislice computed tomography (MSCT) are promising newer imaging procedures that are not yet part of the official guideline recommendations for the routine diagnosis of coronary heart disease. EBCT or MSCT can be used to detect coronary calcium, which is a sensitive marker of the early stages of cardiovascular disease. Although MSCT has high diagnostic potential for risk assessment [4], it cannot yet reliably determine the severity of an underlying coronary obstruction. Conversely, a negative calcium score does not exclude cardiovascular disease, in particular unstable (noncalcified) plaques.

Cardiovascular disease (CVD) is a group of disorders of the heart and blood vessels [5], including:

  • Cerebrovascular disease: disease of the blood vessels supplying the brain.

  • Coronary heart disease: disease of the blood vessels supplying the heart muscle.

  • Congenital heart disease: malformations of the heart that are present from birth.

  • Rheumatic heart disease: damage to the heart valves and heart muscle from rheumatic fever, which is caused by streptococcal bacteria.

  • Peripheral arteriopathies: diseases of the blood vessels supplying the upper and lower limbs.

  • Deep vein thrombosis and pulmonary embolism: blood clots (thrombi) form in the leg veins and can dislodge (emboli) and lodge in the vessels of the heart and lungs.

  • Angina pectoris: a clinical syndrome caused by an insufficient supply of blood to the heart muscle. Symptoms include pain in the throat, back pain, nausea, chest pain, a choking sensation, salivation, sweating, and oppressive (pressure-like) pain. These symptoms appear suddenly and intensify with exertion.

  • Cardiac infarction: myocardial infarction is a serious coronary disease involving necrosis of the heart muscle. A coronary stenosis develops first; when the vessel is then suddenly blocked, severe pain in the heart results. Symptoms include tightness in the centre of the chest, pain radiating from the precordium across the chest, increasingly frequent chest pain, constant pain in the upper abdomen, shortness of breath, sweating, a feeling of impending doom, fainting, nausea, and vomiting.

  • Arrhythmia: a heart rhythm disorder in which the electrical activity that coordinates the heartbeat does not work properly, so that the heart beats too fast, too slow, or irregularly. Arrhythmia may develop as a consequence of scarring of the heart muscle after a heart attack, diseases of the heart valves or coronary arteries, and abnormal thickening of the ventricular walls.

1.1 Risk Factors Causing Heart Disease

The most widely known cardiovascular risk factors are tobacco, blood cholesterol, diabetes, high blood pressure, obesity, lack of regular physical exercise (sedentary lifestyle), a family history of cardiovascular disease, and stress. In women, there are specific factors such as polycystic ovary syndrome, oral contraceptives, and endogenous oestrogen levels. The higher the level of each risk factor, the greater the risk of cardiovascular disease [6, 7].

Tobacco: Smoking increases the heart rate, alters the rhythm of the heartbeat, and constricts the main arteries, forcing the heart to work much harder. Smoking also raises blood pressure, which increases the risk of stroke. As a result, a smoker has twice the risk of myocardial infarction of a nonsmoker.

Cholesterol in the blood: When cholesterol levels rise, usually because of a diet high in saturated fat, too much cholesterol is transported to the tissues. These cholesterol molecules can then be deposited in the arteries, especially those around the heart and brain, and can clog them, leading to serious heart problems.

Diabetes: Whether due to insufficient insulin production or resistance to its action, glucose accumulates in the blood. This accelerates the arteriosclerotic process and increases the risk of cardiovascular disease: angina, acute myocardial infarction, and sudden death due to cardiac arrest.

Blood pressure: This is the pressure with which blood circulates through the blood vessels, either when it leaves the heart (systolic blood pressure, commonly known as the high pressure) or when the heart fills with returning blood (diastolic blood pressure, commonly known as the low pressure). Hypertension usually indicates the presence of cardiovascular risk.

Obesity: Being overweight or obese greatly increases the risk of cardiovascular disease. Overweight and obesity are currently considered as important as the other classic risk factors for coronary disease. Adipose tissue not only stores fat molecules but also synthesizes and releases into the blood numerous hormones involved in the metabolism of macronutrients and the regulation of food intake.

Stress: Mental stress induces endothelial dysfunction, increases blood viscosity through haemoconcentration, stimulates platelet aggregation, promotes arrhythmogenesis, and stimulates factors involved in inflammation. Cardiovascular hyper-responsiveness of blood pressure to mental stress tests performed in the laboratory has been associated with the risk of developing high blood pressure.

Age: With age, heart function tends to deteriorate. The heart walls may thicken and the arteries may lose flexibility; when this happens, the heart cannot pump blood to the body's muscles as efficiently. Older people therefore have a higher risk of heart disease: about 4 out of 5 deaths due to heart disease occur in people over 65 years of age.

Sex: Men are generally at higher risk of heart attack than women, but this difference narrows once women enter menopause. According to researchers, the female hormone oestrogen helps protect women from heart disease. After 65 years of age, the cardiovascular risk is the same for men and women.

This paper presents a risk factor analysis of heart disease. The goal is to analyse the causes of heart attacks in individual patients, which would help doctors treat each patient with a targeted approach and thereby increase their chances of survival.

2 Literature

Several researchers have attempted to solve the problem of heart disease classification using machine learning algorithms. Linear regression was used to analyse heart disease in [8, 9], and K-means clustering in [10]. K-nearest neighbours has been used for heart disease prediction in [11, 12]. Feature extraction-based techniques have been presented in [13,14,15]. Support vector machines (SVMs) have been used in [11, 16], decision tree-based classification in [8, 11, 17, 18], and deep learning techniques in [18,19,20,21,22].

Monther Tarawneh et al. [23] proposed a new heart disease prediction system that integrates all techniques into a single algorithm, an approach known as hybridization. The combined model achieves accurate diagnosis results.

Illias Tougui et al. [24] compared six data mining tools based on ML techniques for predicting heart disease. The artificial neural network, run in the MATLAB simulation tool, proved to be the best method.

Ching-seh Wu et al. [25] studied different classifiers and the effect of data processing methods through experimental analysis. Naïve Bayes and logistic regression achieved higher accuracy than the random forest and decision tree algorithms.

Mohammad Shafenoor Amin et al. [26] researched the identification of significant features and data mining methods for improving the prediction accuracy of cardiovascular disease. An accuracy of 87.4% in heart disease prediction was achieved with the best-performing data mining technique, known as vote.

Nowbar et al. [27] discussed how ischaemic heart disease (IHD) has remained the leading cause of death in countries of all income groups. Globalization has contributed to the rise of cardiovascular risk factors in developing countries.

Bechthold et al. [28] showed that an optimal intake of vegetables, fruits, and nuts is associated with a markedly lower risk of CHD, heart failure (HF), and stroke. The results support these foods as key components in deriving food-based dietary guidelines for preventing CVD.

Kai Jin et al. [29] described how telehealth interventions have the potential to improve cardiovascular risk factors and could narrow the current evidence-practice gap in attendance at centre-based cardiac rehabilitation.

Khandaker et al. [30] provided evidence that the comorbidity between CHD and depression arises from shared environmental factors. CHD risk factors such as CRP, IL-6, and triglycerides are causally linked with depression.

Pencina et al. [31] demonstrated that, in cardiovascular risk models, the relative effects of risk factors and their population attributable fractions (PAFs) decrease with age, whereas the potential absolute risk reductions from treatment increase with age.

Latha and Jeeva [32] investigated ensemble classification methods to improve the prediction accuracy for heart disease. Ensemble techniques such as boosting and bagging gave satisfactory results in detecting heart disease risk factors.

Haoxiang et al. [33] presented a data security solution based on optimal geometric transformation. Chen et al. (2021) [34] presented a method for the early detection of coronary artery disease (CAD) with higher accuracy. Table 1 summarizes selected studies from the literature.

Table 1 Literature analysis

3 Proposed Method

The notion of data mining varies across the literature. Its definition ranges from the extraction of patterns to the overall process of extracting knowledge from data. Here, data mining is the process of extracting knowledge from data. Its goal is to find patterns that link the data together; these patterns must be interesting, that is, new, useful, and nontrivial. A commonly used image is that of a mountain in which nuggets are buried: data mining consists of extracting them despite the vastness of the mountain.

The algorithmic approach aims to extract association rules. From a transaction database, it first looks for frequent patterns that satisfy threshold conditions. Depending on the amount of data and the desired goals, many algorithms have been studied, some of which are presented in this document. From these patterns, association rules are extracted, allowing links to be made between the items that compose them. The rules are characterized by measures of interest, each highlighting certain aspects.
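The two-phase process described above (find frequent patterns above a support threshold, then split them into rules that meet a confidence threshold) can be sketched in Python. This is a minimal Apriori-style illustration on toy transactions, not the authors' implementation; the function names, thresholds, and items are assumptions chosen for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    # Level-1 candidates: every item that occurs in some transaction.
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent, k = {}, 1
    while candidates:
        # Keep the size-k candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            s = sum(1 for t in sets if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets and prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(sub) in level for sub in combinations(u, k - 1)):
                candidates.add(u)
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset I into X => Y with X ∪ Y = I and X ∩ Y = ∅."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for body in combinations(itemset, r):
                x = frozenset(body)
                conf = sup / frequent[x]  # subsets of frequent itemsets are frequent
                if conf >= min_confidence:
                    rules.append((x, itemset - x, sup, conf))
    return rules


# Toy transactions (hypothetical): each is the set of abnormal parameters of one patient.
tx = [{"cp", "chol", "thalach"}, {"cp", "thalach"}, {"chol", "thalach"},
      {"cp", "chol", "thalach"}, {"fbs"}]
for x, y, s, c in association_rules(frequent_itemsets(tx, 0.4), 0.9):
    print(sorted(x), "=>", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```

With these toy data, only rules whose head is thalach pass the 0.9 confidence threshold; the thresholds are the main tuning knobs and would have to be chosen for the real dataset.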

The proposed association rule mining approach is applied to the dataset parameters to analyse each patient's risk. The risk factor analysis helps identify the cause of a patient's heart attack, and the approach can later be used to predict which factors may cause heart attacks in individual patients. The proposed framework is based on the rule measures support, confidence, and lift.

Figure 1 shows the proposed system architecture. The input parameters are read from the database. Each parameter is considered in turn, and the abnormal values in the list are selected. Support and confidence values are then calculated for the remaining abnormal parameters, and the risk assessment is performed.

Fig. 1 Proposed system

Algorithm

Step 1: Read the input data.

Step 2: Calculate the confidence value of each individual parameter in the dataset with respect to the cases that resulted in a heart attack.

Step 3: Calculate the support value of each individual parameter in the dataset with respect to the cases that resulted in a heart attack.

Step 4: Calculate the overall risk factor for each individual patient by combining the confidence and support metrics.
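As a rough illustration of Steps 1 to 4, the sketch below assumes that each record has already been reduced to 0/1 abnormality flags plus a hypothetical outcome flag `target` (1 = heart attack), and evaluates the rule "parameter abnormal ⇒ heart attack" for each parameter. The text does not say how support and confidence are combined in Step 4, so `patient_risk` averages support × confidence over the patient's abnormal parameters purely as a placeholder; all names and data here are hypothetical.

```python
def parameter_rules(records, target="target"):
    """Steps 2-3: support and confidence of 'parameter abnormal => heart attack'.

    Each record is a dict of 0/1 abnormality flags plus a 0/1 outcome flag
    (hypothetical encoding; target = 1 means a heart attack occurred).
    """
    n = len(records)
    params = [k for k in records[0] if k != target]
    rules = {}
    for p in params:
        abnormal = [r for r in records if r[p] == 1]
        both = sum(1 for r in abnormal if r[target] == 1)
        support = both / n
        confidence = both / len(abnormal) if abnormal else 0.0
        rules[p] = (support, confidence)
    return rules


def patient_risk(patient, rules):
    """Step 4 (placeholder combination, not from the text): mean of
    support * confidence over the parameters abnormal for this patient."""
    scores = [rules[p][0] * rules[p][1]
              for p, flag in patient.items() if p in rules and flag == 1]
    return sum(scores) / len(scores) if scores else 0.0


# Step 1 stand-in: toy records instead of the real database.
records = [
    {"chol": 1, "thalach": 1, "fbs": 0, "target": 1},
    {"chol": 1, "thalach": 0, "fbs": 0, "target": 1},
    {"chol": 0, "thalach": 1, "fbs": 1, "target": 0},
    {"chol": 1, "thalach": 1, "fbs": 0, "target": 0},
]
rules = parameter_rules(records)
print(rules["chol"])  # (0.5, 0.6666666666666666)
print(patient_risk({"chol": 1, "thalach": 1, "fbs": 0}, rules))
```

Step 1 (reading the data) is replaced here by an in-memory list; in practice the records would be loaded from the database and converted to flags first, and the Step 4 combination would follow whatever rule the method actually prescribes.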

Initially, two communities approached data mining differently. On one side were the supporters of information visualization, whose aim was to give the user a general view of the data while still allowing a detailed view. On the other side, the defenders of the algorithmic approach argued that statistical and learning methods were sufficient to find interesting patterns. Today, although these two philosophies still exist, a third approach has emerged that searches for patterns by combining the visual and algorithmic approaches. There are four recommendations for the development of future knowledge discovery systems:

Mining can be carried out automatically, manually, or by a combination of these two approaches.

3.1 Association Rule

An association rule derived from an itemset I is defined by the relationship in Eq. (1):

$$ X \Rightarrow Y $$
(1)

where X ∪ Y = I and X ∩ Y = ∅.

This can be read as: “If X is present in a transaction, then Y is also present.” Note that X and Y can each consist of several attributes, but an attribute cannot appear in both parts of the rule at once. The left part of the rule is called the premise, antecedent, or body; the right part is called the conclusion, consequent, or head. Support is the probability that both the body and the head of the rule occur in a transaction, measured over all transactions; in other words, it describes how frequently the rule occurs in the database. The terms support and coverage are used here for exactly the same quantity. A generally applicable definition of support is given in Eq. (2).

$$ \sup \left( {A \Rightarrow B} \right) = \frac{{\left| {\left\{ {t \in D{|}\left( {A \cup B} \right) \subseteq t} \right\}} \right|}}{\left| D \right|} $$
(2)

where A is the body of the rule, B is the head of the rule, t is a transaction, and D is the transaction database. Since support is a probability, it takes values between 0 and 1.

For example, suppose the rule shoes → socks has a support of 2%: shoes and socks occur together in 20,000 transactions, and measured against all 1,000,000 transactions, that is exactly 2%. Mathematically, for this example:

$$ {\text{sup }}\left( {{\text{shoes}} \to {\text{socks}}} \right) = {\text{p}}\;({\text{shoes}}\, \cup \,{\text{socks}}) $$

In the dataset, the support value identifies the combination of parameters that can cause heart attack. For each individual parameter, the remaining parameters which are abnormal are identified, and the support value is calculated.
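Eq. (2) translates directly into code. The sketch below is a minimal Python version; it rebuilds the shoes → socks example at a 1:1000 scale (1,000 transactions, 20 containing both items), an assumption made to keep the example small while preserving the 2% ratio. In the heart disease setting, each transaction would instead be the set of parameters that are abnormal for one patient.

```python
def support(body, head, transactions):
    """Eq. (2): share of transactions t in D with (A ∪ B) ⊆ t."""
    items = set(body) | set(head)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


# Scaled-down version of the shoes -> socks example (assumed counts).
D = ([{"shoes", "socks"}] * 20   # both items
     + [{"shoes"}] * 180         # shoes only
     + [{"bread"}] * 800)        # neither item
print(support(["shoes"], ["socks"], D))  # 0.02
```

Because support counts transactions containing the union A ∪ B, it is symmetric: `support(["socks"], ["shoes"], D)` gives the same value.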

The confidence of a rule is the proportion of the transactions containing the rule body that also contain the rule head.

The confidence c of the association rule A⇒ B is defined by Eq. (3):

$$ c\left( {A \Rightarrow B} \right) = \frac{{s\left( {A \cup B} \right)}}{s\left( A \right)} $$
(3)

In our example, the confidence is 10%: the 20,000 transactions in which both shoes and socks occur divided by the 200,000 transactions in which shoes appear. Confidence is the conditional probability of B given A for the rule A → B. Formally, it is defined as in Eq. (4):

$$ {\text{conf}}\left( {A \Rightarrow B} \right) = \frac{{\left| {\left\{ {t \in D|\left( {A \cup B} \right) \subseteq t} \right\}} \right|}}{{\left| {\left\{ {t \in D|A \subseteq t} \right\}} \right|}} = \frac{{\sup \left( {A \Rightarrow B} \right)}}{\sup \left( A \right)} $$

Or

$$ {\text{conf}}\left( {A \Rightarrow B} \right) = p\left( {B|A} \right) $$
(4)
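The first form of Eq. (4) counts transactions directly. The minimal sketch below assumes scaled-down shoes/socks data (1,000 transactions: 20 with both items, 180 with shoes only, 50 with socks only, 750 with neither), chosen so that conf(shoes → socks) = 20/200 = 10% as in the text. It also shows that, unlike support, confidence is not symmetric.

```python
def confidence(body, head, transactions):
    """Eq. (4): |{t : (A ∪ B) ⊆ t}| / |{t : A ⊆ t}|, i.e. p(B | A)."""
    a = set(body)
    ab = a | set(head)
    with_body = [set(t) for t in transactions if a <= set(t)]
    if not with_body:
        return 0.0  # body never occurs: confidence is undefined; 0.0 is a convention here
    return sum(1 for t in with_body if ab <= t) / len(with_body)


# Assumed 1:1000 scale of the text's example.
D = ([{"shoes", "socks"}] * 20
     + [{"shoes"}] * 180
     + [{"socks"}] * 50
     + [{"bread"}] * 750)
print(confidence(["shoes"], ["socks"], D))  # 0.1
print(confidence(["socks"], ["shoes"], D))  # 20/70, about 0.286
```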

As the basic measure characterizing an association rule, confidence can be equated with the conditional probability P(B | A), that is, the probability of B given A. Confidence can therefore be written as in Eq. (5):

$$ c\left( {A \Rightarrow B} \right) = \frac{{P\left( {AB} \right)}}{P\left( A \right)} $$
(5)

Confidence is intended to measure the validity of an association rule. In this study, the confidence value is the number of cases in which the individual parameters are jointly abnormal divided by the total number of cases in which the antecedent parameter is abnormal.

4 Experimental Results

For the experimental analysis, the UCI Heart Disease repository was used (https://archive.ics.uci.edu/ml/datasets/heart+disease). The dataset has 13 parameters, namely age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, and thal. These characteristics cover demographic data, behaviour, medical history, and clinical examinations, and are described as follows:

  • Age in years

  • Cp—type of chest pain

  • Ca—number of major vessels (0–3) coloured based on fluoroscopy

  • Exang—exercise-induced angina (1 = yes; 0 = no)

  • Thalach—achieved maximum heart rate, > 100 abnormal

  • Fbs—fasting blood sugar (> 120 mg/dl) (1 = true; 0 = false)

  • Chol—cholesterol, > 200 abnormal

  • Trestbps—resting blood pressure (in mm Hg) > 120 abnormal

  • Sex—1 indicates male and 0 indicates female

  • Thal—type of defect
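The thresholds listed above can be turned into a function that maps one patient record to its set of abnormal parameters, which is the "transaction" used for rule mining. This is a minimal sketch under stated assumptions: only the five parameters whose abnormal criterion is given in the list are encoded, and the dictionary-based record format and sample values are hypothetical.

```python
# Abnormality criteria stated in the parameter list above. Other attributes
# (cp, ca, thal, restecg, ...) would need criteria that the list does not give.
ABNORMAL = {
    "thalach": lambda v: v > 100,   # maximum heart rate achieved
    "chol": lambda v: v > 200,      # cholesterol
    "trestbps": lambda v: v > 120,  # resting blood pressure, mm Hg
    "fbs": lambda v: v == 1,        # fasting blood sugar > 120 mg/dl
    "exang": lambda v: v == 1,      # exercise-induced angina
}


def abnormal_items(record):
    """Turn one patient record (dict) into the set of abnormal parameters."""
    return {name for name, is_abnormal in ABNORMAL.items()
            if name in record and is_abnormal(record[name])}


# Hypothetical patient record, not taken from the UCI data.
patient = {"age": 58, "trestbps": 130, "chol": 190, "fbs": 0, "thalach": 160, "exang": 1}
print(sorted(abnormal_items(patient)))
```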

Each attribute falls into a specific category; attributes whose values can take any value within a range are called continuous. Table 2 shows the abnormal association values between cp and exang, thalach, restecg, fbs, chol, trestbps, and thal. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 2 Abnormal association for cp

Table 2 shows that cp has a very high association with chol and thalach. Thal also has a good association with cp.
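A row of a table such as Table 2 can be computed from the abnormal-parameter transactions. The sketch below rests on assumptions: the text reports lift without defining it, so the common definition lift(A ⇒ B) = conf(A ⇒ B) / sup(B) is used, and the toy transactions are invented for the example rather than taken from the UCI data.

```python
def association_row(antecedent, consequent, transactions):
    """Support, confidence and lift of 'antecedent abnormal => consequent abnormal'.

    Lift uses the common definition conf(A => B) / sup(B); this is an
    assumption, since the text reports lift without defining it.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent in t)
    n_b = sum(1 for t in transactions if consequent in t)
    n_ab = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


def association_table(antecedent, others, transactions):
    """One (support, confidence, lift) row per other parameter, as in Tables 2-10."""
    return {o: association_row(antecedent, o, transactions) for o in others}


# Toy abnormal-parameter transactions (invented for illustration).
tx = [{"cp", "chol", "thalach"}, {"cp", "thalach"}, {"chol"}, {"cp", "chol", "thalach"}]
for param, row in association_table("cp", ["chol", "thalach"], tx).items():
    print(param, [round(v, 3) for v in row])
```

A lift above 1 means the two parameters are abnormal together more often than would be expected if they were independent; a lift below 1 means less often.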

Table 3 shows the abnormal association values between trestbps and cp, ca, restecg, chol, thal, exang, thalach, and fbs. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 3 Abnormal association for trestbps

Table 3 shows that trestbps has high confidence with thalach. The remaining parameters did not contribute to the occurrence of heart attack in combination with this parameter.

Table 4 shows the abnormal association values between chol and cp, ca, fbs, thal, exang, restecg, trestbps, and thalach. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 4 Abnormal association for chol

Table 5 shows the abnormal association value between fbs and cp, thal, ca, exang, restecg, thalach, chol, and trestbps. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 5 Abnormal association for fbs

Table 6 shows the abnormal association values between restecg and cp, thal, exang, thalach, chol, trestbps, and ca. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 6 Abnormal association for restecg

Table 7 shows the abnormal association value between thalach and cp, thal, chol, exang, trestbps, and ca. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 7 Abnormal association for thalach

Table 8 shows the abnormal association value between exang and cp, fbs, thalach, trestbps, thal, chol, and restecg. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 8 Abnormal association for exang

Table 9 shows the abnormal association value between Ca and cp, thal, thalach, exang, restecg, chol, fbs, and trestbps. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 9 Abnormal association for ca

Table 10 shows the abnormal association value between thal and cp, restecg, fbs, ca, exang, thalach, and chol. In the table, column 1 shows Support, column 2 shows Confidence, and column 3 shows the Lift.

Table 10 Abnormal association for thal

Based on Tables 2, 3, 4, 5, 6, 7, 8, 9, and 10, the risk of heart attack due to individual parameters and combinations of parameters can be analysed. Thalach is the parameter with the highest association with other parameters in the abnormal range, followed by cp and chol in second and third place, respectively.
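The text does not state how the per-table results are aggregated into an overall ranking of parameters. One hypothetical way, sketched below, is to count for each parameter how many pairwise rules involving it reach a confidence threshold. The toy transactions and the 0.5 threshold are illustrative assumptions constructed for the example and do not reproduce the paper's tables.

```python
from itertools import permutations


def rank_by_association(transactions, min_confidence=0.5):
    """Rank parameters by the number of rules p => q (in either role) whose
    confidence reaches min_confidence. Hypothetical aggregation."""
    items = sorted({i for t in transactions for i in t})
    counts = {i: 0 for i in items}
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        if n_a and n_ab / n_a >= min_confidence:
            counts[a] += 1
            counts[b] += 1
    # Highest count first; ties broken alphabetically.
    return sorted(items, key=lambda i: (-counts[i], i))


toy = [{"cp", "thalach"}, {"chol", "thalach"}, {"cp", "thalach"}, {"fbs"}]
print(rank_by_association(toy))  # ['thalach', 'cp', 'chol', 'fbs']
```

Other aggregations (for example, summing lift values) could give a different order, so the ranking criterion should be stated explicitly when reporting such results.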

5 Conclusion

Association rules can be applied to extract relationships among data attributes. Here, the parameters responsible for heart attack are studied in detail. Risk factor analysis on the attributes can be performed using support and confidence values. Support is the probability that both the body and the head of a rule occur in a transaction, measured over all transactions. The confidence of a rule is the proportion of the transactions containing the rule body that also contain the rule head. The experimental results show that thalach is the parameter with the highest association with other parameters in the abnormal range, followed by cp and chol. Future work would include targeted treatment for patients based on their individual risk factor analysis reports, which would make treatment more effective and increase their chances of survival.