1 Introduction

The Weighted Bayesian Association Rule mining algorithm (WBAR), which extracts strong Bayesian association rules, was designed and implemented in earlier work with superior results [1, 2]. In this paper, predictive modeling concepts play a crucial role in developing a new algorithm for medical decision support systems that operate on very large collections of medical records [3]. Unfortunately, patients' records are not mined thoroughly enough to discover the hidden patterns needed for effective decision-making [4]. Advanced data mining approaches to analyzing medical records have shown significant results in research and contribute to more accurate, higher-performance medical decision support systems. Clinical and treatment decisions are sometimes made on the basis of a doctor's experience and knowledge rather than on the insight that can be extracted from a rich, substantial medical database [5]. In addition, because symptoms are often redundant and interrelated, physicians may fail to diagnose a disease accurately. Accurate diagnosis is particularly challenging at an early stage because of the interdependence among various features [6].

A fuzzy clinical decision support system (CDSS) based on a Bayesian belief network (BBN) is proposed. It can support medical staff or other experts who hold patient-specific information by intelligently extracting and representing hidden information when required [7]. However, uncertainty arises in every phase of building the decision support process. Sources of uncertainty include patients who cannot describe their symptoms accurately, errors in laboratory reports, and doctors or nurses who sometimes fail to interpret their findings precisely, all of which make a patient's prognosis harder to determine. Therefore, more advanced and accurate decision support systems should be implemented with machine learning techniques so that they can adapt to new environments and learn implicitly from instances. Various methodologies can be incorporated into a CDSS to predict, assess, and extract information, including statistical methods, data mining techniques, and soft computing techniques, and significant research is still needed in both academic and practical areas. Several difficulties nevertheless compromise the accuracy of CDSS in the medical field, such as the representation and interpretation of clinical attributes under uncertainty, which require refined methodologies and techniques. To handle this uncertainty, the current work proposes a new model, the Fuzzy Weighted Bayesian Belief Network (FWBBN) CDSS, with new formulas and algorithms. The main contributions of the proposed framework are as follows:

  • Usage of Fuzzy Logic to deal with sharp boundaries, vagueness, and imprecision in medical attributes [8].

  • Weight assignment method on medical dataset attributes [2].

  • Association rule mining to find the interdependence among attributes and to generate well-built rules.

  • A novel hybrid approach that incorporates fuzzy weighted association rules to build a Bayesian belief network.

The remainder of the paper is organized as follows. Sect. 2 briefly summarizes the related work in tabulated form. Sect. 3 presents the research methodology with new formulas and the Fuzzy Weighted Bayesian Association Rule (FWBAR) algorithm. Sect. 4 covers results and discussion, Sect. 5 presents the comparative study, and Sect. 6 concludes the work and outlines future scope.

2 Background work

Various soft computing and data mining techniques were surveyed, in particular fuzzy logic, weight assignment methods, association rule mining, and Bayesian belief networks. Table 1 summarizes the relevant findings from the literature on the use of these techniques for building predictive models in the clinical domain.

Table 1 Significant review findings of various soft computing methods used in the Clinical domain predictive models

From the exhaustive literature survey and its findings, a gap is identified concerning the dataset's attributes: attributes carry considerable importance, suffer from the sharp boundary problem, and are interdependent at some level of association. To determine the impact of attributes and their interdependencies, a novel idea is proposed in the following section.

3 Methodology

The proposed method is described by the algorithm shown in Fig. 1.

Fig. 1 Fuzzy WBAR Algorithm

This approach incorporates fuzzy theory into the WBAR mining algorithm [1]. The previous paper discussed the basic concepts of the Bayesian belief network, association rule mining, and types of weight assignment [1]. In this paper, a fuzzy approach is incorporated to enhance accuracy. The fuzzy model is a valuable technique for identifying imprecision in data patterns and understanding data semantics [30]. The study and experiments use a breast cancer dataset and other clinical datasets from the University of California Irvine (UCI) machine learning repository, obtained via the LUCS-KDD DN software [2, 31].

3.1 Fuzzy property of quantitative attribute

The Association Rule Mining (ARM) model plays a significant role in applications that involve quantitative data, such as temperature and pressure, which are very common [32]. Discretization is needed in ARM to convert quantitative data into the nominal domain; here an Apriori-type method is used for this purpose. An association rule P → Q thus expresses a relationship between nominal values of data items, for example "(FamilyHistory, yes), (Obesity, severe) → (Diabetics, yes)" [9]. The mined results are affected by the partitioned intervals, a problem known as the "sharp boundary" problem, particularly for data values near interval boundaries. Many quantitative parameters in the medical field suffer from the sharp boundary problem. Consider the attribute Smoking in a patient record where the smoking frequency per day is 11, with the following discretization rules: Smoking ∈ [1,2,3] → LungCancer = "Low", Smoking ∈ [2,3,4,5] → LungCancer = "Moderate", Smoking ∈ [4,5,6,7,8,9,10] → LungCancer = "High", Smoking ∈ [9-*] → LungCancer = "Severe". Under these sharp boundaries, the patient falls entirely in the severe zone, which does not give the correct result. With fuzzy logic, the patient instead belongs partially to the different fuzzy sets. The patient's membership values could then be, for example, µ(LungCancer, "low") = 0.01, µ(LungCancer, "moderate") = 0.02, µ(LungCancer, "high") = 0.3, µ(LungCancer, "severe") = 0.67. Because of the impact of the sharp boundary problem on quantitative attributes in the ARM model [4], a new idea, the Fuzzy Weighted ARM algorithm, is proposed. The framework is redefined in terms of Fuzzy Weighted Support (FWS) and Fuzzy Weighted Confidence to adapt it to a fuzzy environment. In this paper, the fuzzy membership value of each fuzzy set is calculated using the trapezoidal membership function shown in Eq. 1.

$$F(x; a, b, c, d) =\left\{\begin{array}{ll}0, & x\le a\\ \dfrac{x-a}{b-a}, & a\le x\le b\\ 1, & b\le x\le c\\ \dfrac{d-x}{d-c}, & c\le x\le d\\ 0, & x\ge d\end{array}\right.$$
(1)
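The trapezoidal membership function can be sketched in Python as follows. This is a minimal illustration of the standard trapezoidal form, with full membership between b and c, and is not the authors' implementation. The function names and the corner values in `SMOKING_SETS` are hypothetical and chosen only to show how one crisp value can belong partially to two neighbouring fuzzy sets.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with corners a <= b <= c <= d."""
    if not (a <= b <= c <= d):
        raise ValueError("corners must satisfy a <= b <= c <= d")
    if b <= x <= c:
        return 1.0                      # plateau; also covers shoulders where a == b or c == d
    if x <= a or x >= d:
        return 0.0                      # outside the support of the fuzzy set
    if x < b:
        return (x - a) / (b - a)        # rising edge
    return (d - x) / (d - c)            # falling edge


# Hypothetical fuzzy sets for a "cigarettes per day" attribute (assumed corner values).
SMOKING_SETS = {
    "low": (0, 0, 2, 4),
    "moderate": (2, 4, 5, 7),
    "high": (5, 7, 9, 11),
    "severe": (9, 11, 40, 40),
}


def fuzzify(x, fuzzy_sets):
    """Return the membership degree of x in every labelled fuzzy set."""
    return {label: trapezoidal_membership(x, *corners)
            for label, corners in fuzzy_sets.items()}


memberships = fuzzify(10, SMOKING_SETS)
print(memberships)
```

With these assumed corners, a value of 10 lies on the falling edge of "high" and the rising edge of "severe", so it is assigned to both sets partially instead of being forced into one interval.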

Table 2 shows the fuzzy values obtained for the attributes using the trapezoidal membership function; the resulting fuzzified dataset is named D1. Only a few attributes and five records are tabulated.

Table 2 Fuzzy values using the trapezoidal membership functions

These tabulated fuzzy values of attributes remove the sharp boundary problems present in the medical world. They can further be used to assign different weights using the automatic weight assignment method.

3.2 Weight assignment using maximum likelihood estimation method

After the fuzzification of attributes, the next step is to calculate automated weights for each fuzzified value. Here the weights are computed using the Maximum Likelihood Estimation (MLE) method [33]. MLE is a statistical method that estimates the parameters of a probability distribution from observed data. Applied to a data set, it finds the parameter value that maximizes the probability of the observed values under the model for the given training data. The likelihood function is defined in Eq. 2:

$$L(P \mid x_1, x_2, \dots, x_n)=\prod_{i=1}^{n} f(x_i \mid P),$$
(2)

where P is the initial probability of occurrence of a particular event,

L(P) is the likelihood value for the probability value P, and

x1, x2, …, xn are the n instances of a given sample.

The calculation starts by finding the prior probability of the class label value "yes" from the training data set. The likelihood is then evaluated at different probability values in the neighbourhood of this prior, varied by small offsets, to find the value at which the likelihood of the observed data is highest. The probability value that maximizes the likelihood is assigned as the weight of that particular attribute. All the weights calculated using the MLE technique are shown in Table 3.
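The search around the prior can be sketched as below. The paper does not state the density f, the offset size, or how observations are encoded, so this sketch assumes a Bernoulli likelihood over binary observations (1 when the class label is "yes") and a symmetric grid of candidates around the prior. The names `bernoulli_log_likelihood`, `mle_weight`, `step`, and `n_steps` are hypothetical.

```python
import math


def bernoulli_log_likelihood(p, observations):
    """Log of prod_i f(x_i | p) for binary observations, assuming a Bernoulli density."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)     # keep log() finite at the boundaries
    return sum(math.log(p) if x else math.log(1.0 - p) for x in observations)


def mle_weight(observations, prior, step=0.01, n_steps=20):
    """Pick the candidate probability near `prior` with the highest likelihood.

    Candidates are prior + k * step for k in [-n_steps, n_steps] (assumed scheme).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    candidates = [prior + k * step for k in range(-n_steps, n_steps + 1)]
    candidates = [p for p in candidates if 0.0 < p < 1.0]
    return max(candidates, key=lambda p: bernoulli_log_likelihood(p, observations))


# Toy data: three of four records carry the class label "yes".
weight = mle_weight([1, 1, 1, 0], prior=0.6, step=0.05, n_steps=5)
print(round(weight, 2))
```

Working with the log-likelihood avoids numerical underflow when the product in Eq. 2 runs over many records, and it has the same maximizer as the likelihood itself.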

Table 3 Computation of Weights using MLE

In this proposal, novel modifications are made to construct a BBN for the medical domain with improved prediction accuracy by fuzzifying quantitative medical attributes and then applying weights. The core problem is therefore to define the terms and new concepts needed to build the Fuzzy Weighted BBN.

3.3 Fuzzy weighted approach

Consider a fuzzy relational database D = {t1, t2, t3, …, ti, …, tn} with a set of attributes A = {a1, a2, …, am}; each ak is related to a set of linguistic labels L = {l1, l2, …, lL}, for example L = {high, low, moderate}. Each ak is associated with a fuzzy set Fk = {(ak, l1), (ak, l2), (ak, l3), …, (ak, lL)}. In a given record rk, each attribute ai belongs to its fuzzy sets to some degree, expressed as a membership degree in the range [0, 1]. For any fuzzy attribute ai and fuzzy set lj in record rk, the degree of membership is denoted rk[µ(Ii, lj)] in dataset D1. To generate association rules and strong rules between attributes, the following definitions and formulas are offered.

Definition 1

Weight of Fuzzy Attribute: Table 3 exhibits the automated weights computed for the fuzzy attributes of the breast cancer dataset [14]. This approach gives a weight W(Ii, lj) to each fuzzy item (Ii, lj), where 1 ≤ i ≤ n, 1 ≤ j ≤ L, and 0 ≤ W(Ii, lj) ≤ 1.

Definition 2

Weight of Fuzzy Attribute Set Record: rk[FASRW(X)] is calculated as the product, over all fuzzy attributes in the set X, of the attribute's weight and its membership degree in the given fuzzy set in transaction rk, as formulated in Eq. 3.

$$r_k[\mathrm{FASRW}(X)]=\prod_{\forall (I_i, l_j)\in X} r_k[\mu(I_i, l_j)] \times W(I_i, l_j).$$
(3)

Definition 3

Weight of Fuzzy Attribute_Set: FA_SW(X) is calculated as the sum of FASRW over all clinical records, as given in Eqs. 4 and 5.

$$\mathrm{FA\_SW}(X)=\sum_{k=1}^{|D1|} r_k[\mathrm{FASRW}(X)],$$
(4)
$$\mathrm{FA\_SW}(X)=\sum_{k=1}^{|D1|}\ \prod_{\forall (I_i, l_j)\in X} r_k[\mu(I_i, l_j)] \times W(I_i, l_j).$$
(5)
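Definitions 2 and 3 can be illustrated with a short Python sketch. The data layout is an assumption made only for this example: each record is a dictionary mapping a fuzzy item (attribute, label) to its membership degree, and `weights` maps the same items to their weights. The attribute names and numeric values are invented, and the function names `fasrw` and `fa_sw` are hypothetical.

```python
from math import prod


def fasrw(record, itemset, weights):
    """Record-level weight of an itemset: product of membership * weight over its items."""
    # Items missing from the record are treated as having membership 0 (assumption).
    return prod(record.get(item, 0.0) * weights[item] for item in itemset)


def fa_sw(records, itemset, weights):
    """Itemset weight over the dataset: sum of the record-level weights."""
    return sum(fasrw(record, itemset, weights) for record in records)


# Toy fuzzified records (invented values, not taken from Table 2 or Table 3).
weights = {("age", "high"): 0.4, ("size", "large"): 0.5}
records = [
    {("age", "high"): 0.5, ("size", "large"): 0.8},
    {("age", "high"): 1.0, ("size", "large"): 0.5},
]
itemset = [("age", "high"), ("size", "large")]

per_record = [fasrw(r, itemset, weights) for r in records]
total = fa_sw(records, itemset, weights)
print(per_record, total)
```

For the first record the value is (0.5 × 0.4) × (0.8 × 0.5) = 0.08, and for the second it is (1.0 × 0.4) × (0.5 × 0.5) = 0.10, so the itemset weight over both records is 0.18.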

Definition 4

Support with Fuzzy_Weighted Concept: A generalized formula is framed for the fuzzy weighted support of two attributes, of multiple attributes, and of the class label.

SupportOfFuzzy_Weight of a rule X → Y, where X and Y are non-empty subsets of fuzzy weighted attributes, is calculated as the sum of the weights of all records in which the given Y is true, divided by the total number of records. It is denoted SupportOfFuzzy_Weight(X → Y) and given by Eq. 6.

$$\mathrm{SupportOfFuzzy\_Weight}(X \to Y) = \frac{\sum_{\forall r_k \text{ given } Y} r_k[\mathrm{FASRW}(X)]}{\text{No. of records in } D1},$$
(6)

where rk ranges over all transactions for which the given class label is true.

Definition 5

Confidence with Fuzzy_Weight Concept: A generalized formula is framed for the fuzzy weighted confidence of two attributes, of multiple attributes, and of the given class label.

ConfidenceOfFuzzy_Weight of a rule X → Y, where X is a non-empty set of attributes and Y is also an attribute, is defined as the ratio of SupportOfFuzzy_Weight(X ∪ Y) to SupportOfFuzzy_Weight(X), as given in Eq. 7.

$$\mathrm{ConfidenceOfFuzzy\_Weight}(X \to Y)=\frac{\mathrm{SupportOfFuzzy\_Weight}(X \cup Y)}{\mathrm{SupportOfFuzzy\_Weight}(X)}.$$
(7)

A new concept known as fuzzy_weighted_bayes_confidence is proposed to construct a fuzzy_weighted Bayesian belief network, i.e. FWBBN.

Definition 6

To define FuzzyWeighted_BayesianConfidence (FW_BC), consider a rule X → Y. It is framed as the conditional probability P(Y|X), computed from the fuzzy weighted support of Eq. 6, and is used to assess the BN as given in Eq. 8.

$$\mathrm{FW\_BC}(X \to Y)=P(Y \mid X)= \frac{\mathrm{SupportOfFuzzy\_Weight}(X, Y)}{\mathrm{SupportOfFuzzy\_Weight}(X)}.$$
(8)
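A minimal Python sketch of Eqs. 6 and 8 follows, under an interpretation that the text does not fully pin down: the numerator sums the record-level weight of X over records whose class label equals Y, while the support of X alone sums it over all records. The record layout (`"items"`, `"label"`), the function names, and all numeric values are hypothetical.

```python
from math import prod


def fasrw(items, itemset, weights):
    """Record-level weight of an itemset: product of membership * weight over its items."""
    return prod(items.get(item, 0.0) * weights[item] for item in itemset)


def fuzzy_weighted_support(records, itemset, weights, class_label=None):
    """Sum of record weights (optionally only for records with the given class label),
    divided by the total number of records."""
    total = sum(
        fasrw(r["items"], itemset, weights)
        for r in records
        if class_label is None or r["label"] == class_label
    )
    return total / len(records)


def fuzzy_weighted_bayes_confidence(records, itemset, weights, class_label):
    """Ratio of the support of (X, Y) to the support of X, read as P(Y | X)."""
    support_x = fuzzy_weighted_support(records, itemset, weights)
    if support_x == 0.0:
        return 0.0                      # rule antecedent never occurs
    return fuzzy_weighted_support(records, itemset, weights, class_label) / support_x


# Toy data with one fuzzy item (invented values).
item = ("size", "large")
weights = {item: 0.5}
records = [
    {"items": {item: 1.0}, "label": "yes"},
    {"items": {item: 0.5}, "label": "no"},
    {"items": {item: 0.5}, "label": "yes"},
    {"items": {item: 0.0}, "label": "no"},
]

support_xy = fuzzy_weighted_support(records, [item], weights, "yes")
fw_bc = fuzzy_weighted_bayes_confidence(records, [item], weights, "yes")
print(support_xy, fw_bc)
```

Here the record weights are 0.5, 0.25, 0.25, and 0, so the support of X is 1.0 / 4 = 0.25, the support of the rule is 0.75 / 4 = 0.1875, and the resulting confidence is 0.75.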

The above algorithm and formulas are applied to various clinical datasets to obtain the results reported below.

4 Result and discussion

The model is developed using the proposed methodology and formulas: the dataset's attributes are processed with the fuzzy weighted approach to generate strong rules, which are used to build Bayesian networks for the medical domain with the aim of higher accuracy. Table 4 shows the experimental setup, the generated rules, and the strong rules extracted on the basis of Fuzzy Weighted Bayesian Confidence (FWBC), using minimum threshold values of fuzzy weighted support and fuzzy weighted confidence to avoid overfitting and underfitting [34]. FWAR mining is applied to generate the strong rules from which the Bayesian model, termed FWBBN, is designed as an efficient and more accurate predictive model in the form of a clinical decision support system.
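The threshold step can be sketched as a simple filter over candidate rules. The `Rule` structure, the function name `strong_rules`, the example rules, and the threshold values are all hypothetical; the thresholds actually used in the experiments are those reported in Table 4.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset   # fuzzy items (attribute, label) on the left-hand side
    consequent: str         # class label on the right-hand side
    support: float          # fuzzy weighted support of the rule
    confidence: float       # fuzzy weighted (Bayesian) confidence of the rule


def strong_rules(rules, min_support, min_confidence):
    """Keep rules that meet both minimum thresholds, strongest confidence first."""
    kept = [r for r in rules
            if r.support >= min_support and r.confidence >= min_confidence]
    return sorted(kept, key=lambda r: r.confidence, reverse=True)


# Invented candidate rules for illustration only.
candidates = [
    Rule(frozenset({("size", "large")}), "yes", 0.20, 0.90),
    Rule(frozenset({("age", "high")}), "yes", 0.02, 0.95),
    Rule(frozenset({("size", "small")}), "no", 0.15, 0.40),
    Rule(frozenset({("age", "low")}), "no", 0.10, 0.70),
]

selected = strong_rules(candidates, min_support=0.05, min_confidence=0.60)
print([(sorted(r.antecedent), r.consequent) for r in selected])
```

Raising the thresholds keeps fewer, more reliable rules, while lowering them admits rules supported by very few records, which is the trade-off between underfitting and overfitting mentioned above.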

Table 4 Strong rules based on FWBC and its accuracy

The experiment shows that the model built with 70% training data and 30% test data, using strong rules based on fuzzy weighted Bayes confidence, achieves an accuracy of 99% on the breast cancer dataset.

5 Comparative analysis

The model is applied to several clinical datasets from the UCI repository for comparative analysis. The LUCS-KDD datasets in .num format for Heart disease, Pima Indian diabetes, Hepatitis, and Liver disorder were downloaded [31]. As shown in Table 5, FWBBN achieves noteworthy accuracy on these diverse clinical datasets. The analysis reports the highest accuracy obtained by setting different minimum threshold values for fuzzy weighted support and fuzzy weighted confidence with varying ratios of training and testing data. The proposed model thus performs well across a variety of clinical datasets, indicating that Bayesian networks are well suited to the clinical domain.

Table 5 FWBBN Results on other Clinical Datasets

The proposed FWBBN model is compared with existing fuzzy classification models on various medical datasets. Table 6 presents the comparison of the proposed model with other state-of-the-art systems: the Fine Tuning Fuzzy KNN classifier [35], the Sparse Bayesian Random Weight Fuzzy Neural Network (RWFNN) [36], the Fuzzy Decision Tree (FDT) [37], the Fuzzy Random Forest (FRF) technique [38], the Neuro-Fuzzy Classifier [39], and the Fuzzy Temporal rule-based classification model [40].

Table 6 Comparison of FWBBN with existing fuzzy-based classification models

The comparison shows that FWBBN outperforms some of the existing fuzzy models and is on par with others. The experimental results indicate that FWBBN is reliable and can be used for the diagnosis of various diseases and its refinement.

6 Conclusions and future scope

A new methodology and algorithm that improve on WBAR are proposed and termed FWBAR, an efficient algorithm for constructing a CDSS using a BBN, termed FWBBN. The proposed algorithm, with its new formulas and concepts, is implemented on datasets from the UCI machine learning repository, in particular the breast cancer and Heart disease data, along with other benchmark datasets. The fuzzy approach is applied to reduce the sharp boundary problem in WBAR, so the weighted and fuzzy method yields stronger rules from the datasets. For prediction, FWBBN-CDSS can be used effectively and accurately, with high performance, low error, and low time complexity compared to the conventional Bayesian model. In future work, fuzzy weighted Bayesian rules can be used to generate synthetic datasets, which are in high demand in the clinical world for research and deep analysis, and these will be validated using the FWBBN model.