Abstract
In this study, we discuss the use of Dempster-Shafer theory as a well-rounded algorithmic vehicle in the construction of fuzzy decision rules. The concept of fuzzy granulation realized via fuzzy clustering is aimed at the discretization of continuous attributes. Detailed experimental studies are presented concerning well-known medical data sets available on the Web.
1 Introduction
Fuzzy modeling is regarded as one of the possible classification architectures of machine learning and data mining. A significant number of studies have been devoted to generating fuzzy decision rules from sample cases or examples, including attempts to extend many classical machine learning methods to learn fuzzy rules. One very popular approach is decision trees [10]. Since the inception of this concept, it has been extended for the construction and interpretation of more advanced decision trees [3, 5–7, 9, 13, 15, 18]. Although decision-tree-based methods can extract a set of fuzzy rules that works well, the lack of backtracking when splitting a node leads to lower learning accuracy compared to other machine learning methods. Another widely used machine learning method is the artificial neural network. In recent years enormous work has been done in attempts to combine the advantages of neural networks and fuzzy sets [14]. Hayashi [4] proposed extracting fuzzy rules from a trained neural network. Lin [8], on the other hand, introduced a method of directly generating fuzzy rules from a self-organized neural network. A common weakness of neural networks, however, is the problem of determining the optimal size of the network configuration, as this has a significant impact on the effectiveness of their performance.
The objective of this paper is to employ the Dempster-Shafer theory (DST) as a vehicle supporting the generation of fuzzy decision rules. More specifically, we concentrate on the role of fuzzy operators, and on the problem of discretization of continuous attributes. We show how they can be effectively used in the quantization of attributes for the generation of fuzzy rules.
The material is arranged in the following way. First, we summarize the underlying concepts of the Dempster-Shafer theory and briefly discuss the nature of the underlying construction. In doing so, the intention is to make the paper self-contained and to help identify some outstanding design problems emerging therein. In Sect. 4 we explain the essentials of our model. Finally, in Sect. 5, we report exhaustive experimental studies.
This paper is a continuation of our earlier work [12]. Here we apply the theoretical vehicle introduced in our previous research to new input data in order to identify possible areas of application. An important objective here is to reveal a way in which this approach becomes essential to a more comprehensive treatment of continuous attributes.
2 Dempster-Shafer Theory
The Dempster-Shafer theory starts by assuming a Universe of Discourse Θ also called Frame of Discernment, which is a finite set of mutually exclusive alternatives. The frame of discernment may consist of the possible values of an attribute. For example, if we are trying to determine the disease of a patient, we may consider Θ being the set consisting of all possible diseases.
With each subset S of Θ the following are associated:
-
a basic probability assignment m(S)
-
a belief Bel(S)
-
a plausible belief Pla(S)
m(S), Bel(S) and Pla(S) have values in the interval [0, 1], and Bel(S) is not greater than Pla(S).
In particular, m represents the strength of some evidence. For example, in a rule-based expert system, m may represent the effect of applying a rule. Bel(S) summarizes all our reasons to believe S. Pla(S) expresses how much we should believe in S if all currently unknown facts were to support S. Thus the true belief in S lies somewhere in the interval [Bel(S), Pla(S)]. More formally, a map \( m: 2^{\Theta } \to \left[ {0,1} \right] \)
such that for each \( {\text{A}} \in 2^{\Theta } \) (where \( 2^{\Theta } \) is set of all subsets of \( \Theta \))
-
1.
\( m(\emptyset ) = 0 \)
-
2.
\( \sum\limits_{{A \subseteq\Theta }} {m(A) = 1} \)
is called a basic probability assignment for \( \Theta \).
Subset A is called a focal element of m if m(A) > 0.
For a given basic probability assignment m, the belief in a subset A of \( \Theta \) is the sum of m(B) over all subsets B of A, i.e. a map \( Bel: 2^{\Theta } \to \left[ {0,1} \right] \) such that \( Bel(A) = \sum\limits_{B \subseteq A} {m(B)} \).
The Plausibility of a subset A of \( \Theta \) is defined as Pla(A) = 1 − Bel(A’), where A’ is the complement of A in \( \Theta \).
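As a small illustration of these definitions, the sketch below computes Bel and Pla from a basic probability assignment over a toy frame of discernment; the disease names and mass values are invented for the example.

```python
def belief(m, A):
    """Bel(A): sum of m(B) over all subsets B of A."""
    return sum(mass for B, mass in m.items() if B <= A)

def plausibility(m, A, theta):
    """Pla(A) = 1 - Bel(A'), where A' is the complement of A in theta."""
    return 1.0 - belief(m, theta - A)

# Frame of discernment: possible diseases (illustrative assumption).
theta = frozenset({"flu", "cold", "measles"})

# Basic probability assignment: masses sum to 1, empty set gets no mass.
m = {
    frozenset({"flu"}): 0.4,
    frozenset({"flu", "cold"}): 0.3,
    theta: 0.3,
}

A = frozenset({"flu", "cold"})
bel = belief(m, A)          # ≈ 0.7: masses of {flu} and {flu, cold}
pla = plausibility(m, A, theta)  # ≈ 1.0: nothing supports {measles} alone
```

Note that Bel(A) ≤ Pla(A) holds, and the gap [Bel(A), Pla(A)] captures the uncertainty left by the mass committed to Θ itself.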
If we are given two basic probability assignments m1 and m2, we can combine them into a third basic probability assignment \( m: 2^{\Theta } \to \left[ {0, 1} \right] \) in the following way.
Let us consider a frame of discernment \( \Theta \) and two belief functions Bel1 and Bel2. We denote the focal elements of Bel1 as A1, …, AK, the focal elements of Bel2 as B1, …, BL, and the corresponding basic probability assignments as m1 and m2. This combination can then be shown graphically as an orthogonal sum of m1 and m2.
The mass of probability of the intersection Ai ∩ Bj, expressed as the measure m1(Ai)·m2(Bj), is illustrated in Fig. 1.
Of course, several intersections can yield the same focal element A. In general the mass of probability of set A is defined as:

\( m(A) = \sum\limits_{{\begin{array}{*{20}c} {i,j} \\ {A_{i} \cap B_{j} = A} \\ \end{array} }} {m_{1} (A_{i} ) \cdot m_{2} (B_{j} )} \quad (3) \)
A problem arises with the empty set. Pairs of focal elements with empty intersection may exist, and this is a normal situation in many combinations. The mass of ∅, according to the above definition, would then be greater than zero, which is impossible under the definition of a basic probability assignment.
We assume that \( \sum\limits_{{\begin{array}{*{20}c} {i,j} \\ {A_{i} \cap B_{j} = \emptyset } \\ \end{array} }} {m_{1} (A_{i} ) \cdot m_{2} (B_{j} )} < 1 \) in order to define the orthogonal sum of m1 and m2, denoted \( m_{1} \oplus m_{2} \).
Then it is necessary to change the definition (3) of the formula of the basic probability assignment of the combination as follows:
-
1.
\( m(\emptyset ) = 0 \)
-
2.
\( m(A) = \frac{{\sum\limits_{{\begin{array}{*{20}c} {i,j} \\ {A_{i} \cap B_{j} = A} \\ \end{array} }} {m_{1} (A_{i} ) \cdot m_{2} (B_{j} )} }}{{1 - \sum\limits_{{\begin{array}{*{20}c} {i,j} \\ {A_{i} \cap B_{j} = \emptyset } \\ \end{array} }} {m_{1} (A_{i} ) \cdot m_{2} (B_{j} )} }} \;{\text{for}}\,{\text{non-empty}}\;{\text{A}} \subset \Theta \)
We call this the orthogonal sum of Bel1 and Bel2, and denote it Bel1 ⊕ Bel2.
This is Dempster's rule of combination of beliefs [11].
For \( \Theta \) we have \( Bel\left(\Theta \right) = \sum\limits_{{A \subseteq\Theta }} {m(A)} = 1 \); for the combination, if all intersections \( A_{i} \cap B_{j} \) are empty, the orthogonal sum does not exist.
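The combination rule above can be sketched directly: masses of conflicting (empty-intersection) pairs are discarded and the remainder is renormalized. The frames and masses below are invented for illustration.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: orthogonal sum m1 ⊕ m2 with conflict renormalization."""
    raw = {}
    conflict = 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            raw[inter] = raw.get(inter, 0.0) + a * b
        else:
            conflict += a * b          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: orthogonal sum undefined")
    k = 1.0 - conflict                 # normalization factor from (3)
    return {A: v / k for A, v in raw.items()}

m1 = {frozenset({"flu"}): 0.6, frozenset({"flu", "cold"}): 0.4}
m2 = {frozenset({"cold"}): 0.5, frozenset({"flu", "cold"}): 0.5}
m = combine(m1, m2)
# m assigns 0.3/0.7 to {flu}, 0.2/0.7 to {cold}, 0.2/0.7 to {flu, cold}.
```

After renormalization the combined masses again sum to 1, as required of a basic probability assignment.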
3 Fuzzy Modelling
Fuzzy set theory is widely known, so we do not introduce the underlying concepts essential to understanding this framework. Interested readers are referred to [16] and [17].
Fuzzy modeling is applied in areas where the model of the system cannot be described precisely, for many reasons. The input data received by the system may not be completely reliable, may contain noise, or may be inconsistent with other data or with expectations about these data. The system is described by a set of linguistic rules. Let D denote the output variable of the system, and X1, X2, …, Xn denote the input variables. The linguistic rules have the following format:
where (Xi is Ak,i,ji) are the fuzzy antecedents, Ak,i,ji (1 ≤ ji ≤ |Ai|) are values of the i-th input variable, and Sk,p (1 ≤ p ≤ |S|) is the value of output variable in k-th rule.
The rules are implemented as fuzzy relations according to the formula:
where × denotes the fuzzy Cartesian product.
Then all rules are aggregated to relation R described as:
where M is the number of rules.
The conclusion is based on the compositional rule of inference
where \( A_{1,j1}^{'} ,A_{2,j2}^{'} , \cdots ,A_{n,jn}^{'} \) are input values, \( S' \) is a conclusion (decision class), and \( \circ \) denotes the composition of fuzzy relation.
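A minimal sketch of the max-min compositional rule of inference on discretized universes follows; the membership values, the three input levels, and the two output classes are invented for illustration.

```python
import numpy as np

# Two hypothetical rules over 3 input levels and 2 output classes.
A1 = np.array([0.2, 0.9, 0.4])   # rule 1 antecedent memberships
S1 = np.array([1.0, 0.0])        # rule 1 consequent (class 1)
A2 = np.array([0.7, 0.3, 0.1])   # rule 2 antecedent memberships
S2 = np.array([0.0, 1.0])        # rule 2 consequent (class 2)

# Each rule becomes a relation min(A, S); rules aggregate by max (8)-style.
R = np.maximum(np.minimum.outer(A1, S1), np.minimum.outer(A2, S2))

def infer(a_prime, R):
    """Compositional rule of inference: S' = A' o R (max-min composition)."""
    return np.max(np.minimum(a_prime[:, None], R), axis=0)

s = infer(np.array([0.1, 1.0, 0.2]), R)   # conclusion over the two classes
```

The conclusion vector is then defuzzified or its maximal element taken as the decision class.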
In fuzzy modeling we can assume that an expert defines the rule set, or we can generate it automatically from a set of samples describing the behavior of the system being modeled.
4 Fuzzy Dempster-Shafer Model
In the Fuzzy Dempster-Shafer (FDS) model [2] we consider rules Rr of the form:
where Xi and D stand for input and output respectively, and mr is a fuzzy belief structure, that is, a standard belief structure with focal elements Sr,p being fuzzy subsets of the frame of discernment Θ, with basic probability assignment mr(Sr,p); mr(Sr,p) is the belief that the conclusion should be represented by class Sr,p.
4.1 Learning – Rules Construction
In antecedent construction, let us assume that we have n features (attributes) in the antecedents of an example. We consider a collection of m generic linguistic terms characterized by membership functions defined in a universe of discourse being the domain of each attribute. The conclusion belongs to decision class S.
For each element of data t we build a collection:
where Ai,j,t are the values of j-th membership function for i-th feature and for t-th element of data.
Example 1.
We demonstrate the calculations on the set of synthetic data presented in Table 1.
The first six rows (L1–L6) constitute the learning data, while the remaining ones (T1–T4) form the testing data. All features are numbers from the interval [0, 9]. The last column represents the decision class, equal to 1 or 2. We consider four quadratic membership functions uniformly distributed over the domain of each attribute. Other membership functions are discussed in the next section.
According to (10) for row T1 we have:
■

On the basis of (10), for a given data point t we can calculate two vectors:
and index of membership functions
where \( A_{{i,\max_{i} ,t}} \) is the maximum value over all membership functions designed for feature i, and \( I_{{i,\max_{i} ,t}} \) is the index of the best membership function for feature i.
Then we have the following candidate for a rule
The firing level of the rule is calculated according to the following formula
where \( \upphi \) denotes the fuzzy matching operator; see Sect. 5.2 for details.
The rule candidate is added to the rule set if \( \phi \left[ {\tau_{r} ,m_{r} } \right] \ge Th \) (where Th is a threshold value and \( \phi \) the matching operator). This helps eliminate bad rules from the final rule set.
More than one rule can have the same antecedent part while the conclusions of these rules differ. We therefore use counters ct,1, …, ct,|S|, where |S| denotes the cardinality of the decision class set. These counters show how many data points matching the rule pattern vote for each decision class.
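The rule-construction step can be sketched as follows: each data point contributes its best-membership pattern, weak candidates are filtered by the threshold, and counters record class votes. The quadratic membership family and its parameters below are assumptions for illustration, not the paper's exact form.

```python
from collections import defaultdict

def quad_memberships(x, n=4, lo=0.0, hi=9.0):
    """n quadratic membership functions uniformly spread over [lo, hi]
    (assumed form: 1 - ((x - c)/w)^2, clipped at 0)."""
    w = (hi - lo) / (n - 1)
    return [max(0.0, 1.0 - ((x - (lo + i * w)) / w) ** 2) for i in range(n)]

def build_rules(data, labels, n_classes=2, threshold=0.75):
    """Group training points by best-membership pattern; count class votes."""
    counters = defaultdict(lambda: [0] * n_classes)
    for row, cls in zip(data, labels):
        mems = [quad_memberships(x) for x in row]
        # Antecedent pattern: index of the best membership per feature.
        best = tuple(max(range(len(m)), key=m.__getitem__) for m in mems)
        firing = 1.0
        for m in mems:
            firing *= max(m)           # product as the matching operator
        if firing >= threshold:        # discard weak rule candidates
            counters[best][cls] += 1   # this point votes for its class
    return dict(counters)

rules = build_rules([[0.5, 3.1], [0.4, 2.9], [8.2, 0.3]], [0, 0, 1])
```

Two points fall into the same pattern here, so its counter accumulates two votes for class 0, mirroring how rule R2 in Example 3 covers several learning rows.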
Example 2.
In our sample (T1) the vectors are:
-
Aμ,1: 1.0000, 1.0000, 0.9375, 0.9844
-
Ic,1: 1, 1, 2, 2 with counters vector 1, 0
In our sample the matching value equals 0.9229, where multiplication was used as the matching operator (1.0000 · 1.0000 · 0.9375 · 0.9844 = 0.9229). For a threshold set to 0.75, we obtain a new rule.
■
The product is a new belief structure on X
Focal elements are fuzzy subset given as
and appropriate distributions of new focal elements are defined as:
So we can build an aggregate:
Then for each collection
where \( F_{{r_{t} ,p_{t} }} \) are focal elements of \( \hat{m}_{r} \) we have focal element E of m described as
with appropriate probability distribution
At this point, the rule generation process is complete.
Example 3.
Our sample data produce the following rule set.
The first four elements are the indices of the best membership functions for the corresponding features, the next two are counters for the decision classes, and the last one is a probability distribution.
Let us observe that rule R2 covers the data L1, L4 and L5. L1 and L4 produce decision class C1 but L5 decision class C2.
■
Now we can move to the testing of new rules.
4.2 Test
In testing we ignore the value in the last column of Table 1, i.e. the decision class number, because our goal is to compute it.
To compute the firing level of a rule k for a given data point
where Xi,k is the feature value and Dk is the conclusion decision class that we compare with the result of inference, we build a rule matrix
We are interested only in active rules, i.e. rows with matching value \( \mu_{k,t} > 0 \).
Example 4.
In test we will demonstrate calculations on
For sample data L5 we have two active rules:
The first four elements are the rule pattern, and the next two are the counters for the decision classes. The next four numbers are the values of the appropriate membership functions: 0.859375 is the value of the first membership function (according to the first number in the rule) on the first feature, and the next three numbers are calculated in the same way.
The last number in each of the above rows is the matching value for the rule, calculated by applying the matching operator to the values of the membership functions.
We focus only on the rows with a matching value greater than zero.
For sample data T1 we have:
■
For each collection of focal elements \( F_{{r_{t} ,p_{t} }} \) of \( \hat{m}_{r} \) we define an aggregate
with basic probability assignment
The result of classification is "D is m", with focal elements Ek (k = 1, …, R|S|) and distribution m(Ek). These results are calculated using the focal elements and the appropriate counters ct,1, …, ct,|S|.
Example 5.
For sample points L5 and T1 the counters are 3, 1 and 3, 0 respectively.
■
Then we perform defuzzification according to COA method [1].
where \(\bar{y}\)k are defuzzified values for focal element Ek defined as
In the next step, the rule structure is simplified to
where \( H_{r} = \left\{ {\tfrac{1}{{\gamma_{r} }}} \right\} \) is a singleton fuzzy set for factor \( \gamma_{r} = \sum\limits_{p = 1}^{|S|} {\bar{y}_{p} m_{r} (S_{r,p} )} \).
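A minimal sketch of the COA (centre-of-area) defuzzification used above: the numeric class labels are weighted by their aggregated membership degrees. The values below are invented for illustration.

```python
import numpy as np

def coa(universe, membership):
    """Centre-of-area defuzzification: membership-weighted mean of the
    universe points."""
    membership = np.asarray(membership, dtype=float)
    return float(np.dot(universe, membership) / membership.sum())

y = np.array([1.0, 2.0])     # decision classes as numeric points
mu = np.array([0.75, 0.25])  # aggregated membership degree of each class
crisp = coa(y, mu)           # 1.25: the evidence leans towards class 1
```

Rounding or thresholding the crisp value then yields the final decision class.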
Example 6.
For both L5 and T1 we obtain decision class 1. This is correct for T1 but wrong for L5. The values of Hr are 0.4283 and 0.4800 respectively. ■
5 Empirical Learning for FDS Model
In this section we compare and analyze the performance of several membership functions and matching operators. We start from a standard solution used in introductions to fuzzy modeling and then consider more complicated models. We compute results for the following membership functions: linear, quadratic, Gaussian, and FCM-generated. We concentrate on minimum, multiplication, and implication as matching operators. The most valuable part is the comparison of the results of all these calculations. At the end of this section we show some results of the experimental research.
5.1 Membership Functions
The membership functions make it possible to divide the data into n intervals; this is a way of discretizing the input data. Hence we get the best results for continuous data or for data with several discrete (nominal) values. For discrete or binary data the results of the proposed model are not good enough.
The choice of membership function has a great influence on the quality of the rules. Although the number of rules differs, the quality of classification is comparable.
The most interesting membership functions were generated by the Fuzzy c-Means (FCM) algorithm [1]. The results of the experimental research on membership functions are summarized in Table 2.
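The fixed membership families compared in Table 2 can be sketched as below. The paper does not spell out the exact parameterizations, so the forms used here (n functions placed uniformly, width tied to the grid spacing) are assumptions for illustration.

```python
import math

def linear(x, c, w):
    """Triangular ('linear') function centred at c with half-width w."""
    return max(0.0, 1.0 - abs(x - c) / w)

def quadratic(x, c, w):
    """Quadratic function: 1 - ((x - c)/w)^2, clipped at 0."""
    return max(0.0, 1.0 - ((x - c) / w) ** 2)

def gaussian(x, c, w):
    """Gaussian function centred at c with spread w."""
    return math.exp(-((x - c) / w) ** 2)

def grid(x, f, n=6, lo=0.0, hi=9.0):
    """Evaluate n uniformly placed functions of family f at point x,
    yielding the fuzzy discretization of a continuous attribute."""
    w = (hi - lo) / (n - 1)
    return [f(x, lo + i * w, w) for i in range(n)]
```

FCM-generated memberships differ in that their centres come from clustering the data rather than from a uniform grid.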
5.2 Matching Operators
Our experiments showed that the matching operator applied to a data sample with existing rules plays a very important role in the accuracy of diagnoses. This effect appears as early as rule generation: the matching operator influences the quality of the generated rules. Of course, this quality is of secondary importance, but in general, the more rules, the better the accuracy.
From the analysis of the results in Table 4, we might infer that the most powerful operator is implication. This is not entirely true, because Table 4 shows the results for only one fixed threshold value, which is not optimal in all instances, especially for the multiplication operator. Changing the threshold value (e.g. to 0.25) gives almost the same results as the implication operator. Still, the choice of the threshold value is of minor importance here, though it can influence the resulting rule set. Of course, we cannot analyze the threshold value without keeping in mind the properties of the membership function and the number of intervals. The choice of the threshold value will be the subject of future work.
The results of the investigation of various matching operators are collected in Table 3.
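The matching operators compared in Table 3 can be sketched as follows. The paper does not give the exact implication operator it uses, so the Łukasiewicz t-norm below stands in as an assumed example of an implication-style conjunction.

```python
from functools import reduce

def match_min(values):
    """Minimum operator: the weakest antecedent degree decides."""
    return min(values)

def match_product(values):
    """Multiplication operator: degrees are multiplied together."""
    return reduce(lambda a, b: a * b, values, 1.0)

def match_lukasiewicz(values):
    """Łukasiewicz t-norm (assumed stand-in for 'implication'):
    max(0, sum - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))

v = [1.0, 1.0, 0.9375, 0.9844]  # membership degrees from Example 2
# match_product(v) reproduces the 0.9229 matching value of Example 2;
# match_min(v) gives 0.9375, a less strict score for the same rule.
```

The stricter the operator, the fewer rule candidates pass a fixed threshold, which is why the threshold and the operator cannot be tuned independently.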
6 Experimental Studies
Some results of the experimental research are shown in Table 4. Here we fixed the number of membership functions at 6 and the threshold value at 0.75. All data sets were divided into two parts: learning (training) data (about 2/3 of the entire data set) and testing data (the remaining 1/3). The learning data were used to generate the rule set; the testing data were used to evaluate it. To obtain reliable results, we carried out each experiment several times.
This setup does not show the most favorable case, but it allows us to see part of the real results obtained with different methods for generating rules, and these results are comparable.
The methods of automatically generating decision rules described in this paper achieve the best results on the Iris data set, in several runs reaching 100% decision accuracy. This set consists entirely of continuous values. In the other cases, the necessary discretization of the data caused slightly worse results. Nevertheless, in the case of the Ulcers data set, for which over half of the features were discrete, the results on the learning data were nearly the same as for the Iris data.
The Diabetes data set is a sample for testing the proposed algorithms on discrete data. The rule accuracy results are not satisfactory; this illustrates a case where the proposed method of generating and verifying decision rules is ineffective on its own.
Another observation is that the fewer binary features appear in the antecedents, the better the accuracy of our rules. If the features contain no binary data, or far fewer binary features than others, our rules remain applicable; we can see this when analyzing the Breast Cancer Wisconsin and Dermatology data sets. In the Echocardiogram data set, where the number of binary features equals three, the results are worse.
In all instances, intermediate reports were stored in disk files. These were used to compute results with the FCM algorithm, and they also made it possible to combine the described method with others.
In our research, the proper choice of the threshold value told us whether a rule is valuable or not. The importance of this value is visible in the case of the multiplication operator: the results presented in Table 4 could suggest a lower "weight" of this operator, but that is not true. If we change the threshold value, we notice that it performs almost the same as the computationally more involved implication operator. Implication is an example of a very interesting and powerful operator of fuzzy relations.
We compared the results of our research with the standard decision tree algorithm [10]. For all data sets, we obtained better results using the Gaussian function or the FCM algorithm [1], and at a few points we also obtained better accuracy with the quadratic function.
In the Dermatology and Echocardiogram data sets we removed records with missing input features, because we concentrate only on complete data.
7 Conclusions
The study has focused on the use of the Fuzzy Dempster-Shafer model for generating fuzzy decision rules. Fuzzy sets are useful in the discretization of continuous attributes. The approach is discussed in concrete applications to two real medical data sets (especially problems of disease identification) and several well-known data sets available on the Web. The results are used to classify objects. A series of experiments has shown that this approach outperforms the C4.5 algorithm in terms of classification accuracy.
References
Bezdek, J.C., Sabin, M.J., Tucker, W.T.: Convergence theory for fuzzy c-means: counterexamples and repairs. IEEE Trans. Syst. Man Cybern. SMC-17(5), 873–877 (1987)
Binaghi, E., Gallo, I., Madella, P.: A neural model for fuzzy Dempster-Shafer classifiers. Int. J. Approx. Reason. 25, 89–121 (2000)
Chiang, I.J., Hsu, J.Y.J.: Integration of fuzzy classifiers with decision trees. In: Proceedings of 1996 Asian Fuzzy Systems Symposium, pp. 266–271 (1996)
Hayashi, Y., Imura, A.: Fuzzy neural expert system with automated extraction of fuzzy if-then rules from a trained neural network. In: 1st International Symposium on Uncertainty Modeling and Analysis, pp. 489–494, December 1990
Hayashi, A., Maeda, T., Bastian, A., Jain, L.C.: Generation of fuzzy decision trees by fuzzy ID3 with adjusting mechanism of AND/OR operators. In: Proceedings of 1998 International Conference on Fuzzy Systems (FUZZ-IEEE 1998), pp. 681–685 (1998)
Ichihashi, H., Shirai, T., Nagasaka, K., Miyoshi, T.: Neuro-fuzzy ID3: a method of inducing fuzzy decision trees with linear programming for maximizing entropy and an algebraic method for incremental learning. Fuzzy Sets Syst. 81(1), 157–167 (1996)
Janikow, C.Z.: Fuzzy decision trees: issues and methods. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 28(1), 1–14 (1998)
Lin, C.T., Lee, C.S.G.: Neural-network-based fuzzy logic control and decision system. IEEE Trans. Comput. 40(12), 1320–1336 (1991)
Pedrycz, W., Sosnowski, Z.A.: C-fuzzy decision trees. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 35(4), 498–511 (2005)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
Sosnowski, Z.A., Walijewski, J.S.: Generating fuzzy decision rules with the use of Dempster-Shafer theory. In: ESM 1999, pp. 419–426, Warsaw (1999)
Umano, M., Okamoto, H., Hatono, I., Tamura, H., Kawachi, F., Umedzu, S., Kinoshita, J.: Generation of fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis by gas in oil. In: Proceedings of 1994 Japan-USA Symposium on Flexible Automation, pp. 1445–1448 (1994)
Viharos, Z.J., Kis, K.B.: Survey on neuro fuzzy systems and their applications in technical diagnostics. Measurement 67, 126–136 (2015)
Weber, R.: Fuzzy ID3: a class of methods for automatic knowledge acquisition. In: Proceedings of 2nd International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, pp. 265–268 (1992)
Yager, R., Filev, D.: Essentials of Fuzzy Modeling and Control. Wiley, New York (1994)
Yager, R., Filev, D.: Including probabilistic uncertainty in fuzzy logic controller modeling using Dempster-Shafer theory. IEEE Trans. Syst. Man Cybern. 25(8), 1221–1230 (1994)
Yun, J., won Seo, J., Yoon, T.: The new approach on fuzzy decision trees. Int. J. Fuzzy Logic Syst. 4(3) (2014)
Acknowledgement
This work was supported by grant S/WI/1/2013 from Bialystok University of Technology, funded by the Ministry of Science and Higher Education.
Copyright information
© 2016 IFIP International Federation for Information Processing
Sosnowski, Z.A., Walijewski, J.S. (2016). Fuzzy Dempster-Shafer Modelling and Decision Rules. In: Saeed, K., Homenda, W. (eds.) Computer Information Systems and Industrial Management. CISIM 2016. Lecture Notes in Computer Science, vol. 9842. Springer, Cham. https://doi.org/10.1007/978-3-319-45378-1_46