1 Introduction

In regression problems, a specified output is estimated using a set of input variables. Many methods are employed to model regression problems, ranging from simple ones such as least squares to more advanced ones such as multi-objective evolutionary algorithms (Ratner 2017). Fuzzy inference systems (FISs) are among the most useful methods to address regression problems. FISs, built upon the fuzzy set theory of Zadeh (1965, 1975), employ linguistic concepts to model input-output relationships. Since the inference ability of these systems relies on their rules, FISs are also called fuzzy rule-based systems (FRBSs). Each FRBS has a Knowledge Base (KB), which is itself comprised of two fundamental parts: the Data Base (DB) and the Rule Base (RB); the DB includes fuzzy set definitions and membership function (MF) parameters, and the RB contains a set of linguistic fuzzy If-Then rules (Riza et al. 2015). Fuzzy rules play a critical role in every FIS; without them, the decision-making process is not possible.

There exist two types of approaches to derive fuzzy If-Then rules. In the first one, the RB is manually generated from the knowledge of human experts, while in the second one, called data-driven modeling, the rules are automatically extracted from numerical data using learning methods. The second approach is more practical when human expert knowledge is lacking or in complex systems where complete knowledge of the problem is not available (Riza et al. 2015). In this regard, one of the classical rule learning methods is Wang and Mendel's algorithm (Wang and Mendel 1992), which obtains the rule set by learning from the training data. The simplicity and speed of this approach have made it a popular and widely used method; for this reason, many researchers have adopted its idea in their applications and have tried to improve it (Kato et al. 2009; Gacto et al. 2014).

In automatic generation approaches, the number of obtained rules may become enormous, especially in the case of big datasets. Moreover, some of the generated rules may be redundant and ineffective; they may even be destructive when cooperating with the other rules. On the other hand, a high number of rules reduces the interpretability of the fuzzy model (Alonso et al. 2015), such that the system behavior becomes hard to understand and the efficiency is highly affected. To handle these problems, two strategies are available: the first relies on learning an appropriate number of effective rules from the beginning, while in the second, all possible rules are initially generated and the ineffective ones are then pruned through an optimization process. Although the first strategy avoids exhaustive optimization tasks, the second one results in a more efficient RB (Patel 2013). In this study, we attempt to combine the advantages of these two strategies. Moreover, most rule learning approaches operate with the same principles for different datasets under different conditions; e.g., the maximum and the minimum numbers of obtained rules are general and fixed parameters (Alcalá et al. 2011; Alcalá-Fdez et al. 2011a; Gacto et al. 2014). In this way, the circumstances of the problem and the preferences of the decision makers are ignored. This issue is also taken into account in the proposed rule learning method.

This paper presents EEFR-R, a new method to Extract Effective Fuzzy Rules for Regression problems. An efficient Mamdani FRBS is constructed through the cooperation of association rule mining concepts and a PSO algorithm in three stages. Input variables are preprocessed using Fayyad and Irani's discretization method, and then all MFs are defined. Next, the RB is constructed via a two-step process: rule generation and rule pruning. Rule generation follows the idea of Wang and Mendel's method, and rule pruning is proposed based on integrating a PSO algorithm into a common rule mining strategy. The new rule pruning method can eliminate additional rules at three levels (modes) based on the preferences of decision makers. Finally, in the post-processing stage, an optimization algorithm is utilized to tune the MFs and adjust the rules' weights. EEFR-R is validated using 19 real-world regression datasets with different numbers of variables and samples. The performance of each stage is evaluated separately. Furthermore, the results of EEFR-R are compared with three state-of-the-art regression solutions, and statistical tests are employed to carry out pairwise and multiple comparisons more clearly. Experimental results show the effectiveness of EEFR-R, especially in terms of complexity and accuracy.

The rest of this paper is organized as follows. Section 2 describes the preliminaries of the model. Section 3 details the three stages of EEFR-R, including preprocessing, model generation, and post-processing stage. Section 4 demonstrates the results of evaluations, comparisons, and statistical tests, and finally, Sect. 5 presents the conclusion of this study.

2 Preliminaries

The preliminaries of this study, including discretization methods, association rule mining concepts, and the PSO algorithm, are described in this section. Before these descriptions, a review of the related literature is also provided.

2.1 Literature review

There are different learning methods for FRBS generation in the literature, e.g., learning based on space partitions, learning based on clustering approaches, learning based on the gradient descent method, and learning by means of neural networks or evolutionary algorithms (Riza et al. 2015). Learning based on space partitions, or structure-based approaches, partitions the data space using fuzzy sets and then extracts fuzzy rules based on those partitions (Liu and Cocea 2018). Cluster-based approaches cluster the data and then use each cluster to generate one rule (Prasad et al. 2014). The last three methods iteratively exploit the capabilities of gradient descent, neural networks, and evolutionary algorithms to learn the components of the FRBSs (Jang 1993; Fernandez et al. 2015).

One of the most successful families of learning algorithms for the automatic generation of an FRBS is evolutionary fuzzy systems (EFSs), in which evolutionary algorithms are integrated into fuzzy systems to learn or tune fuzzy elements (Fernandez et al. 2015). These hybrid systems are popular, especially in learning tasks, due to their flexibility in the codification of the FRBSs and their ability to provide different trade-offs between accuracy and interpretability. A rule learning process was proposed in Debie et al. (2014); it employed an evolutionary algorithm to search for attribute intervals and rule structures simultaneously. The parameters of the DB and the RB can also be learned concurrently (Shill et al. 2011).

Fig. 1 Pseudo-code of an entropy-based discretization algorithm (de Sá et al. 2016)

The particle swarm optimization (PSO) algorithm is an evolutionary algorithm that has been widely used in FISs. Easy implementation, quick convergence, and low complexity are prominent features of the PSO algorithm, and they have made it popular among researchers (Du and Swamy 2016). Several papers have efficiently employed these features to perform the learning and tuning tasks of a fuzzy system. In Zanganeh et al. (2011), a PSO algorithm was used to tune the rules' antecedent and consequent parameters with respect to the minimization of an estimated error. In another work, an efficient PSO-based approach was proposed to construct an FRBS from data examples (Esmin 2007). A method called fuzzy particle swarm optimization (FPSO) was proposed in Permana and Hashim (2010) to use the capabilities of PSO for generating and adjusting MFs automatically. Two different fuzzy classifiers based on Mamdani and TSK FISs were developed in Elragal (2010), where all parameters of the proposed classifiers and the structure of the fuzzy rules are optimized using a PSO algorithm. Because of the faster convergence and the simplicity of the PSO algorithm in comparison with the genetic algorithm (it balances exploration and exploitation in the search space; Visalakshi and Sivanandam 2009), the PSO algorithm is employed for the optimization tasks of this paper.

However, as mentioned in the previous section, some of the generation methods lead to an enormous number of rules. To handle this problem, different strategies are available in the literature, such as removing redundant rules (Patel 2013), merging overlapping rules (Alonso et al. 2015), rule selection (Alcalá et al. 2007), or rule pruning (Batbarai and Naidu 2014).

In some studies, the concepts of FRBSs and association rule mining are fused so that the strategies of association rule mining can be used to refine the RB of the FRBSs (Alcalá-Fdez et al. 2011a); e.g., in Shehzad (2013), a significance level measure is defined for each rule using the concepts of Support and Confidence, and less significant rules are then pruned based on this measure, while in Antonelli et al. (2017), to manage unbalanced data in a fuzzy classifier, the rules' weights are defined using scaled Support and Confidence. Given the results of these studies, an idea related to the association rule mining concepts is also adapted in this study and combined with a PSO algorithm to propose a new rule pruning method.

2.2 Discretization method: Fayyad and Irani’s

In some machine learning applications, discretization methods are required to handle data with continuous attributes (Zeinalkhani and Eftekhari 2014). These methods are employed to convert continuous variables into discrete ones. Indeed, through a discretization process, the domain of a continuous variable is partitioned into several intervals, and consequently, a set of cut points is generated.

The term cut point (CP) refers to a value within the domain of a certain continuous variable that divides the variable's domain into two sub-partitions: one partition is less than or equal to that CP, and the other partition is greater than it. Given these definitions, if k CPs are chosen in the domain of a continuous variable, \(k+1\) partitions will be built in that domain (Garcia et al. 2013).

Each value within the variable's domain is a candidate CP. The best CPs are picked out based on a splitting measure, which is a criterion for evaluating the different candidate points. Different splitting measures, and accordingly different discretization methods, are available in the literature, such as binning-based, Chi-square-based, entropy-based, and wrapper-based methods (Dash et al. 2011).

Entropy is one of the most commonly used discretization measures; it quantifies the uncertainty in the information being processed. Many discretization algorithms have been developed using the entropy measure; they evaluate the entropy of the candidate partitions to select the CPs. Figure 1 shows the pseudo-code of a typical entropy-based discretization algorithm. The process of choosing suitable CPs starts by considering one large partition containing all values of a variable; then, among all candidate points within this partition, the point with the highest information gain is accepted as a CP; this process is recursively repeated for each generated sub-partition until a stopping condition is met (de Sá et al. 2016).

Fayyad and Irani’s is a popular entropy-based discretization method. It considers the midpoints between each pair of the accepted CPs as the candidate points; then, it evaluates all candidate points and selects that point for which the entropy is minimal. These evaluations recursively continue until a stopping condition, which is determined based on the minimal description length principle, is met. More detail about this method is available in Fayyad and Irani (1993). Fayyad and Irani’s discretization method is utilized in Sect. 3.1 to prepare regression data before model generation.

2.3 Association rule mining concepts

In order to modify conflicting rules in the rule generation process of Sect. 3.2.2, and also to propose a new rule pruning method in Sect. 3.2.3, two rule evaluation metrics from association rule mining, namely Support and Confidence, are utilized. In this section, we briefly describe them.

Support is a measure used to calculate the frequency of a certain rule in a rule set, while Confidence is a measure employed to specify the reliability of the rules (Bhargava and Shukla 2016). To define these measures, we first denote a generic fuzzy If-Then rule (defined in Eq. (18) in “Appendix A”) as:

$$\begin{aligned} \mathrm{Rule}^i : A_i \longrightarrow B_i \end{aligned}$$
(1)

in which \( A_i=\{A_i^1,~A_i^2,~\ldots ,~A_i^n\}\) contains all fuzzy sets corresponding to the antecedent part of rule i, and \(B_i \) is the fuzzy set of the consequent part.

The Support of a certain rule is the fraction of rules in the RB that are composed of exactly the same antecedent and consequent as that rule. It is calculated for the ith rule as (Bhargava and Shukla 2016):

$$\begin{aligned} \mathrm{Sup}(\mathrm{Rule}^i)= \dfrac{n(A_i~,~ B_i)}{m} \end{aligned}$$
(2)

where \(n(A_i~,~ B_i)\) returns the number of rules that contain both \(A_i\) and \(B_i \), and m is the number of available rules in the RB.

The Confidence of a certain rule is the proportion of rules that contain both the antecedent and consequent parts of that rule to those that have only its antecedent part. It is computed for the ith rule as (Bhargava and Shukla 2016):

$$\begin{aligned} \mathrm{Conf}(\mathrm{Rule}^i)= \dfrac{n(A_i~, ~B_i)}{n(A_i)} \end{aligned}$$
(3)

where \(n(A_i)\) is the number of rules which contain all \(A_i^k\) in their antecedent part.

Support and Confidence are often used to eliminate uninteresting rules in rule mining algorithms (Bhargava and Shukla 2016). Generally, the strength of each rule is measured by its Support and its Confidence. Rules with very low Support are not frequent and may occur only for a few data samples. On the other hand, a low-Confidence rule has a low probability of occurrence among similar rules (rules whose antecedents are the same as the antecedent part of that rule). Such rules are not strong and do not play an effective role in the inference process, so if minimum thresholds are defined for Support and Confidence, these weak rules can be detected and removed from the rule set (Batbarai and Naidu 2014).

This common strategy is usually applied in rule mining algorithms. The rule set is evaluated against some minimum thresholds, and the rules that fail to reach them are eliminated from the rule set. The minimum Support and Confidence thresholds are called MinSupp and MinConf, respectively. In the mentioned strategy, all rules are required to satisfy MinSupp and MinConf, and the rule set is refined through the following two steps (Batbarai and Naidu 2014):

1. Find all frequent rules that satisfy MinSupp.

2. Among the frequent rules found in step 1, extract all high-Confidence rules that satisfy MinConf.

The main difficulty of this strategy is specifying the MinSupp and MinConf thresholds. If an appropriate threshold is not set, problems arise; e.g., if the minimum threshold is set too high, some interesting rules may be missed, while if it is set too low, many unnecessary rules may remain. Moreover, when the training data change, the minimum thresholds should be updated and adapted to the new data, while finding a user-specified minimum threshold for each dataset is time-consuming: it must be found by trial and error or requires expert knowledge. In any case, setting these thresholds manually is a hard and risky task, and a simpler and more accurate alternative is required. In this regard, a stronger rule pruning method is proposed in Sect. 3.2.3.
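As an illustration of Eqs. (2) and (3) and the two-step threshold strategy above, the following Python sketch computes Support and Confidence over a small rule set and filters it with fixed MinSupp and MinConf values; the rule representation (a tuple of antecedent labels plus one consequent label) and the toy data are illustrative assumptions, not part of the proposed system.

```python
from collections import Counter

def support_confidence(rules):
    """rules: list of (antecedent, consequent) pairs, where the antecedent is a
    tuple of linguistic labels and the consequent is a single label.
    Returns {rule: (Support, Confidence)} following Eqs. (2) and (3)."""
    m = len(rules)
    rule_counts = Counter(rules)                        # n(A_i, B_i)
    antecedent_counts = Counter(a for a, _ in rules)    # n(A_i)
    return {r: (c / m, c / antecedent_counts[r[0]]) for r, c in rule_counts.items()}

def prune_by_thresholds(rules, min_supp, min_conf):
    """Two-step strategy: keep rules that satisfy MinSupp, then keep those
    among them that also satisfy MinConf."""
    metrics = support_confidence(rules)
    frequent = {r for r, (s, _) in metrics.items() if s >= min_supp}   # step 1
    return {r for r in frequent if metrics[r][1] >= min_conf}          # step 2

# toy rule set: labels stand for linguistic terms such as 'Low' or 'High'
rules = [(('Low', 'High'), 'Medium'), (('Low', 'High'), 'Medium'),
         (('Low', 'High'), 'High'),   (('High', 'Low'), 'Low')]
print(prune_by_thresholds(rules, min_supp=0.3, min_conf=0.5))
```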

2.4 A brief overview of PSO

PSO (Du and Swamy 2016) is a parallel search algorithm inspired by swarm behaviors. The PSO algorithm tries to find the best solution by starting from some initial solutions and improving them continuously. This algorithm produces a population of candidate solutions called particles; particles are abstract entities used to represent solutions; they move iteratively and randomly through a multi-dimensional search space and try to improve their positions as the available information changes (He et al. 2016).

Fig. 2 Pseudo-code of PSO algorithm (Du and Swamy 2016)

Unlike the genetic algorithm, PSO has no operators such as crossover and mutation. As Fig. 2 shows, it is initialized with a population of random particles. These particles are evaluated using a fitness function. Each particle has a velocity, which governs its movement. Each particle iteratively changes its position in the search space according to two guides: the best-known position found so far by itself, called Pbest, and the best-known position found so far by the entire swarm, called Gbest. In this way, each particle can move toward the best current solution until the desired convergence criterion is met.

Accordingly, each PSO algorithm consists of three steps, namely generating the particles' positions and velocities, updating the particles' velocities, and updating the particles' positions. The pseudo-code of the standard PSO algorithm is presented in Fig. 2. More details about the update formulas and the mechanics of PSO are available in Du and Swamy (2016).

Due to the general advantages of the PSO algorithm (Cheng and Jin 2015; Visalakshi and Sivanandam 2009; Oliveira and Schirru 2009), i.e., simplicity, fast convergence, easy implementation, fewer operators, better computational efficiency, and few parameters to adjust, it is employed for the optimization tasks in this paper. However, this is not an obligation, and the proposed model can easily be adapted to other optimization algorithms.
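For reference, a minimal Python sketch of the standard PSO loop of Fig. 2 is given below; the inertia weight w and the acceleration coefficients c1 and c2 are typical textbook values, not the settings reported later in Table 4, and the sphere function in the usage line is only a toy objective.

```python
import numpy as np

def pso(fitness, dim, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal standard PSO (minimization); bounds is a (low, high) pair per dimension."""
    low, high = np.array(bounds[0]), np.array(bounds[1])
    rng = np.random.default_rng(0)
    x = rng.uniform(low, high, size=(n_particles, dim))        # positions
    v = np.zeros_like(x)                                       # velocities
    pbest, pbest_val = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, low, high)
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# usage: minimize the sphere function in 3 dimensions
best, best_val = pso(lambda p: np.sum(p**2), dim=3, bounds=([-5]*3, [5]*3))
```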

3 Description of the proposed method

This section describes the details of the three stages of EEFR-R, namely preprocessing, model generation, and post-processing. Figure 3 illustrates the general scheme of EEFR-R. The preprocessing stage prepares the input data through discretization and data partitioning; it generates a set of CPs using Fayyad and Irani's discretization method. The second stage is the main stage of EEFR-R; it constructs the DB and the RB of a Mamdani FRBS through three tasks: MF definition, rule generation, and rule pruning. Wang and Mendel's method (WM's) and a PSO algorithm assist in performing the rule generation and rule pruning, respectively. Finally, in the post-processing stage, MFs are tuned and rules' weights are adjusted using another PSO algorithm. In what follows, we describe the three stages of EEFR-R in detail.

Fig. 3 General scheme of EEFR-R; blue rectangles show the task(s) of each stage, solid black arrows indicate the flow of input and output for each task, and dashed blue arrows represent the methods (or algorithms) used by each task

3.1 Preprocessing

When there is no expert knowledge about the MFs (e.g., the number of MFs of each variable, or the support of each MF), a discretization method is employed to compensate for this deficiency; discretization methods generate a set of CPs which are utilized to define MFs and determine fuzzy partitions.

Following the results of Zeinalkhani and Eftekhari (2014), which recommend Fayyad and Irani's method as one of the most efficient discretization methods for classification tasks, this method is applied in this study to choose CPs and partition the data domains. Since this algorithm is designed for classification tasks and needs a nominal attribute as the class label, K-means clustering is performed beforehand to determine the required nominal outputs. Then, all input and output variables are discretized as described in Sect. 2.2, and a set of CPs is obtained for each variable; the minimum and maximum values of each variable are also added to these sets as the first and last CPs, respectively; i.e., if for variable k with the domain \([~l_k~,~ u_k~]\), n CPs are selected, a set S composed of \(n+2\) CPs is built for it as:

$$\begin{aligned} S=\left\{ \mathrm{CP}_ 1^k=l_k,~ \mathrm{CP}_2^k,~ \ldots ,~ \mathrm{CP}_{n+2}^k=u_k\right\} \end{aligned}$$
(4)

Each pair of successive CPs forms one partition; thus, as Fig. 4 illustrates, the domain of variable k is divided into \(n+1\) partitions by using the set S. These partitions are employed for the definition of MFs in the next section.

Fig. 4 Partitioning of variable k, using the discretization method

3.2 Model generation

To construct a Mamdani FRBS, all of its components, including fuzzification, the KB, the inference engine, and defuzzification, should be defined. Among these components, the KB, comprised of the DB and the RB, plays a critical role in the fuzzification and inference processes. Hence, this section focuses on the mechanisms of DB and RB generation; the other components are similar to those of a typical Mamdani FRBS (“Appendix A”). Since the DB includes the fuzzy set definitions and MF parameters, the first subsection below describes the method of MF definition in detail. The RB, in turn, is composed of a set of If-Then rules which are employed by the inference engine to perform the reasoning operations. In this study, the RB is constructed via a two-step process, rule generation and rule pruning; these steps extract all possible rules and prune the additional ones, respectively, and they are presented in the second and third subsections below.

3.2.1 MFs definition

Definition of MFs using discretization methods consists of two steps: in the first step, non-fuzzy partitions are determined using a discretization method, and then, in the second step, an MF is defined for each of these partitions (Zeinalkhani and Eftekhari 2014). The first step was carried out in the preprocessing stage (Sect. 3.1), where the non-fuzzy partitions corresponding to each variable were obtained. This section focuses on transforming the non-fuzzy partitions into fuzzy partitions by defining MFs.

MFs are used to map crisp values into fuzzy numbers. Each element X belonging to the fuzzy set A is assigned a degree of membership between 0 and 1, denoted by \(\mu _A (X)\). \(\mu _A\) is a mathematical function defined using triangular, trapezoidal, Gaussian, or other types of MFs. Among these types, the Gaussian MF, due to its advantages in predictive models, is often used for modeling regression problems (Tay and Lim 2011). A Gaussian MF has two parameters, c and \(\sigma \), and it is defined as:

$$\begin{aligned} \mu _A (X;\,c,\sigma )=\exp \left( -\dfrac{1}{2} \left( \frac{X-c}{\sigma } \right) ^2 \right) \end{aligned}$$
(5)

c is the mean value, which represents the center of the MF and the location of its peak, and \(\sigma \) is the standard deviation, which controls the curve width.

Four different methods have been proposed in Zeinalkhani and Eftekhari (2014) to define MFs from the generated non-fuzzy partitions. In these methods, different measures are extracted from the CPs and partitions, such as partition width, standard deviation, neighbor partition coverage rate, and partition coverage rate, and MFs are then defined based on these measures. In this stage, the definition based on partition width is adapted to design the MFs. This method utilizes the distance between two CPs to compute the parameters of the MFs, and it does not consider how examples are distributed inside each partition.

After the preprocessing tasks of Sect. 3.1, \(n+2\) CPs, and accordingly \(n+1\) partitions, were obtained for a typical variable k. In this step, one Gaussian MF is considered for each partition, so \(n+1\) Gaussian MFs must be defined for variable k. Let the ith MF of variable k be denoted by \(\mathrm{MF}_ i^k\); it is designed based on partition i, which itself has been built using the two successive CPs \(\mathrm{CP}_ i^k\) and \(\mathrm{CP}_{i+1}^k\). Moreover, as in Eq. (5), the two parameters c and \(\sigma \) have to be set for each Gaussian MF; therefore, to define \(\mathrm{MF}_ i^k\), the values of these two parameters are determined from the information of \(\mathrm{CP}_ i^k\) and \(\mathrm{CP}_{i+1}^k\) as follows:

$$\begin{aligned} {{\left\{ \begin{array}{ll}c_{\,i}^ {\,k }=\dfrac{(\mathrm{CP}_{i+1}^{\,k } + \mathrm{CP}_ {\,i}^{\,k })}{2} \\ \sigma _ {\,i}^ {\,k }= \dfrac{(\mathrm{CP}_{ i+1}^{\,k }- \mathrm{CP}_{\,i}^{\,k })}{2 \sqrt{\ln 4}} \end{array}\right. }} \end{aligned}$$
(6)

This design is based on the following principles: the center of each Gaussian MF is placed at the center of its partition, the membership degrees at the two CPs are equal to 0.5, and each pair of adjacent MFs intersects at their common CP, which is the connection point of their respective partitions. Unlike Zeinalkhani and Eftekhari (2014), the leftmost and the rightmost MFs do not differ from the middle ones. Figure 5 shows an example of MF definition for a variable in the domain [0.001, 1.0]; a set of four CPs, \(\{0.001, 0.1295, 0.5195, 1.0\}\), has been chosen for this variable; the CPs are marked with black solid circles in this figure.
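The following Python sketch shows how Eqs. (5) and (6) turn a set of CPs into Gaussian MF parameters; the function names are illustrative, and the example reuses the CPs of Fig. 5 to check that adjacent MFs indeed meet at a membership degree of 0.5 at their shared CP.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership function of Eq. (5)."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def mfs_from_cut_points(cps):
    """Build one Gaussian MF per partition from a sorted list of cut points,
    following Eq. (6): the centre lies at the middle of the partition and the
    width is chosen so that membership equals 0.5 at both cut points."""
    cps = np.sort(np.asarray(cps, dtype=float))
    params = []
    for cp_i, cp_next in zip(cps[:-1], cps[1:]):
        c = (cp_next + cp_i) / 2.0
        sigma = (cp_next - cp_i) / (2.0 * np.sqrt(np.log(4)))
        params.append((c, sigma))
    return params

# example from Fig. 5: four CPs give three MFs
params = mfs_from_cut_points([0.001, 0.1295, 0.5195, 1.0])
print([round(gaussian_mf(0.1295, c, s), 3) for c, s in params])  # first two values are 0.5
```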

Fig. 5 An example of MFs definition

Fig. 6 The proposed rule pruning algorithm; \( \mathrm{Sup}(\mathrm{Rule}^i) \) and \( \mathrm{Conf}(\mathrm{Rule}^i) \) are calculated as Eqs. (2) and (3), respectively

3.2.2 Rule generation

The first step of constructing the RB is rule generation. In this study, the rule learning mechanism of WM's (Wang and Mendel 1992), with some modifications, is utilized to generate fuzzy If-Then rules. WM's approach is a data-driven method that extracts If-Then rules from data samples; it is based on uniform fuzzy partitioning and performs in five steps. The rule generation method of this study differs from WM's in the fuzzy partitioning mechanism (Step 0) and in the determination of the importance degree of each rule (Step 2); it is carried out through a prerequisite step and the following four steps:

Step 0: Discretization, partitioning, and MFs definition for all variables are done according to the principles of Sect. 3.2.1; this step is a prerequisite for the next steps.

Step 1: Extract one candidate rule from each data sample. Suppose that a dataset with m data samples is given, in which each data sample has n input variables and 1 output variable; the ith row of this dataset is denoted as:

$$\begin{aligned} \left( X_i^1,\ldots ,~ X_i^n,~ Y_i\right) ; \qquad ~ i=1,\ldots ,\, m \end{aligned}$$
(7)

According to these assumptions, the structure of each rule has n antecedents, 1 consequent, and in total \(n+1\) dimensions. For each data sample, the corresponding linguistic term of each dimension is determined by the fuzzy set whose membership value is maximum, the same as in the original WM's method (Wang and Mendel 1992). By repeating this procedure for all m data samples, m candidate rules are obtained, and each of them may or may not become a member of the final RB.

Step 2: Assign an importance degree to each candidate rule. The importance degree is a criterion for evaluating the strength of each rule in comparison with similar rules within a group of conflicting rules. It is determined by multiplying Support and Confidence (the two rule evaluation metrics introduced in Sect. 2.3), i.e., the importance degree ID is assigned to rule i as:

$$\begin{aligned} \mathrm{ID}(\mathrm{Rule}^i)=\mathrm{Sup}(\mathrm{Rule}^i) *\mathrm{Conf}(\mathrm{Rule}^i) \end{aligned}$$
(8)

where \( \mathrm{Sup}(\mathrm{Rule}^i) \) is the Support of rule i as in Eq. (2), and \( \mathrm{Conf}(\mathrm{Rule}^i) \) is its Confidence as in Eq. (3).

Step 3: Group conflicting rules. Rules with exactly the same antecedents but different consequents conflict with each other. Such rules are classified into one group, and among them, the most appropriate one should be selected.

Step 4: Choose the final fuzzy If-Then rules. From each group, the rule with the highest importance degree is chosen as a final member of the RB. Since Support and Confidence define the rules' importance degrees, the strongest rules are chosen based on their reliability as well as their frequency. In other words, the rule with the highest combination of frequency (Support) and reliability (Confidence) within its group is selected. In this way, more qualified rules are produced in this first step of constructing the RB. A sketch of these steps is given below.

3.2.3 Rule pruning

In this section, a new rule pruning method is presented to modify the rule set by removing weak rules. Up to this point, a set of If-Then rules with no conflicts has been obtained. However, this set is not optimal yet: the number of rules is high, and the set may contain plenty of redundant and ineffective rules, so a modification process is needed to refine it.

In Sect. 2.3, a common strategy for eliminating additional rules using MinSupp and MinConf was described, along with its drawbacks. In this section, a new rule pruning method is presented based on that strategy, in which the minimum thresholds are specified automatically with respect to the training data instead of being fixed for all situations. For this purpose, the mentioned strategy is combined with an optimization algorithm.

Figure 6 shows the proposed rule pruning algorithm. It is comprised of two rounds. In the first round, the RB is scanned to find the most frequent rules, while in the second round, it is searched to find the most reliable rules. As illustrated in Fig. 6, each round has two steps: first, a PSO algorithm is run to find the most appropriate minimum threshold, and afterward, the RB is scrutinized and the rules that do not satisfy the minimum threshold are eliminated from it.

In this method, finding the appropriate values for MinSupp and MinConf is delegated to the PSO algorithm. It carries out this task by considering two important aspects of the rule set, namely its cardinality and the resulting error. Indeed, the RB must be composed of those rules that optimize these two criteria simultaneously as much as possible. To achieve this goal, two new criteria, called Reduction and Increase, are introduced; they are related to the cardinality of the RB and the system error, respectively.

3.2.4 Definitions

  • Reduction is defined to control the number of rules in the RB. It is equal to the percentage of diminution in the number of rules after a pruning process, i.e., the Reduction, R, is defined as:

    $$\begin{aligned} R=(1-\# \mathrm{NR}_\mathrm{after}\slash \# \mathrm{NR}_\mathrm{before}) \times 100 \end{aligned}$$
    (9)

    where \(\# \mathrm{NR}_\mathrm{before}\) and \(\# \mathrm{NR}_\mathrm{after}\) are the cardinality of the RB before and after the pruning process, respectively.

  • Increase is defined to control the system error. After a pruning process, due to the elimination of some rules, the system error is expected to rise. The Increase criterion measures this error increment, and it is equal to the percentage of error increment after applying a rule pruning process. Therefore, the Increase, I, is defined as:

    $$\begin{aligned} I=(E_\mathrm{after}\slash E_\mathrm{before}-1)\times 100 \end{aligned}$$
    (10)

    where \(E_\mathrm{before}\) and \(E_\mathrm{after}\) are the system errors before and after the rule pruning process, respectively. Mean square error is used to measure the system error in each situation, and it is computed as:

    $$\begin{aligned} E=\left( \frac{1}{2\times |D|}\right) \times \sum _{i=1}^{|D|} \, \left( F(\overrightarrow{X_i}) -Y_i\right) ^2 \end{aligned}$$
    (11)

    where |D| is the number of data samples in the training dataset D, \( \overrightarrow{X_i} \) is the ith training input vector, \( F(\overrightarrow{X_i}) \) is the estimated output for this sample using the FRBS, and \( Y_i \) is the ith target value.
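A minimal Python sketch of Eqs. (9)-(11) is given below; `frbs_predict` stands for any callable that returns the FRBS output for one input vector and is an assumed interface, not part of the described system.

```python
import numpy as np

def mse(frbs_predict, X, Y):
    """System error of Eq. (11): half of the mean squared error over the data."""
    preds = np.array([frbs_predict(x) for x in X])
    return np.sum((preds - np.asarray(Y)) ** 2) / (2 * len(Y))

def reduction(n_rules_before, n_rules_after):
    """Eq. (9): percentage of diminution in the number of rules after pruning."""
    return (1 - n_rules_after / n_rules_before) * 100

def increase(error_before, error_after):
    """Eq. (10): percentage of error increment after pruning (may be negative)."""
    return (error_after / error_before - 1) * 100
```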

3.2.5 Fitness function

Clearly, a configuration of the RB that provides maximum Reduction and minimum Increase simultaneously is desirable; that is, the maximum number of weak rules should be eliminated while imposing the minimum cost on the system error. The PSO algorithm undertakes this goal. It tries different values of MinSupp and MinConf and evaluates different pruning states until the best one is found. To this end, a fitness function based on Reduction and Increase is built for the PSO algorithm. Given the goal of the pruning process (maximum Reduction and minimum Increase), and given that the PSO algorithm minimizes its fitness function, a fractional function with Increase in the numerator and Reduction in the denominator is considered as the fitness function of the PSO algorithm, i.e., it is formulated as follows:

$$\begin{aligned} \mathrm{fitness}=\frac{a \times I^k + b}{c \times R^p+ d} \end{aligned}$$
(12)

where I and R are the normalized values of Increase and Reduction, respectively. a, b, c, d, k, and p are parameters of the model; they are used to adjust the model to a specific training dataset. a, b, c, and \( d~ (a\ne 0, c\ne 0) \) are coefficients used to build a nonlinear function of Reduction and Increase. They can usually be set to \(a=c=1\) and \(b=d=0\). The most important parameters in this definition are k and p, which provide different trade-offs between Reduction and Increase. p is called the Reduction impact factor, and it determines the degree of influence of Reduction in the pruning process. Similarly, k is called the Increase impact factor, and it specifies the importance of Increase in the pruning process.
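The fitness of Eq. (12) can be sketched in Python as follows; the small `eps` term that guards against a zero denominator is our addition, and the numeric values in the usage lines are arbitrary normalized Increase and Reduction values used only to illustrate the three modes discussed next.

```python
def pruning_fitness(I_norm, R_norm, k=1, p=1, a=1, b=0, c=1, d=0, eps=1e-9):
    """Fitness of Eq. (12) for the rule pruning PSO (to be minimized).
    I_norm and R_norm are the normalized Increase and Reduction values;
    eps only guards against a zero denominator and is not part of Eq. (12)."""
    return (a * I_norm ** k + b) / (c * R_norm ** p + d + eps)

# the three pruning modes of Sect. 3.2.6, evaluated on arbitrary normalized values
less_pruning     = pruning_fitness(0.2, 0.6, k=2, p=1)   # k > p >= 1
more_pruning     = pruning_fitness(0.2, 0.6, k=1, p=2)   # 1 <= k < p
moderate_pruning = pruning_fitness(0.2, 0.6, k=1, p=1)   # k = p = 1
```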

3.2.6 Different pruning modes

Depending on the values assigned to the factors k and p, three modes of pruning are provided as follows:

  1. If \(k>p\ge 1\), less pruning occurs. In this mode, Increase plays a more effective role in the pruning operation than Reduction; the system error is in the best situation; and the cardinality of the RB is greater than in the other modes.

  2. If \(1\le k< p\), more pruning occurs. This time, Reduction influences the pruning process more than Increase. In this mode, although the number of rules is lower than in all other modes, the error increment might be slightly higher.

  3. If \(k=p=1\), a moderate state is obtained. In this mode, Reduction and Increase play the same role in the pruning operation, so the cardinality of the RB and the error increment are both in a moderate situation.

These modes are also summarized in Table 1. Depending on the application in which the model will be used, and given the degree of importance of each aspect, a decision maker can set the parameters of the fitness function. According to our experiments, the third mode (\(k=p=1\)) provides acceptable results in most applications.

Table 1 Different modes of rule pruning

3.2.7 Several remarkable points

In this part, several remarkable points about the proposed rule pruning method should be mentioned:

  • In the employed PSO algorithm, each particle has a two-dimensional position coded as \(P=P_\mathrm{sup}+P_\mathrm{conf}\), where \(P_\mathrm{sup}\) and \(P_\mathrm{conf}\) correspond to the MinSupp and MinConf parameters, respectively. These parameters take values in their respective variation intervals. This simple coding scheme provides a significant reduction in the number of required parameters. Indeed, the proposed rule pruning method uses only two parameters regardless of the number of initial rules, while other rule selection methods (Gacto et al. 2014; Alcalá et al. 2011) apply one parameter per rule. This reduction is remarkable, especially for large-scale datasets, where the number of initial rules is probably very high. Reducing the length of the particle's position, and consequently making the search space smaller, improves the efficiency of evolutionary algorithms and lessens the costs of memory and time (Xue et al. 2016); this is properly achieved by the proposed rule pruning method: our experiments revealed that the employed PSO algorithm converges in at most five iterations, which affects the running times shown in Sect. 4.5.

  • In the second mode, more pruning may cause a further increase in the system error; however, this is not a critical issue because a tuning process that compensates for this error increase lies ahead.

  • In the fitness function, if p is set to 0, the pruning operation will be performed by considering only the system error, and if k is set to 0, only the cardinality of the RB will be taken into account; however, neither of these two conditions is recommended for normal applications.

  • The parameters a, b, c, and \( d~ (a\ne 0, c\ne 0) \) are the coefficients used to build a nonlinear function of Reduction and Increase. They are initially and usually set to \( a=c=1 \) and \( b=d=0 \). However, these parameters act as a safety valve for the fitness function and can be adjusted more precisely in the second round. Indeed, if manipulating the values of k and p (while a, b, c, and d keep their default values) cannot produce a capable fitness function, these four parameters are employed to adjust the fitness function more exactly.

  • During our experiments, some negative values were observed for the Increase criterion. This means that in some applications of this rule pruning method, not only is there no error increase, but pruning even significantly reduces the system error. It occurs due to the presence of some destructive rules in the RB; these are rules that work well on their own and for a few data samples, but are weak in cooperation with the others; their presence weakens the rule set, and consequently, the system error grows. The negative Increase values show that the proposed rule pruning method is able to discover such rules, so that when they are eliminated from the RB, the system error goes down and negative values are reported for the Increase criterion.

3.3 Post-processing

Some of the most important factors that influence the performance of an FRBS are the structure of the RB, the size of the RB, the rules' weights, and the structure of the DB. Therefore, to improve the performance of EEFR-R, after generating the initial model, some modifications are organized around these factors. The first two factors, which are related to the RB, were improved in the rule pruning process of Sect. 3.2.3. In this stage, the final modifications are made, i.e., the structure of the DB and the weights of the rules are improved by means of another PSO algorithm.

The main constituents of the DB are the MFs, which are specified by their parameters. These parameters should be optimized to find an efficient DB. The mechanism of MF definition in Sect. 3.2.1 is just a strategy for the initial generation of MFs; it determines the parameters of the MFs only from the information of the CPs and without regard to the training data. This is a deficiency that should be compensated for through the optimization of the MFs.

The second goal of this stage is the proper adjustment of the rules' weights. According to our empirical results, fewer errors occur when a rule's weight is initially set to the Confidence of that rule. This does not happen by chance and was predictable: the weight of each rule determines the strength of its influence, which corresponds to the reliability of that rule, previously introduced in Sect. 2.3 as the Confidence criterion. However, this initial setting of the rules' weights is not enough, and again the training data are not considered; thus, an optimization process is also required here.

A PSO algorithm is employed to perform the aforementioned optimization tasks. To codify the particle’s position in this algorithm, a double-coding scheme is considered as:

$$\begin{aligned} P=P_{\mathrm{MF}}+P_W \end{aligned}$$
(13)

The first part of this scheme is \(P_{\mathrm{MF}}\) (Vaneshani and Jazayeri-Rad 2011); it is used to codify the parameters of all available MFs in the DB, i.e., it is encoded as:

$$\begin{aligned}&P_{\mathrm{MF}}= P^{1},P^{2},\ldots ,P^{~n+1} \nonumber \\&P^i=c_1^i, \sigma _1^i, \ldots , c_{NMF(i)}^i, \sigma _{NMF(i)}^i;\qquad i=1,\ldots ,n+1 \end{aligned}$$
(14)

in which, as in Eq. (7), the system has \(n+1\) variables: n inputs and 1 output. Supposing that NMF(i) MFs have been defined for the ith variable, \( P^i \) encodes all parameters of all MFs related to this variable. Since the MFs are Gaussian, and each Gaussian MF has 2 parameters, the length of \(P_{\mathrm{MF}}\) is calculated as:

$$\begin{aligned} \vert P_{\mathrm{MF}}\vert =\sum _{i=1}^{n+1} 2\times NMF(i) \end{aligned}$$
(15)

By adjusting parameter c in this scheme, each MF, as Fig. 7a shows, can move its peak between its two constructor CPs. This is a kind of lateral tuning (Alcalá et al. 2007) which shifts the position of an MF within its neighborhood until the best possible position is found. On the other hand, as Fig. 7b shows, by adjusting parameter \(\sigma \), each MF can change its width up to twice its initial value. In fact, the tuning of MFs combines these two changes simultaneously. An example of MFs before and after tuning is shown in Fig. 7c and d, respectively.

Fig. 7 a Adjusting the center of MF, b adjusting the width of MF, c MFs before the tuning, d MFs after the tuning

The second part of the particle's scheme is \(P_W\); it is used to codify the rules' weights. Assuming that the number of final rules (after rule pruning) is equal to m, the weights of these m rules are encoded in \(P_W\) as:

$$\begin{aligned} P_W=(W_1, W_2, \ldots , W_m) \end{aligned}$$
(16)

where \(W_i\) is the weight of the ith rule, which can take an optimal value between 0 and 1. After the codification, an initial set of particles is randomly generated. Then, the particles are evaluated using the fitness function until the stopping conditions are met. The fitness function is equal to the system error of Eq. (11). The mechanisms for updating the positions and the velocities are the same as in the standard PSO (Du and Swamy 2016).
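The double-coding scheme of Eqs. (13)-(16) can be sketched in Python as a pair of encode/decode helpers; the function names and the toy dimensions are illustrative assumptions, and the decoded parts would be plugged into the FRBS before computing the fitness of Eq. (11).

```python
import numpy as np

def encode_particle(mf_params, rule_weights):
    """Flatten DB parameters and rule weights into one position vector (Eq. (13)):
    P = P_MF + P_W, with P_MF = [c, sigma, c, sigma, ...] per variable (Eq. (14))
    and P_W = [W_1, ..., W_m] (Eq. (16))."""
    p_mf = np.concatenate([np.ravel(var) for var in mf_params])
    return np.concatenate([p_mf, rule_weights])

def decode_particle(position, n_mfs_per_var, n_rules):
    """Inverse mapping used when evaluating a particle's fitness."""
    mf_params, idx = [], 0
    for n_mf in n_mfs_per_var:
        length = 2 * n_mf                   # (c, sigma) per Gaussian MF, Eq. (15)
        mf_params.append(position[idx:idx + length].reshape(n_mf, 2))
        idx += length
    return mf_params, position[idx:idx + n_rules]

# toy example: 2 variables with 3 and 2 MFs, and 4 rules
mfs = [np.random.rand(3, 2), np.random.rand(2, 2)]
weights = np.random.rand(4)
pos = encode_particle(mfs, weights)
assert decode_particle(pos, [3, 2], 4)[1].shape == (4,)
```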

Since the optimization process of this stage deals with the data samples, the applied evolutionary algorithm may become computationally expensive for large-scale datasets with a high number of data samples; therefore, an efficient strategy should be considered to tackle this problem. For this purpose, the fast error estimation mechanism proposed in Alcalá et al. (2011) has been incorporated into the PSO algorithm of this stage. It uses a small percentage of the data samples for the initial evaluation of a new solution; if, given this initial evaluation, the solution is not dominated, the evaluation is repeated using all data samples; but if the solution is dominated, it is discarded, and the process continues with the next solution. With this mechanism, the whole set of data samples is used only to evaluate the solutions that seem promising. This method improves evolutionary algorithms in terms of computation and running time. More details about the error estimation mechanism are available in Gacto et al. (2014) and Alcalá et al. (2011).
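A rough Python sketch of the fast error estimation idea, as described above, is given below; the 10% subset ratio and the simple "worse than the best so far" rejection rule are our simplifying assumptions and not the exact mechanism of Alcalá et al. (2011).

```python
import numpy as np

def estimated_fitness(frbs_error, data, best_so_far, subset_ratio=0.1, rng=None):
    """Evaluate a new solution on a small random subset first, and run the full
    evaluation only if it looks competitive; frbs_error(samples) is an assumed
    callable returning the error of Eq. (11) on the given samples."""
    rng = rng or np.random.default_rng()
    n_sub = max(1, int(subset_ratio * len(data)))
    subset = [data[i] for i in rng.choice(len(data), size=n_sub, replace=False)]
    rough = frbs_error(subset)            # cheap estimate on the subset
    if rough > best_so_far:               # dominated: discard without full evaluation
        return rough, False
    return frbs_error(data), True         # promising: evaluate on all samples
```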

Table 2 Properties of the used datasets for the experimental study

4 Experimental results

In this section, the experiments carried out to evaluate the effectiveness of EEFR-R are detailed. In what follows, the datasets and the parameters of the experiments are first described. Then, the different stages of the proposed model are individually evaluated, and in the fourth subsection, the overall performance of EEFR-R is compared with three state-of-the-art approaches. Statistical tests are also conducted for further analysis of these methods. At the end of this section, the average running times of EEFR-R are reported.

4.1 Datasets, parameters and evaluation criteria

To evaluate EEFR-R, 19 real-world regression datasets have been used; they have different numbers of variables, ranging from 2 to 40, and different numbers of data samples, ranging from 337 to 14,998. All datasets have been taken from the KEEL dataset repository (Alcalá-Fdez et al. 2011b). Table 2 shows the main characteristics of these datasets: the number of variables and the number of data samples for each dataset. All experiments were implemented using MATLAB tools and the KEEL software (Alcalá-Fdez et al. 2011b). The STAC platform (Rodríguez-Fdez et al. 2015) was used for the statistical tests. Ten-fold cross-validation was used to generate the training and test data. Moreover, each experiment was performed three times for each fold, and the average of the 30 runs was reported as the final result. The parameters of EEFR-R have been set as follows; they are general and fixed for all 19 datasets in all experiments, i.e.,

  • The properties of Mamdani FIS which is generated using EEFR-R are specified in Table 3.

  • The parameters of PSO algorithm, employed in the rule pruning method of Sect. 3.2.3 and in the post-processing operation of Sect. 3.3, are adjusted in Table 4.

Regarding the evaluation criteria, different FRBSs are usually compared in terms of their efficiency in two notable aspects, namely accuracy and complexity; so, one criterion related to each of these aspects has been considered. The estimation error, as in Eq. (11), is used as the accuracy measure; it is denoted by Tra. for the training data and by Tst. for the test data in the following tables. On the other hand, to evaluate the complexity, or more precisely the interpretability, of FRBSs, there are no explicit and fixed criteria in the literature; different measures have been used in different applications (Alonso et al. 2015). In this model, given the contribution of pruning additional rules, the complexity is considered at the level of the RB, and the most well-known measure at this level, namely the cardinality of the RB, is utilized; it is denoted by \(\#R\) in the following tables.

Table 3 FIS structure
Table 4 PSO algorithm parameters
Table 5 Results of different rule pruning modes

4.2 The role of rule pruning process

Two criteria, namely Reduction, as in Eq. (9), and Increase, as in Eq. (10), were introduced when proposing the rule pruning method. The same criteria are also used for its evaluation. The focus of the pruning method is to reduce the number of rules (Reduction) as much as possible while causing the least error increment (Increase).

Table 5 shows the three modes of the rule pruning method in three columns, namely the less pruning mode (first), the moderate mode (second), and the more pruning mode (third), from left to right. For each dataset, the values of Reduction and Increase for all modes are indicated. As expected, for all datasets, Reduction has its minimum value in the first mode and its maximum value in the third mode. Furthermore, in the first mode, due to the least pruning, the minimum Increase occurs, and conversely, in the third mode, due to the most pruning, the maximum Increase is observed. In other words, in Table 5, by moving from the less pruning mode toward the more pruning mode, for all datasets, the Reduction values grow and accordingly the Increase values also grow.

From another perspective, as previously stated, the presence of some rules in the RB may be destructive, so that eliminating them results in an error decrement instead of an error increment; these cases are revealed by negative Increase values. The proposed rule pruning algorithm, thanks to its fitness function, is able to discover these situations. According to the results of Table 5, such situations occurred in 11 cases in the first mode, 9 cases in the second mode, and 4 cases in the third mode.

Generally, the experiments performed in the moderate mode provide the best balance between Reduction and Increase. In this mode, in 9 datasets, there is no error increment, and in 4 datasets (MOR, BAS, CA, and PUM), the increment is negligible; so, it can be concluded that the pruning process does not have an adverse effect on the system error, and even has a positive influence in most cases. In the 6 remaining datasets, the error increments are slightly high; however, this is not a concern, since they are compensated for at the tuning stage.

Table 6 Results of MFs and weight tuning; the errors in this table should be multiplied by \(10^5\), \(10^{-8}\), \(10^5\), \(10^{-4}\), \(10^{-8}\) in the case of Ele-1, Del-Ail, BAS, PUM, AIL, respectively

Lastly, we note that the experiments of this stage were performed with the default values of k and p; in this case, the fractional fitness function is formed with the minimal difference between its numerator and denominator. By choosing larger values for k and p, so that the difference between the numerator and denominator becomes greater, the behavior of each mode becomes more pronounced, and the results become more robust.

4.3 The role of MFs and weight tuning

In this section, the effectiveness of the post-processing stage is evaluated. Table 6 shows the results of this evaluation. The table has three main columns: the first two columns correspond to the situations before and after the tuning process, and the third column reports the Reduct measure. In each situation, the errors on the training and the test data are indicated. As can be seen, for all datasets, the values of Tra. and Tst. are significantly reduced after the tuning process. For further analysis, the Reduct measure is defined; it gives the percentage of error reduction after applying the tuning process, i.e., it is calculated as:

$$\begin{aligned} \mathrm{Reduct}=(1-E_\mathrm{after}\slash E_\mathrm{before})\times 100 \end{aligned}$$
(17)

where \(E_\mathrm{before}\) and \(E_\mathrm{after}\) are the system errors (as in Eq. (11)) before and after the tuning process, respectively. The Reduct values differ from case to case in Table 6; the minimum values are for the Quake dataset (9.970 for Tra. and 14.02 for Tst.), while the maximum values are for the MOR dataset (95.75 for Tra. and 95.65 for Tst.); these values are marked in bold in Table 6.

As the last row of Table 6 shows, the average Reduct values are 46.95 and 47.76 for the training and test errors, respectively. Given these values, it can be deduced that the post-processing stage roughly halves the errors, making it an effective complementary task for improving the accuracy of the model.

4.4 Comparing the overall performance of methods

In this section, the overall performance of EEFR-R is evaluated in comparison with three state-of-the-art regression models. In the next two subsections, the three selected models are first introduced, and then the results of the comparisons are presented.

Table 7 Average results of different methods, the test errors in this table should be multiplied by \(10^5\), \(10^{-8}\), \(10^5\), \(10^{-4}\), \(10^{-8}\) in the case of Ele-1, Del-Ail, BAS, PUM, AIL, respectively

4.4.1 Selected methods from the literature for comparisons

Three different methods have been considered to evaluate the performance of EEFR-R. These methods include one classical data-driven method and two evolutionary fuzzy systems. A brief review of these models is as follows:

  • Wang and Mendel’s method (Wang and Mendel 1992) is an ad-hoc data-driven method which learns fuzzy rules from data example through a five-step algorithm. Since this method utilizes grid partitioning to define MFs, and given that it chooses the best rules based on a powerless importance degree, it involves some drawbacks, e.g., an enormous number of rules or ignoring cooperation of the rules. However, due to its simplicity and quickness, it has been employed in many researches and comparisons. In our experiments, it has been implemented with five labels and denoted by WM (5).

  • \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\) (Alcalá et al. 2011) is a fast and scalable multi-objective genetic fuzzy system for linguistic fuzzy modeling in high-dimensional regression problems. It learns all components of the KB of a Mamdani FIS simultaneously in a common process. Moreover, lateral tuning of the MFs and rule selection are performed for further refinement of the model.

  • FRULER (Rodríguez-Fdez et al. 2016), fuzzy rule learning through evolution for regression, is a genetic fuzzy system that learns a TSK FIS automatically. It is organized in three stages. First, an instance selection process is performed to prepare the data. Next, a multi-granularity fuzzy discretization obtains non-uniform fuzzy partitions of the input variables, and finally, a genetic algorithm learns the fuzzy rules based on the elastic net regularization mechanism.

4.4.2 Results of comparisons

Table 7 shows the average results of EEFR-R and the three selected methods. In this table, each column is dedicated to one method. The two measures \(\#R\) and Tst. are used to compare the methods. The best values of \(\#R\) and Tst. in each row are marked in bold.

As Table 7 shows, regarding \(\#R\), EEFR-R has the lowest values in the majority of the datasets, 14 out of 19 cases; FRULER follows with 3 cases, and \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\) with 2 cases of the best values. Based on these results, EEFR-R appears to outperform the other methods in terms of complexity reduction at the level of the RB. However, to further investigate this initial impression, statistical tests are carried out in the next section.

Table 8 Results of Friedman's test on \(\#R\) \((\alpha =0.1)\)

Regarding the Tst. values, Table 7 shows that EEFR-R has obtained the lowest errors in 6 cases, FRULER in 10 cases, and \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\) in 3 cases. So, FRULER has generally generated more accurate models. However, EEFR-R does not appear to differ greatly from FRULER. Final conclusions about these results require further investigation and comparison, which are carried out in the next section.

Table 9 Result of Holm’s test for \(\#R\), EEFR-R as the control method and \(\alpha =0.1\)

4.4.3 Statistical nonparametric tests

In what follows, statistical tests are performed to further analyze the results of Table 7. According to the recommendation of the assistant of the STAC platform (Rodríguez-Fdez et al. 2015), Friedman's test (Friedman 1937) is the best-fitted test for our data. It is a nonparametric statistical test which ranks the models based on a specific measure. Friedman's test was performed for both considered measures of Table 7 (\(\#R\) and Tst.).

Table 8 shows the results of Friedman's test for \(\#R\); the p value of the test is indicated in the last row. Given the rank values, EEFR-R has the top ranking; FRULER, \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\), and WM (5) are at the next levels, respectively.

In the next step, to find out which of the methods have significant differences, Holm's post hoc test was conducted. This test evaluates a null hypothesis (\(H_0\)) and compares a control method with the remaining methods in pairs. Table 9 shows the results of this test with EEFR-R as the control method and significance level \(\alpha =0.1\). Since the adjusted p values of all comparisons are less than \(\alpha \), Holm's test rejects the null hypotheses of all comparisons. This means that, for the \(\#R\) measure, the methods FRULER, \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\), and WM (5) are not statistically equivalent to EEFR-R, and there are significant differences between EEFR-R and the others. Thus, given the top ranking of EEFR-R on the complexity criterion and its significant differences with the other approaches, it is concluded that EEFR-R and the proposed rule pruning method have been successful in reducing the complexity of the model.

In order to analyze the accuracy of the methods, Friedman's test was also performed on the Tst. values of Table 7. Table 10 shows the results and the p value of this test. Given the rank values, FRULER is in first place, and EEFR-R, by a small margin, is second. Since FRULER employs a TSK FIS while EEFR-R utilizes a Mamdani FIS, and TSK systems are usually more accurate than Mamdani ones, this accuracy ranking was predictable and is also acceptable.

Similarly, Holm’s post hoc test was performed again to find out whether the difference between EEFR-R and the first method, FRULER, is significant or not. This time, FRULER was the control method. As Table 11 shows, the hypothesis of equality of FRULER and EEFR-R is accepted. It indicates that although EEFR-R is placed at the second level of ranking, there is no significant difference between it and the first method, and EEFR-R can work as accurate as FRULER.

In addition, to compare the accuracy of EEFR-R with the next-ranked method in Table 10, Wilcoxon's test (Wilcoxon 1945) was performed. It is another statistical test for individual pairwise comparisons. Table 12 shows the result of Wilcoxon's test for EEFR-R versus \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\) with \(\alpha = 0.1\). According to the adjusted p value and the values of \(R^+\) and \(R^-\) in Table 12, the null hypothesis is rejected in favor of EEFR-R. This means that EEFR-R is more successful than \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\) on the Tst. measure.

Table 10 Results of Friedman’s test on Tst. \((\alpha =0.1)\)
Table 11 Result of Holm’s test on Tst., FRULER as the control method and \(\alpha =0.1\)
Table 12 Results of Wilcoxon’s test on Tst., EEFR-R versus \({\hbox {FS}_{\mathrm{MOGFS}}}^{\mathrm{e}}+\hbox {TUN}^{\mathrm{e}}\)and \(\alpha =0.1\)

With respect to these analyses of complexity and accuracy, it can be inferred that EEFR-R achieves an accuracy comparable to the best of the compared methods while having the least complexity. In order to demonstrate the functionality of EEFR-R more clearly, an illustrative example is provided in “Appendix B.”

Table 13 Average running time for different stages of EEFR-R (HH:MM:SS)

4.5 Time evaluation

Table 13 shows the average running time for the different stages of EEFR-R. The total running time of EEFR-R has also been calculated for each dataset. These times were obtained using a single thread of an Intel Xeon Processor E5-2650L (20M Cache, 1.80 GHz). As can be seen, the first two stages of EEFR-R, which perform the fundamental operations of the model, including model generation and rule pruning, are not time-consuming; their time ranges from 5 s for the Ele-1 problem to 21 min for the most complex problem, AIL. The main portion of the total time of EEFR-R is related to the post-processing stage, which focuses on tuning and error reduction; this time ranges from 43 s to 1 h and 4 min for Ele-1 and AIL, respectively. Indeed, except for the most complex datasets (CA, POLE, PUM, AIL), EEFR-R obtains the model in less than 10 min. The total times for the complex datasets, given their numbers of variables and/or samples, are also very good (less than an hour and a half). These times are achieved thanks to the complexity reduction that EEFR-R provides, especially on the complex datasets.

5 Conclusion

This paper proposed EEFR-R, a novel rule extraction method integrated into an evolutionary fuzzy system for regression problems. EEFR-R consists of three stages that learn and refine a Mamdani fuzzy inference system based on Wang and Mendel's method and association rule mining concepts. The DB and the RB of a Mamdani FRBS are initially generated during the first two stages, and they are then refined through some modifications in the last stage. Furthermore, a new rule pruning method was proposed to eliminate weak rules and refine the RB based on the preferences of the decision makers. Nineteen real-world regression datasets were used to evaluate the performance of EEFR-R. The results of the evaluations and statistical tests revealed that EEFR-R obtains the simplest model with a high degree of accuracy.

Nowadays, large-scale and big datasets pose several challenges for machine learning algorithms. As future work, the proposed method can be integrated into multi-objective evolutionary fuzzy systems. It can also be adapted to handle large-scale and big data challenges in particular; for instance, feature selection and training set selection algorithms can be added. Furthermore, parallel and distributed computing frameworks such as MapReduce can be used to utilize memory and CPU resources optimally.