Abstract
In the design of fuzzy rule-based models we strive to develop models that are both accurate and interpretable (transparent). The approach proposed here is aimed at the enhancement of transparency of the fuzzy model already constructed with the accuracy criterion in mind by proposing two modifications to the rules. First, we introduce a mechanism of reduction of the input space by eliminating some less essential input variables. This results in rules with the reduced subspaces of input variables making the rules more transparent. The second approach is concerned with an isolation of input variables: fuzzy sets defined in the n-dimensional input space and forming the condition part of the rules are subject to a decomposition process in which some variables are isolated and interpreted separately. The reduced dimensionality of the input subspaces in the first approach and the number of isolated input variables in the second one are the essential parameters controlling impact of enhanced transparency on the accuracy of the obtained fuzzy model. The two problems identified above are of combinatorial character and the optimization tasks emerging there are handled with the use of Genetic Algorithms (GAs). A series of numeric experiments is reported where we demonstrate the effectiveness of the two approaches and quantify the relationships between the criterion of accuracy and interpretability.
Access provided by Autonomous University of Puebla. Download chapter PDF
Similar content being viewed by others
Keywords
- Rule-based models
- Interpretability
- Reduction
- Isolation of variables
- Fuzzy clustering
- Curse of dimensionality
1 Introductory Notes
In the construction of fuzzy rule based systems, we have been witnessing a wealth of design strategies and detailed algorithms involving the technology of Evolutionary Computing and neurocomputing. Just recent developments reported in this realm can be found in a series of studies [1–3, 5, 6, 8]. Predominantly, the development of fuzzy models is guided by the criterion of accuracy. Another fundamental criterion being at the heart of fuzzy modeling is interpretability (transparency) of resulting fuzzy models. This criterion is central to fuzzy models however its multifaceted nature requires a thorough formulation and a detailed quantification of essential aspects of interpretability. Subsequently, it calls for engaging advanced optimization techniques supporting the realization of the ensuing design.
The concept of interpretability of fuzzy rule-based models has been around for several decades and attracted a significant deal of attention. The transparency of fuzzy models is one of the outstanding and important features of fuzzy models. In contrast to the criterion of accuracy, whose quantification is relatively straightforward and easy to come up with performance indexes, transparency of fuzzy rules is more difficult to describe. What makes the fuzzy rule-based easier to interpret and comprehend is still an open issue. It is quite subjective to assess and in one way or another invokes a factor of subjective judgment given that a human user is ultimately involved in the evaluation process. What also becomes apparent, is a multifaceted nature of the problem and a multitude of various approaches supported by various optimization technologies including evolutionary optimization. When it comes to the main factors worth considering when discussing a concept of interpretability, we can enumerate a list of factors that may be involved in the reduction process:
-
number of rules forming a rule base of the model,
-
number of sub-conditions (input variables) forming a condition part of a given rule,
-
number or rules and the number of input variables,
-
complexity of local regression models forming the conclusion part of the rules (in case of Takagi-Sugeno model)
-
interpretability of a family of fuzzy sets formed in the input space for individual variables.
As a result, given this diversity of possible ways of reduction of rules, it is difficult to quantify the effect of reduction. For instance, it is not always clear if it would be better to have a larger number of simple rules (whose condition parts are linear functions) or a smaller number of rules of a more complex conclusion parts (say, those of polynomial form).
The reader may refer to a large body of studies devoted to the issue of interpretability, cf. [3, 10]. Quite often, given the combinatorial nature of the reduction problems, Genetic Algorithms are used in the optimization process, see [9]. There are also more specialized algorithmic vehicles to support the reduction process such as e.g., singular value decomposition [14] however one has to be cognizant as to the interpretability of the reduced rules.
As to the overall strategy of rule reduction and interpretability enhancement, there are two general design strategies: either the reduction is completed once the model has been constructed or the reduction mechanism of rule-based modeling is incorporated into the design process from its very beginning.
The objective of the study is to pursue a fundamental issue of building more transparent and user-centric fuzzy rule-based models by starting from an already designed fuzzy model. The intent is to make the rules more readable in two different ways: (a) by reducing the input space (the number of antecedents) of the individual rules, and (b) by isolating input variables completed for the input variables treated en block in the condition parts of the rules.
In both these fundamental scenarios we can establish and quantify a tradeoff between a gradual reduction of accuracy (which is inevitable when realizing any of the reduction mechanisms of the rules and enhancing its intensity) and the increased interpretability of the rules. The presentation of the material is structured in the following way. We briefly revisit the essentials of Takagi-Sugeno rule- based systems, stressing the proposed design approach in which we use fuzzy clustering (Sect. 2). The reduction of input space and a formation of input subspaces for each rule is introduced in Sect. 3; in the same section we also present an optimization process realized with the use of genetic algorithm. The second approach to the enhancement of the interpretability of the rule – an isolation of input variables is discussed in Sect. 5. Detailed experimental studies are reported in Sects. 3 and 6.
In the study, we experiment with a number of publicly available datasets coming from eight datasets from UCI Machine learning repository and DELVE repository; their main characteristics are listed in Table 1.
2 Fuzzy Rule-Based Models: An Overview and Main Design Issues
Our point of departure of the reduction processes of the rules is a “standard” Takagi-Sugeno fuzzy model comprising c rules coming in the following form
for i = 1, 2, .., c where x \( \in \) R n. Ai is a fuzzy set defined in the n-dimensional space while fi is the corresponding local model (linear or nonlinear) endowed with its parameters a i forming the conclusion part of the i th rule. The design of such models is well reported in the literature and as usual consists of the two main steps, namely (a) a construction of condition parts (through clustering of input data done in the input space) and (b) estimating parameters of the linear models (which leads to the problem of linear regression). The number of rules (c) is determined by monitoring the behavior of the model on the training and testing data. The rules in the form (1) come with several essential properties. The condition part is a fuzzy set expressed in R n making the rules concise, which helps avoid a curse of dimensionality we are commonly faced with in rule based systems with a higher number of input variables. In the case of treating all input variables at the same time, the number of rules becomes small (c) and the rule base itself is compact. Unfortunately, the interpretability could be negatively impacted as no individual variables in the condition part are treated and visualized separately.
When it comes to the quantification of the accuracy of the fuzzy model (1) its performance is commonly expressed by the RMSE index computed for the training set
where N denotes the number of data in the training set. In the same way, quantified is the performance of the constructed model on the testing data (consisting of M data points)
Proceeding with the data summarized in Table 1, the performance of the corresponding models visualized versus the number of rules is illustrated in a series of figures shown below, Fig. 1. The results are reported both for the training and testing data. In the design, we use a standard version of the Fuzzy C-Means (FCM) [4] with the fuzzification coefficient (m) set to 2.0. The algorithm was run for 10 iterations (more specifically, we completed 10 runs with different splits of data into training/testing data)
For the training sets, there is a general tendency of having lower values of the RMSE with the increase of the number of rules. The performance reported on the testing sets points at the memorization effect where the some models tend to lose their generalization capabilities. By eyeballing these plots, we choose a suitable number of rules (clusters) where sound approximation abilities come hand in hand with the generalization of the models. The values selected in this way are collected in Table 2.
3 Reduction of Input Subspaces in Rule-Based Models
The essence of the enhancement of interpretability of the rules is accomplished by reducing the number of input variables standing in the condition parts of the rules. The reduced rules are concisely described in the form
where the symbol [] X i stresses the fact that the fuzzy set A i is now effectively confined to the reduced input space X i \( \subset \) R n where some original input variables have been removed. In other words, dim (X i ) = n i < n.
The computing of the activation level of A i positioned in this new reduced space is realized as follows
for i = 1,2,.., c; m > 1 where the computations of the distance are realized in the reduced input space X i .
The reduction of the rules can be quantified in terms of a reduction factor ν, which relates with the number n*c (expressing an overall number of variables across all the rules) in the following way
where p represents a reduced number of input variables used in c rules.
The reduction of the input variables existing in the new, more interpretable rules is done via engaging the optimization capabilities of Genetic Algorithms (GAs). More specifically, we optimize a matrix of allocation of input variables W = [w ij ] with c rows and n input variables. The p largest entries of W are selected giving rise to a binary 0–1 matrix. The p entries with the largest values are set to 1 while the remaining ones are suppressed to zero. Each row of the matrix formed in this way identifies the variables to be used in the corresponding reduced rule. If all entries of the i th row of W are equal to zero, this entails that the corresponding rule does not exist in the reduced set of rules.
The process of identifying which input variables should be kept in the antecedents of rules is translated into an optimization problem. Its solution is obtained through evolutionary optimization, namely Genetic Algorithm (GA) [8]. GA uses elements of natural selection to determine the best solution to a problem by minimizing a certain fitness function capturing the essence of the optimization problem. The best solution is obtained via selecting and modifying a population of potential solutions (chromosomes). A main flow of GA computing is outlined in Table 3.
The two aspects of GA that require special attention and directly impact the quality of results of the optimization process are: 1) construction of a suitable fitness function, and 2) mapping a solution to the problem onto a structure of the chromosome.
A fitness function is used to evaluate possible solutions. Fitness values associated with solutions (chromosomes) represent their ability to solve a given problem. The values are used during selection of chromosomes for further processing (Table 3, line 5). The fitness function has to reflect the objective of optimization process and ensure that the obtained fitness values are adequate for mechanisms of selection, i.e., the fitness values should be such that a selection process leads to a diversified set of potential solutions [11]. In the case of our problem of selecting input attributes that should be kept in the rules, the fitness function evaluates quality of fuzzy rules that model a given dataset. The form of the fitness function used in the optimization is given by (2).
As noted earlier, each chromosome represents a solution to the considered problem. It consists of simple elements, called genes, that can assume values 1 or 0 (Binary Coded GA), or any real numbers from a specified range (Real Coded GA). A chromosome with such a simple structure, called genotype, has to represent a solution to the problem, i.e., it should be mapped into a form called phenotype that is adequate for a given problem domain.
For the optimization problem considered in the paper, we use a Real Coded GA (RCGA). The size of chromosome, i.e., a number of genes gsize, is equal to the product of a number of rules c and a number of input attributes n of a given dataset: \( gsize = cn \). Therefore, each chromosome contains information which input attributes are present in each rule. A single gene is a real number between 0 and 1. The translation of such a chromosome (genotype) into a solution to our problem (phenotype) occurs in the following way. The user has to determine a number p of input attributes or isolated attributes (Sect. 5) that should be kept in all c rules. This means that among all genes of a chromosome only p of them should be used to construct c rules. The selection of these genes is done via sorting all genes based on their values (between 0 and 1) and identifying the first p of them. The process is presented in Fig. 2.
Once an initial population of chromosomes c k has been generated, RCGA starts an iteration process. A single iteration consists of calculating fitness values (Table 3, line 4), selecting best chromosomes (line 6), and performing crossover and mutation operations on selected chromosomes (line 7). All this constitutes a single generation. After executing a number of generations, GA provides an optimal solution in a form of a chromosome that gives the highest value of the fitness function.
The purpose of crossover is to exchange information between chromosomes. In its simplest form it means exchange of genes between a pair of randomly selected, with the probability p c given by the user, chromosomes. For our RCGA, a linear crossover is applied [13]. The linear crossover generates three offspring (new chromosomes): \( O_{{k}} {{ = (o}}_{ 1}^{{k}} ,\ldots , {{o}}_{{i}}^{{k}} ,\ldots {{o}}_{{n}}^{{k}} ) , {{k}} = 1,2,3 \), where \( {{o}}_{{i}}^{\text{k}} \) represents the ith gene of the kth chromosome. The genes of offspring are built from the genes of two parents. Let us assume that the parents are: \( {{C}}_{{k}} {{ = (c}}_{ 1}^{{k}} ,\ldots , {{c}}_{{i}}^{{k}} ,\ldots {{c}}_{{n}}^{{k}} ) , {\text{ k}} = 1,2 \). The values of genes of the offspring are calculated in the following way:
Each of the offspring is evaluated, i.e., a fitness value is determined for each of them. The two most promising offspring are selected to substitute their two parents in the population.
The mutation operator modifies genes of a single chromosome in order to introduce diversity to the population. The frequency of modifications is controlled by the mutation probability p m provided by the user. Mutation ensures that a search process covers the whole space of possible solutions. In the case of RCGA, the Muhlenbein’s mutation is adopted [12]. For a given parent chromosome \( {\textit{C}}_{\textit{k}} {{ = \textit{(c}}}_{\textit{1}} ,\ldots , {\textit{c}}_{\textit{i}} ,\ldots {\textit{c}}_{\textit{n}} ) \), the gene values of a new – mutated – chromosome are calculated as \( {\textit{c}}_{\textit{i}}^{\prime } {\textit{ = c}}_{\textit{i}} \pm {\textit{rang}}_{\textit{i}} \bullet {\textit{g}} \), where \( {\textit{rang}}_{\textit{i}} \) defines the mutation range, and it is normally set to \( 0.1 * ( {\textit{b}}_{\textit{i}} {\textit{ - a}}_{\textit{i}} ) \), where \( {\textit{b}}_{\textit{i}} \) and \( {\text{a}}_{\text{i}} \) are the maximum and the minimum of \( {\text{c}}_{\text{i}} \), respectively. The sign + or − is chosen with a probability of 0.5, while \( {\text{g = }}\sum\limits_{\text{k = 0}}^{ 1 5} {{\textit{a}}_{\textit{k}} 2^{\text{ - k}} , {\textit{a}}_{\textit{k}} \in \{ 0,1\} } \) is randomly generated with \( {\textit{p(a}}_{\textit{i}} = 1 )= \frac{1}{16} , {\textit{ i }= }0.15 \).
The operations on a single chromosome are implemented in such a way that at least one input attribute is kept in each rule.
For reduction of input spaces, if p is a number of input attributes to be kept, the top p genes will be used to calculate the model output. The rest gsize-p attributes will be omitted in the calculation.
For isolation of attributes (Sect. 5), if p is a number of isolated attributes in each rule, the top p attributes in each rule (row of the matrix W) will be marked as isolated. The rest n − p attributes will be treated as the remaining group of attributes. Thus, the total number of isolated attributes is pc.
4 Experimental Studies
In the following experiments we present how the reduction of the rules proceeds and how the reduced, more interpretable rules perform. We use different values of ν and report the corresponding values of the RMSE for the training and the testing data. The GA used a population of 100 individuals and was run for 100 generations. The crossover rate was set to 0.8 while the mutation rate was equal to 0.1. The choice of these numeric values was a result of some preliminary experimentation. The results are quantified by reporting the RMSE values obtained for different values of the reduction index; refer to Fig. 3.
It becomes apparent (and intuitively anticipated) that lower values of ν result in higher RMSE values. The detailed behavior varies across data with regard to how far the rules can be reduced and how the differences shape up for the training and testing data. For example, the reduction could be made quite substantial not compromising the performance of the model as this becomes present in case of abalone, auto, concrete, and white wine. In some case, we witness a phenomenon of increased generalization abilities of the model (lower differences of the RMSE for the training and testing data for lower values of ν). Figure 4 illustrates the performance of the GA for some selected data; most of the improvement is visible at the beginning of the optimization (first 20–30 generations).
The detailed results of reduction of the number of variables in the rules are contained in Fig. 5. The shaded regions identify the input variables being retained in the corresponding rules. This offers a better view as to which input variables can be dropped and points at a sequence of the variables, which have been eliminated.
5 Isolation of Input Variables
To enhance the transparency of the rules, we express the fuzzy set A i as a Cartesian product of a single isolated fuzzy set defined in R and a relational remainder expressed in R n−1. In other words, we form the expression describing the condition part as follows
where A ^ i is a fuzzy sets defined in R and A ~ i is expressed in R n−1. Then the rules of the form read as follows
Here a certain input variable (jth one) has been selected to be isolated. The term isolation pertains to the fact that a certain variable has been chosen and subsequently a fuzzy set isolated from the fuzzy set Ai is treated separately and thus becomes more visible and interpretable.
Obviously, the above Cartesian product is not identical to the original A i that is the following holds
In terms of membership functions this means that the following relationship holds
Note that the corresponding membership functions of A ^ i (x j ) and A ~ i (x ~) are computed as follows
The consequence is that if A ^ i (x j ) \( \times \) A ~ i ( x ~ ) is used as the condition part of the ith rule, the output of the model is going to be different than the original rule. It is likely that the accuracy of the model could be reduced as a result of the increased interpretability of the rules because of the isolation of the input variables. In the above formulation, one is interested in choosing an individual variable (jth one) for which the results provided by the rule-based model are as close as possible to those formed by the original fuzzy model. The selection of the input variable is quite straightforward through a direct enumeration.
The plots showing the LHS and the RHS relationship is shown for concrete strength with one or five isolated input variables.
From the plots above, several observations of a general character can be drawn. First, the values of the LHS are higher than the corresponding ones for the RHS. This is reflective of the fact that the separation of the variable(s) leads to the higher activation levels of the rules with eventual reduction of the specificity of the results of reasoning. It is also apparent that with the increase of the variables being isolated – compare Fig. 6a, b, the differences between the values produced by the LHS and RHS of the expression (10) are more profound. Again, this is not surprising as by isolating more variables we depart from the RHS more vigorously.
In a general setting, one can realize an isolation of L input variables, which as a result leads to the rules in the form
note that in this case x ~ is defined in R n−L.
Here an optimal choice of L variables gives rise to a combinatorial optimization problem. This could be solved by GA optimization. Considering that the value of L is specified in advance, GA forms an optimal isolation matrix I consisting of c rows (number of rules) and n rows (number of input variables) where in each row there are L 1 s indicating the variables which are isolated in the rule. For instance, for L = 3 the matrix with the entries
states that in the first rule isolated are variables 1, 4, and 5; in the second rule we isolate variables 2, 3, and 4, and so on.
6 Experiments
The GA was carried out with 100 populations and maximum 50 generations. Preliminary experiments with GA indicated lack of improvement before 50th generation, so the maximum generation is set to 50. The population is not large, so relative large values of moderate crossover and mutation rates are used. The crossover rate is set as 0.8 and mutation rate is 0.1.
We present the results in a similar way as before by focusing on the presentation of the rules with isolated variables and showing how the families of isolated variables impact the performance of the model (Figs. 7, 8 and 9).
7 Conclusions
The two approaches enhancing the interpretability of rule-based models are directly applied to the already constructed Takagi-Sugeno fuzzy models realized with the use of fuzzy clustering. Both the formation of the input subspace of conditions as well as the isolation of the input variables are the methods refining multivariable fuzzy sets produced through fuzzy clustering. The proposed approaches are quantifiable in terms of the level of the interpretability abilities offered by them (expressed either in terms of the number of variables eliminated or the variables isolated). This aspect is helpful in determining how much the interpretability could be enhanced without any significant sacrifice of accuracy of the model. Furthermore in this way one could reveal input variables (or their combinations) that are essential in rule-based modeling.
The approach offers a certain new view at the enhancement of fuzzy rule-based models. There could be several avenues worth pursuing in the future including (a) development of a hybrid arrangement of the formation of subspaces of conditions of the rules associated with some further isolation of variables from such subspaces, (b) use of other techniques of Evolutionary Optimization in the entire process, (c) construction of interpretability measures quantifying various facets of the interpretation mechanisms.
References
Alcala, R., Gacto, M.J., Herrera, F.: A fast and scalable multiobjective genetic fuzzy system for linguistic fuzzy modeling in high-dimensional regression problems. IEEE Trans. Fuzzy Syst. 19(4), 666–681 (2011)
Alonso, J.M., Magdalena, L., Guillaume, S.: Linguistic knowledge base simplification regarding accuracy and interpretability. Mathware Soft Comput. 13(3), 203–216 (2006)
Ayouni, S., Yahia, S.B., Laurent, A.: Extracting compact and information lossless sets of fuzzy association rules. Fuzzy Sets Syst. 183(1, 16), 1–25 (2011)
Bezdek, J.C.: Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York (1981)
Bodenhofer, U., Bauer, P.: A formal model of interpretability of linguistic variables. In: Casillas, J., Cordón, O., Herrera, F., Magdalena, L. (eds.) Interpretability Issues in Fuzzy Modeling, pp. 524–545. Springer, Berlin (2003)
Chen, M.Y., Linkens, D.A.: Rule-base self-generation and simplification for data-driven fuzzy models. Fuzzy Sets Syst. 142(2), 243–265 (2004)
Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley (1989)
Gacto, M.J., Alcalá, R., Herrera, F.: Integration of an index to preserve the semantic interpretability in the multiobjective evolutionary rule selection and tuning of linguistic fuzzy systems. IEEE Trans. Fuzzy Syst. 18(3), 515–531 (2010)
Ishibuchi, H., Yamamoto, T.: Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining. Fuzzy Sets Syst. 141(1), 59–88 (2004)
Krone, A., Krause, H., Slawinski, T.: A new rule reduction method for finding interpretable and small rule bases in high dimensional search spaces. In: Proceedings of 9th IEEE International Conference on Fuzzy System, San Antonio, TX, pp. 693–699 (2000)
Mitchell, M.: Introduction to Genetic Algorithms. MIT Press (1998)
Muhlenbein, H., Schlierkamp-Voosen, D.: Predictive models for the Breeder genetic algorithm I. Continuous Parameter Optim. Evol. Comput. 1, 25–49 (1993)
Wright, A.: Genetic algorithms for real parameter optimization. In: Rawlin, G.J.E (ed.) Foundations of Genetic Algorithms 1, pp. 205–218. Morgan Kaufmann, San Mateo (1991)
Yam, Y., Baranyi, P., Yang, C.T.: Reduction of fuzzy rule base via singular value decomposition. IEEE Trans. Fuzzy Syst. 7, 120–132 (1999)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Pedrycz, W., Li, K., Reformat, M. (2015). Evolutionary Reduction of Fuzzy Rule-Based Models. In: Tamir, D., Rishe, N., Kandel, A. (eds) Fifty Years of Fuzzy Logic and its Applications. Studies in Fuzziness and Soft Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-19683-1_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-19683-1_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19682-4
Online ISBN: 978-3-319-19683-1
eBook Packages: EngineeringEngineering (R0)