1 Introduction

In recent years, Mamdani-type fuzzy rule-based systems (FRBSs) (Mamdani and Assilian 1975) have been extensively and successfully applied to several engineering domains, such as control, classification, regression and identification. As stated by Cordón et al. (2004), Mamdani-type FRBSs consist of:

  • A rule base (RB) composed of linguistic IF-THEN rules, such as if the temperature is hot then the fan speed is high, where both the antecedent and the consequent parts are fuzzy propositions.

  • A data base (DB), which associates a semantics, represented by means of fuzzy sets, with the linguistic terms used in the RB, e.g., hot and high.

The RB is often derived from heuristic knowledge, which is usually valid independently of the real environment in which the FRBS will operate. Thus, the RB can be considered a context-free model. Indeed, the real environment does not affect the RB but rather the DB and, more specifically, the meaning associated with each linguistic term used in the rules. For instance, in the aforementioned rule, the meaning associated with the linguistic term hot depends on each person's perception of heat, which is influenced by, e.g., latitude.

These many potential applications have raised considerable interest among researchers and practitioners in developing effective and efficient methods to generate accurate and/or interpretable Mamdani-type FRBSs. In this framework, context adaptation (CA) is one of the most promising techniques, although it has not yet been fully explored. CA of FRBSs starts from the following assumption: the RB models universal knowledge that can be reused in many different environments (Pedrycz et al. 1997). In each environment, the meanings associated with the linguistic terms have to be adjusted so as to improve the accuracy of the FRBS. Thus, CA is a tuning process that exploits context-specific information to adapt the parameters contained in the DB (Cordón et al. 2004).

The above-mentioned assumption implies that the RB should not be modified during the adaptation process. Since there is a semantic relationship between the RB and the DB, this requirement imposes some constraints on the tuning of the DB.

First, since each linguistic term is bound to a fuzzy set, each fuzzy set is useful and meaningful only if the associated linguistic term is used in the RB. Thus, the number of fuzzy sets in each partition is directly determined by the number of linguistic labels defined in the RB and, therefore, should not vary during CA.

Second, when experts define the RB, they use an implicit semantic ordering of linguistic terms that is significant to humans. For instance, the linguistic term high will always follow low, and cold will always precede hot. The ordering of linguistic terms is usually modeled in fuzzy partitions by some ordering of fuzzy sets. Consequently, to preserve semantics, the post-CA ordering of the fuzzy sets should reflect the pre-CA ordering of linguistic terms.

To summarize, each CA approach should comply with the following guidelines:

  1. Context adaptation should not modify the RB;

  2. Context adaptation should not change the number of linguistic terms defined in the RB and, consequently, the number of corresponding fuzzy sets;

  3. Context adaptation should not affect the semantic ordering of linguistic terms.

In this paper, we propose an approach to CA of Mamdani-type FRBSs aimed at improving accuracy while following the aforementioned guidelines and preserving the interpretability of partitions (de Oliveira 1999; Casillas et al. 2003). Our approach exploits a novel interpretability index based on fuzzy ordering relations. The proposed index and the mean squared error (MSE) are used as the objectives of a multi-objective evolutionary algorithm (MOEA) aimed at generating a set of Pareto-optimal FRBSs with different trade-offs between accuracy and interpretability. We start from an RB that represents universal knowledge elicited from domain experts. Then, adaptation is obtained by applying a modified version of the operators introduced in Botta et al. (2006a).

The paper is organized as follows: Sect. 2 discusses previous work on CA of FRBSs. Section 3 describes the operators used in our CA approach. Section 4 discusses the issue of ordering and interpretability of FRBSs and introduces a novel index to evaluate both these properties concurrently. Section 5 details the MOEA. Section 6 shows two application examples in the fields of regression and data modeling, respectively, and Sect. 7 draws final conclusions.

2 Previous work

In the literature, most papers on CA of FRBSs have focused on the use of scaling functions (Bastian 1994; Gudwin and Gomide 1994; Magdalena 1997; Pedrycz et al. 1997; Gudwin et al. 1998; Cordón et al. 2001; Magdalena 2002; Botta et al. 2006a, b). Usually, the scaling function is applied to a normalized partition, that is, a partition defined over the [0,1] universe of discourse and uniformly partitioned into triangular, trapezoidal or Gaussian fuzzy sets. The number of fuzzy sets in each partition coincides with the number of linguistic terms defined for the linguistic variable corresponding to the universe of the partition. The scaling function adapts the partition by mapping the normalized universe to the context-adapted universe, possibly modifying the distribution and the shape of the fuzzy sets.

The scaling functions used in the literature can roughly be classified into linear (Bastian 1994; Gudwin and Gomide 1994; Magdalena 2002) and non-linear (Magdalena 1997; Pedrycz et al. 1997; Gudwin et al. 1998; Cordón et al. 2001; Klawonn 2006; Botta et al. 2006a, b). Non-linear scaling functions can be applied to the overall universe of discourse, thus modifying the shape of the fuzzy sets, as in Magdalena (1997), Pedrycz et al. (1997), Gudwin et al. (1998) and Klawonn (2006), or only to some points (e.g., to the breakpoints of triangular and trapezoidal fuzzy sets), as in Cordón et al. (2001) and Botta et al. (2006a, b), so as to maintain the original shape of the fuzzy sets and the interpretability of the partition. Figure 1 shows an example of the application of scaling functions to a normalized partition. We note that both linear and non-linear scaling functions comply with the guidelines introduced in Sect. 1. We further remark that scaling functions do not always perform an effective CA, mostly because of their limited modeling capabilities. To overcome this weakness, in Botta et al. (2006b) we introduced four parametric operators that affect the core, the support and the boundary elements of fuzzy sets. While the new operators improve on the modeling capabilities of scaling functions, they may also modify the partition in a way that makes the ordering less evident, and the interpretability of the resulting partition may be affected as well. In this context, relevant contributions to the identification of the parameters of scaling functions and CA operators exploit evolutionary algorithms but maximize only the accuracy of the context-adapted FRBS (Magdalena 1997; Gudwin et al. 1998; Cordón et al. 2001; Botta et al. 2006a, b).

Fig. 1

Examples of application of different types of scaling functions to a normalized partition

On the other hand, the issue of balancing interpretability and accuracy has been addressed in the field of FRBS generation from data (de Oliveira 1999; Casillas et al. 2003). This issue has often been tackled by using MOEAs aimed at generating a set of FRBSs with different trade-offs between the two objectives (Ishibuchi et al. 1995; Jimenez et al. 2001; Wang et al. 2005; Ishibuchi and Nojima 2007; Gonzalez et al. 2007; Cococcioni et al. 2007). The majority of existing approaches, however, focus on learning the RB from examples rather than on adapting the meaning of linguistic terms to a specific context. Further, interpretability is typically measured indirectly, either in terms of complexity (for instance, the number of fuzzy sets, as in Gonzalez et al. (2007), and/or the number of rules) or in terms of properties of the partitions, such as coverage and distinguishability. Finally, when these properties are considered, they are measured by a simple similarity index that cannot fully capture the semantics of interpretability (Jimenez et al. 2001; Wang et al. 2005).

In this paper, we introduce a novel interpretability index that, by exploiting a fuzzy ordering relation, explicitly takes coverage and distinguishability into account, in an attempt to reproduce the interpretability perceived by humans as described by de Oliveira (1999). The definition of our index reflects the following observation: humans associate a semantic ordering with the linguistic terms used as values of a linguistic variable. This ordering is, by and large, universally accepted and has to be respected by the fuzzy sets used to define the meaning of the linguistic terms employed by the system. A further condition for interpretability is that fuzzy sets should be distinguishable from each other so as to preserve the distinction between linguistic terms. Indeed, humans associate different meanings with different linguistic terms, and these differences are more marked for linguistic terms that are semantically far apart. For instance, the distinction between low and medium is less marked than that between low and high. Finally, the universe should be covered, that is, no element of the universe should be left unrepresented by every linguistic term. The index we propose in this paper explicitly considers these three aspects of interpretability.

3 Operators for CA

In this section, we briefly review the scaling functions and the operators that we use to perform CA of an FRBS. Preliminary versions of these operators have been previously discussed by Botta et al. (2006a). We introduce a flexible and compact scaling function and we define four fuzzy modifiers that allow adapting the core, the support and the boundary elements of fuzzy sets.

In the adaptation process, we first apply the scaling function, and then the fuzzy modifiers. The modifiers are chosen and formulated in such a way that the effects on the context-adapted partition are independent of the order in which they are executed. Thus, modifiers can be applied without interfering with each other. Our modifiers are flexible enough to be used in a wide range of tuning applications, not necessarily related to CA, and might be applied to a single fuzzy set. However, in our CA approach, each modifier is applied with the same intensity to the overall partition. This allows us to use a very small number of parameters to represent a wide range of configurations through the combined effects of the five operators to be defined next.

The fuzzy modifiers are formulated so as to act on trapezoidal fuzzy sets defined by (sl,cl,cu,su), where sl and su, and cl and cu are the left and right bounds of the support and of the core, respectively, with sl ≤ cl ≤ cu ≤ su. This definition includes as special cases triangular fuzzy sets (when cl = cu) and singletons (when sl = cl = cu = su). Although we use trapezoidal fuzzy sets, we remark that the proposed modifiers can be easily adapted to work on any other shape of fuzzy sets.
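The trapezoidal representation maps directly onto code. The following minimal Python sketch evaluates the membership degree of a trapezoid (sl, cl, cu, su); the class name `Trapezoid` and the convention of treating the support as an open interval are our own illustrative assumptions, not part of the original formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy set (sl, cl, cu, su) with sl <= cl <= cu <= su.

    Hypothetical helper: the support is treated as the open interval (sl, su),
    while the core [cl, cu] always has membership 1 (so singletons work too).
    """
    sl: float
    cl: float
    cu: float
    su: float

    def __post_init__(self) -> None:
        if not (self.sl <= self.cl <= self.cu <= self.su):
            raise ValueError("breakpoints must satisfy sl <= cl <= cu <= su")

    def membership(self, x: float) -> float:
        if self.cl <= x <= self.cu:
            return 1.0                                   # inside the core
        if x <= self.sl or x >= self.su:
            return 0.0                                   # outside the support
        if x < self.cl:
            return (x - self.sl) / (self.cl - self.sl)   # rising left slope
        return (self.su - x) / (self.su - self.cu)       # falling right slope


# Usage: a trapezoid, a triangle (cl == cu) and a singleton (all four equal).
trap = Trapezoid(0.0, 0.25, 0.5, 1.0)
tri = Trapezoid(0.0, 0.5, 0.5, 1.0)
single = Trapezoid(0.3, 0.3, 0.3, 0.3)
print(trap.membership(0.125), tri.membership(0.5), single.membership(0.3))  # -> 0.5 1.0 1.0
```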

3.1 Scaling function

To cover the overall universe of discourse and to non-uniformly distribute the fuzzy sets in the partition, we adopt the following scaling function:

$$ s(x):[0,1]\rightarrow [a,b] = \left\{ \begin{array}{ll} a+(b-a)(\lambda^{1-k_{\rm SF}} x^{k_{\rm SF}})& \hbox{if } x \le \lambda \\ a+(b-a) [1-(1-\lambda)^{1-k_{\rm SF}} (1-x)^{k_{\rm SF}}] &\hbox{if } x > \lambda, \\ \end{array}\right. $$
(1)

where parameters a and b identify the bounds of the universe of discourse, parameter λ ∈ [0, 1] defines a sort of center of gravity in the normalized partition, and parameter \(k_{\rm SF} > 0\) defines the degree of dilation (\(k_{\rm SF} > 1\)) or compression (\(k_{\rm SF} < 1\)) of fuzzy sets around λ. For an example of the application of the scaling function, the reader can refer to Botta et al. (2006a).
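As an illustration, Eq. (1) can be sketched in Python as follows. The helper `scale_breakpoints` assumes, as one possible choice, that the scaling function is applied only to the breakpoints of each trapezoid, so that the trapezoidal shape is preserved; both function names are hypothetical.

```python
def scale(x: float, a: float, b: float, lam: float, k_sf: float) -> float:
    """Non-linear scaling function s: [0, 1] -> [a, b] of Eq. (1)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in the normalized universe [0, 1]")
    if not 0.0 <= lam <= 1.0 or k_sf <= 0.0:
        raise ValueError("require lam in [0, 1] and k_sf > 0")
    # The end points map to the bounds of the universe; handling them explicitly
    # avoids evaluating 0 ** (1 - k_sf) when lam is 0 or 1 and k_sf > 1.
    if x == 0.0:
        return a
    if x == 1.0:
        return b
    if x <= lam:
        return a + (b - a) * lam ** (1.0 - k_sf) * x ** k_sf
    return a + (b - a) * (1.0 - (1.0 - lam) ** (1.0 - k_sf) * (1.0 - x) ** k_sf)


def scale_breakpoints(breakpoints, a, b, lam, k_sf):
    """Assumption: s is applied to the breakpoints only, preserving the shape."""
    return tuple(scale(p, a, b, lam, k_sf) for p in breakpoints)


print(scale_breakpoints((0.25, 0.35, 0.4, 0.5), -1.5, 0.5, 0.5, 2.0))
```

Since s is strictly increasing for a < b, mapping the breakpoints preserves their order and hence the ordering of the fuzzy sets.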

3.2 Core-position modifier

The core-position modifier acts on the core of a fuzzy set, shifting its position within the support while maintaining the original width. The modifier is defined as:

$$ {\rm cl}^{\prime} = \left\{ \begin{array}{ll} {\rm cl}-({\rm sl}-{\rm cl})\cdot k_{\rm CP} &\hbox{if } k_{\rm CP} < 0 \\ {\rm cl} + ({\rm su}-{\rm cu})\cdot k_{\rm CP} &\hbox{if } k_{\rm CP}\ge 0,\\ \end{array} \right. $$
(2)
$$ {\rm cu}^{\prime} =\left\{ \begin{array}{ll} {\rm cu}-({\rm sl}-{\rm cl})\cdot k_{\rm CP} &\hbox{if } k_{\rm CP} < 0 \\ {\rm cu}+({\rm su}-{\rm cu})\cdot k_{\rm CP} &\hbox{if } k_{\rm CP}\ge 0,\\ \end{array} \right. $$
(3)

where cl′ and cu′ are the left and right bounds of the modified core, respectively, and \(k_{\rm CP} \in [-1, 1]\) defines the intensity of the left shift (\(k_{\rm CP} < 0\)) or right shift (\(k_{\rm CP} > 0\)) of the core. In both cases, the core is shifted by a fraction \(|k_{\rm CP}|\) of the width of the corresponding slope, so its width is preserved. Figure 2a shows a sample application of the core-position modifier to a partition \(P = \{A_1,\ldots,A_5\}\) composed of five uniformly distributed trapezoidal fuzzy sets over the normalized universe [0,1].
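A minimal sketch of Eqs. (2) and (3), assuming fuzzy sets are passed as (sl, cl, cu, su) tuples; the function name `core_position` and the three-set partition in the usage lines are hypothetical. As in our CA approach, the same intensity is applied to every fuzzy set of the partition.

```python
def core_position(sl, cl, cu, su, k_cp):
    """Core-position modifier of Eqs. (2)-(3): shifts the core, keeping its width."""
    if not -1.0 <= k_cp <= 1.0:
        raise ValueError("k_cp must lie in [-1, 1]")
    if k_cp < 0:
        delta = (sl - cl) * k_cp   # >= 0: fraction |k_cp| of the left slope width
        return (sl, cl - delta, cu - delta, su)
    delta = (su - cu) * k_cp       # >= 0: fraction k_cp of the right slope width
    return (sl, cl + delta, cu + delta, su)


# The same intensity is applied to every fuzzy set of a (hypothetical) partition.
partition = [(0.0, 0.0, 0.2, 0.4), (0.2, 0.4, 0.6, 0.8), (0.6, 0.8, 1.0, 1.0)]
shifted = [core_position(*fs, k_cp=0.5) for fs in partition]
```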

Fig. 2

Sample applications of the modifiers to a partition composed of five trapezoidal fuzzy sets

3.3 Core-width modifier

The core-width modifier acts on the core of a fuzzy set, dilating or shrinking the core within the support. The modifier is defined as:

$$ {\rm cl}^{\prime} =\left\{ \begin{array}{ll} {\rm cl}+ w\cdot ({\rm sl}-{\rm cl})\cdot k_{\rm CW} &\hbox{if } k_{\rm CW} < 0 \\ {\rm cl} + ({\rm sl}-{\rm cl})\cdot k_{\rm CW} &\hbox{if } k_{\rm CW}\ge 0,\\ \end{array} \right. $$
(4)
$$ {\rm cu}^{\prime} = \left\{ \begin{array}{ll} {\rm cu}+w\cdot ({\rm su}-{\rm cu})\cdot k_{\rm CW} &\hbox{if }\ k_{\rm CW} < 0 \\ {\rm cu}+({\rm su}-{\rm cu})\cdot k_{\rm CW} &\hbox{if }\ k_{\rm CW}\ge 0,\\ \end{array}\right. $$
(5)

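where w = (cu − cl)/(cl − sl + su − cu), and \(k_{\rm CW} \in [-1,1]\) determines the intensity of dilation (\(k_{\rm CW} > 0\)) or shrinking (\(k_{\rm CW} < 0\)) of the core. With \(k_{\rm CW} = 1\), the core extends to the whole support, whereas with \(k_{\rm CW} = -1\) it collapses to a single point. Figure 2b shows a sample application of the core-width modifier.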
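The core-width modifier of Eqs. (4) and (5) can be sketched in the same tuple convention (the function name is hypothetical). The only subtlety is that w is undefined for a rectangular set with no slopes; the sketch sets w = 0 in that case, which is harmless because the terms w multiplies are then zero.

```python
def core_width(sl, cl, cu, su, k_cw):
    """Core-width modifier of Eqs. (4)-(5): dilates (k_cw > 0) or shrinks (k_cw < 0) the core."""
    if not -1.0 <= k_cw <= 1.0:
        raise ValueError("k_cw must lie in [-1, 1]")
    if k_cw >= 0:
        # Each core bound moves toward the corresponding support bound.
        return (sl, cl + (sl - cl) * k_cw, cu + (su - cu) * k_cw, su)
    slopes = (cl - sl) + (su - cu)
    # w is undefined for rectangular sets (no slopes); there (sl - cl) and
    # (su - cu) are both zero, so choosing w = 0 does not change the result.
    w = (cu - cl) / slopes if slopes > 0 else 0.0
    return (sl, cl + w * (sl - cl) * k_cw, cu + w * (su - cu) * k_cw, su)


dilated = core_width(0.0, 0.25, 0.5, 1.0, 1.0)    # core grows to the whole support
collapsed = core_width(0.0, 0.25, 0.5, 1.0, -1.0)  # core shrinks to a single point
```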

3.4 Support-width modifier

The support-width modifier acts on both the support and the core of a fuzzy set, scaling their widths with respect to the center of the support and preserving the ratio between the widths of the core and the support. The modifier is defined as:

$$ {\rm sl}^{\prime} = {\rm sm} + k_{\rm SW}\cdot ({\rm sl}-{\rm sm}), $$
(6)
$$ {\rm cl}^{\prime} = {\rm sm} + k_{\rm SW}\cdot ({\rm cl}-{\rm sm}), $$
(7)
$$ {\rm cu}^{\prime} = {\rm sm} + k_{\rm SW}\cdot ({\rm cu}-{\rm sm}), $$
(8)
$$ {\rm su}^{\prime} = {\rm sm} + k_{\rm SW}\cdot ({\rm su}-{\rm sm}), $$
(9)

where sl′ and su′ are the left and right bounds of the modified support, respectively, sm = (sl + su)/2 is the center of the support, and \(k_{\rm SW} > 0\) determines the shrinking (\(k_{\rm SW} < 1\)) or enlarging (\(k_{\rm SW} > 1\)) of the fuzzy set. Figure 2c shows a sample application of the support-width modifier.
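Eqs. (6)-(9) amount to scaling all four breakpoints about the center sm, as the following short sketch (hypothetical function name) shows:

```python
def support_width(sl, cl, cu, su, k_sw):
    """Support-width modifier of Eqs. (6)-(9): scales all breakpoints about sm."""
    if k_sw <= 0:
        raise ValueError("k_sw must be positive")
    sm = (sl + su) / 2.0   # center of the support
    return tuple(sm + k_sw * (p - sm) for p in (sl, cl, cu, su))


enlarged = support_width(0.0, 0.25, 0.5, 1.0, 2.0)
shrunk = support_width(0.0, 0.25, 0.5, 1.0, 0.5)
```

Because all four breakpoints are scaled by the same factor about sm, the ratio between the widths of the core and of the support is unchanged.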

3.5 Generalized positively modifier

The generalized positively modifier is an extension of the well-known positively linguistic hedge. We use this modifier to change the degree of membership of the boundary elements of the fuzzy sets. In other words, the generalized positively modifier changes the shape of the original trapezoidal fuzzy sets so as to generate, for instance, Gaussian-like fuzzy sets. The modifier is defined as:

$$ A^{\prime}(x)=\left\{ \begin{array}{ll} \theta^{1-k_{\rm GP}}A^{k_{\rm GP}}(x)&\hbox{if } A(x) < \theta \\ 1-(1- \theta)^{1-k_{\rm GP}}[1-A(x)]^{k_{\rm GP}} & \hbox{if } A(x)\ge\theta,\\ \end{array}\right. $$
(10)

where A′ is the fuzzy set resulting from the application of the linguistic hedge to A, θ ∈ [0,1] is the membership degree at which the fuzzy set changes concavity, and \(k_{\rm GP} > 0\) is a contrast intensification parameter (\(k_{\rm GP} = 1\) leaves the fuzzy set unchanged). Figure 2d shows a sample application of the generalized positively modifier.
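The following sketch applies Eq. (10) to a single membership degree and wraps an arbitrary membership function. The function names are hypothetical, and we additionally restrict θ to the open interval (0, 1) to avoid raising zero to a negative power when \(k_{\rm GP} > 1\).

```python
def generalized_positively(mu, theta, k_gp):
    """Generalized positively modifier of Eq. (10) applied to a membership degree."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    # Simplifying assumption: theta is kept strictly inside (0, 1) so that
    # theta ** (1 - k_gp) and (1 - theta) ** (1 - k_gp) are always defined.
    if not 0.0 < theta < 1.0 or k_gp <= 0.0:
        raise ValueError("require 0 < theta < 1 and k_gp > 0")
    if mu < theta:
        return theta ** (1.0 - k_gp) * mu ** k_gp
    return 1.0 - (1.0 - theta) ** (1.0 - k_gp) * (1.0 - mu) ** k_gp


def modified_membership(membership, theta, k_gp):
    """Wrap any membership function x -> A(x) into x -> A'(x)."""
    return lambda x: generalized_positively(membership(x), theta, k_gp)


# A triangular set with breakpoints (0, 0.5, 1) becomes bell-like for k_gp > 1.
bell = modified_membership(lambda x: max(0.0, 1.0 - abs(x - 0.5) / 0.5), 0.5, 2.0)
```

With \(k_{\rm GP} > 1\), degrees below θ are lowered and degrees above θ are raised, while degrees 0 and 1 are left untouched, so normality is preserved.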

4 An interpretability index based on fuzzy ordering relations

4.1 Fuzzy sets ordering

As stated in Section 1, CA of an FRBS consists of tuning the fuzzy sets corresponding to the linguistic terms used in the RB. The ordering of the context-adapted fuzzy sets should reflect the semantic ordering of the linguistic terms. The ordering of fuzzy sets has been widely addressed in the literature (Wang and Kerre 2001; Cross and Sudkamp 2002). Given two fuzzy sets \(A_1\) and \(A_2\) defined over the universe of discourse \({\mathbb{R}}\), it is sometimes difficult to determine whether and how \(A_1\) and \(A_2\) are ordered. The trivial approach is to consider basic features of each fuzzy set, such as the position of the modal values or the lower (upper) bound of the support, and to order the fuzzy sets according to these features. This approach works correctly when fuzzy sets are clearly and intuitively ordered, but it may lead to counter-intuitive results in the general case. Wang and Kerre (2001) reviewed a number of ordering approaches, including some methods that evaluate the ≤ relation as a fuzzy relation. These methods are expressive enough to state that \(A_1 \le A_2\) and \(A_2 \le A_1\) hold to different degrees, for instance 0.6 and 0.2, respectively. In this case, we write \(R_\le(A_1, A_2) = 0.6\) and \(R_\le(A_2, A_1) = 0.2\). If \(R_\le(A_1, A_2) + R_\le(A_2, A_1) = 1\) holds, the relation is called reciprocal.

Figure 3 shows eight case studies of pairs of fuzzy sets (case 4 is taken from an example by Wang and Kerre (2001)). The ordering in cases 1–3 is easily identifiable, while the other cases are debatable. For instance, in case 8, if we use the interval ordering of the supports to assess the ordering between \(A_1\) and \(A_2\), we obtain \(A_2 \le A_1\); on the other hand, if we adopt the ordering of the cores, we obtain \(A_1 \le A_2\). We used these case studies to benchmark the behavior of Kolodziejczyk's index and Yuan's index (Yuan 1991), two reciprocal fuzzy ordering relations previously reviewed by Wang and Kerre (2001). Table 1 shows the values of the two indices for the eight case studies. We remark that, of the two indices, Yuan's index distinguishes the cases from one another with higher precision.

Fig. 3

Case studies for the evaluation of ordering indices between fuzzy sets \(A_1\) (solid) and \(A_2\) (dotted)

Table 1 Evaluation of \(A_1 \le A_2\) and \(A_2 \le A_1\) on the case studies of Fig. 3

4.2 Interpretability and ordering of fuzzy partitions

Interpretability of fuzzy partitions can be defined in several ways. Here, we refer to the definition proposed by de Oliveira (1999), who states that a fuzzy partition is interpretable if it satisfies the following properties:

  1. The partition should have a reasonable number of fuzzy sets;

  2. The fuzzy sets in the partition should all be normal, i.e., for each fuzzy set there exists at least one point with membership degree equal to 1;

  3. Each pair of fuzzy sets should be distinguishable enough, so that no two fuzzy sets represent nearly the same concept;

  4. The overall universe of discourse should be strictly covered, i.e., each point of the universe should belong to at least one fuzzy set with a membership degree above a given reasonable threshold.

If the number of linguistic terms is low, as it generally is, guideline 1 defined in Section 1 ensures that property 1 is satisfied. Furthermore, if the operators used for CA do not alter the normality of the context-adapted fuzzy sets (as is the case, for instance, for the operators introduced in Sect. 3), property 2 is satisfied as well. On the other hand, properties 3 and 4 are not so easily satisfied. In Botta et al. (2006a), these properties were achieved by heuristically restricting the domain of the parameters of the operators. These two properties pose interesting challenges to the designer of a learning method for FRBSs. First, it is difficult to define a proper metric that measures them with low computational effort. Second, it is hard to find a crisp threshold for such a metric that separates good from bad partitions.

One approach to this problem is to consider distinguishability and coverage as conflicting properties and to use a metric that assesses the trade-off between them. For instance, the membership value at the crossing points between adjacent fuzzy sets can be used to measure both at once (de Oliveira 1999). Since in CA we also need to preserve the ordering of the original fuzzy partition, we choose to evaluate interpretability by extending an ordering index. Indeed, it is extremely difficult to derive useful information about ordering from traditional interpretability indices. For instance, from the membership degrees at the crossing points of adjacent fuzzy sets, we can only derive a partial ordering based on α-cuts.

Further, it can be shown that Jaccard's index (Cross and Sudkamp 2002), which is often used to measure interpretability, does not provide a meaningful evaluation of the ordering of fuzzy partitions. Some ordering indices, however, can be used to evaluate distinguishability and coverage from an interpretability point of view.

Indeed, let us consider again Fig. 3 and Table 1. In cases 1 and 2, the fuzzy sets satisfy both the distinguishability and the coverage properties, whereas case 3 lacks coverage and case 4 lacks distinguishability. Kolodziejczyk's index evaluates \(A_1 \le A_2\) to 1 in all four cases and, therefore, cannot be used to evaluate distinguishability and coverage. In contrast, Yuan's index properly discriminates the four cases, giving a crisp value of 1 only in case 3, in which \(A_1 \cap A_2 = \emptyset\). In the other cases, \(Y_\le(A_1, A_2)\) takes different degrees of truth lower than 1. Thus, Yuan's index is a sensible choice for evaluating interpretability.

4.3 The interpretability index

Let us consider a partition \(P = \{A_1,\ldots,A_i,\ldots,A_N\}\) consisting of N fuzzy sets. Let \(d_{j,i} = |j - i|\) be the semantic distance between \(A_j\) and \(A_i\). For instance, the semantic distance between \(A_3\) and \(A_1\) is 2. We define the following index to evaluate the interpretability and ordering of a partition:

$$ \Upphi_{Q}(P) = {\frac{\sum\nolimits_{\tiny\begin{array}{c}1 \le i \le N-1\\ i < j \le N\\ \end{array}} {\frac{1}{d_{j,i}}}\cdot\mu^{d_{j,i}}_Q\left(Q_\le(A_i,A_j)\right)} {\sum\nolimits_{\tiny\begin{array}{c}1 \le i \le N-1\\ i < j \le N\\ \end{array}} {\frac{1}{d_{j,i}}}}}, $$
(11)

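where Q is a fuzzy ordering index and \(\mu^{d_{j,i}}_Q(x)\), with \(x = Q_\le(A_i, A_j)\), are fuzzy sets defined on the universe [0,1] of the values of Q. Each pair contributes with weight \(1/d_{j,i}\), so pairs of semantically adjacent fuzzy sets count the most.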
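Eq. (11) is a weighted average over all pairs i < j. The sketch below keeps the ordering index Q and the family \(\mu^{d}_Q\) as caller-supplied callables, since their concrete form is only defined later; the function name `phi_index` and the toy crisp ordering of centers in the usage lines are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

F = TypeVar("F")  # any representation of a fuzzy set


def phi_index(partition: Sequence[F],
              q: Callable[[F, F], float],
              mu: Callable[[int, float], float]) -> float:
    """Interpretability index of Eq. (11).

    q(A_i, A_j) returns the ordering degree Q_<=(A_i, A_j) in [0, 1];
    mu(d, x) evaluates the fuzzy set mu^d_Q at x. Both are supplied by the caller.
    """
    if len(partition) < 2:
        raise ValueError("the partition must contain at least two fuzzy sets")
    num = den = 0.0
    for i, j in combinations(range(len(partition)), 2):   # all pairs with i < j
        d = j - i                                          # semantic distance d_{j,i}
        weight = 1.0 / d
        num += weight * mu(d, q(partition[i], partition[j]))
        den += weight
    return num / den


# Toy usage: fuzzy sets reduced to their centers, crisp ordering, mu = identity.
centers = [0.1, 0.4, 0.7, 0.9]
crisp_q = lambda a, b: 1.0 if a <= b else 0.0
identity_mu = lambda d, x: x
print(phi_index(centers, crisp_q, identity_mu))  # -> 1.0
```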

The value of \(\Upphi_Q(P)\) ranges between 0 (the lowest level of interpretability) and 1 (the highest level of interpretability); for uniform fuzzy partitions, \(\Upphi_Q(P)\) should be close to 1. The fuzzy sets \(\mu^{d_{j,i}}_Q(x)\) are used to assess, for different values of \(d_{j,i}\), the value of the Q index with respect to interpretability. For instance, if \(d_{j,i} = 1\) and \(A_i \cap A_j = \emptyset\), i.e., in a situation similar to case 3 in Fig. 3, adjacent fuzzy sets do not overlap and, therefore, coverage is not satisfied. It follows that \(\mu^{d_{j,i}}_Q(x)\) should return a value close to 0. On the other hand, if \(d_{j,i} > 1\) and \(A_i \cap A_j = \emptyset\), the fuzzy sets do not overlap, as required to enforce distinguishability, and, therefore, \(\mu^{d_{j,i}}_Q(x)\) should return a value close to 1. Clearly, the definition of the family of fuzzy sets \(\mu^{d_{j,i}}_Q(x)\) is a critical step, since these fuzzy sets represent the actual link between the evaluation of ordering, coverage and distinguishability. In the following, we describe a procedure to generate the family of fuzzy sets \(\mu^{d_{j,i}}_Q(x)\).

We start from a well-known measure of interpretability, based on the membership value y at the crossing point between two fuzzy sets (denoted as XP in the following). We can distinguish two cases:

  • \(d_{j,i} = 1\) (semantically adjacent fuzzy sets): In this case, y should be neither too close to 1 nor too close to 0, so as to preserve, respectively, the distinguishability and the coverage properties. Both properties are satisfied when y is close to 0.5. These observations can be modeled by the fuzzy set \(\nu^{1}_{XP}(y)\) shown as a solid line in Fig. 4a;

  • \(d_{j,i} > 1\) (semantically non-adjacent fuzzy sets): In this case, we do not care about coverage, since it is already ensured by adjacent fuzzy sets, but we stress distinguishability. Thus, y should be close to 0. Nevertheless, depending on the actual value of \(d_{j,i}\), we can still tolerate some overlap between the two fuzzy sets, and this tolerance should decrease as \(d_{j,i}\) increases. These observations can be modeled by the fuzzy sets \(\nu^{d_{j,i}}_{XP}(y)\) shown as dotted lines in Fig. 4a for different values of \(d_{j,i}\). We note that, while \(\nu^{d_{j,i}}_{XP}(0) = 1\ \forall d_{j,i} > 1\), the right spread shrinks toward 0 as \(d_{j,i}\) increases, so as to reduce the tolerance.
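For trapezoidal fuzzy sets, the crossing-point value y used above has a closed form. The sketch below assumes that the two trapezoids are ordered (the right slope of the first meets the left slope of the second) and takes y as the height of their intersection, clipped to [0, 1]; the function name is hypothetical. Equating (su1 − x)/(su1 − cu1) with (x − sl2)/(cl2 − sl2) gives y = (su1 − sl2)/[(su1 − cu1) + (cl2 − sl2)].

```python
def crossing_point_value(left, right):
    """Membership value y where two ordered trapezoids cross, clipped to [0, 1].

    left = (sl1, cl1, cu1, su1) is assumed to precede right = (sl2, cl2, cu2, su2),
    so the crossing lies between the right slope of `left` and the left slope
    of `right`. y is taken as the height of the intersection of the two sets.
    """
    _, _, cu1, su1 = left
    sl2, cl2, _, _ = right
    overlap = su1 - sl2                  # length where the two supports overlap
    if overlap <= 0:
        return 0.0                       # disjoint supports: no coverage between them
    slopes = (su1 - cu1) + (cl2 - sl2)   # combined width of the two facing slopes
    if slopes == 0:
        return 1.0                       # vertical facing edges that overlap: cores overlap
    return min(1.0, overlap / slopes)


# Two adjacent triangles of a uniform partition cross at y = 0.5.
print(crossing_point_value((0.0, 0.25, 0.25, 0.5), (0.25, 0.5, 0.5, 0.75)))  # -> 0.5
```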

Fig. 4

Fuzzy sets \(\nu^{d_{j,i}}_{XP}(y)\) and \(\mu^{d_{j,i}}_Y(x)\) used in the examples of Sect. 6

To obtain \(\mu^{d_{j,i}}_Q(x),\) we project \(\nu^{d_{j,i}}_{XP}(y)\) from its original universe of discourse y to the universe of discourse x. The overall process can be formalized as follows:

  1. We choose \(\nu^{d_{j,i}}_{XP}(y)\) as triangular membership functions, defined by the three breakpoints \((sl^{d_{j,i}},c^{d_{j,i}},su^{d_{j,i}})\). We set \(sl^{d_{j,i}} = 0\; \forall d_{j,i}\), \(c^{1} = 0.5\), \(c^{d_{j,i}} = 0\; \forall d_{j,i} > 1\), \(su^{1} = 1\), and \(su^{d_{j,i}} = 2/d_{j,i}\; \forall d_{j,i} > 1\). Figure 4a shows \(\nu^{d_{j,i}}_{XP}(y)\) with N = 5 and \(d_{j,i} = 1,\ldots,4\);

  2. We empirically identify a relation \(R_{Q,XP}(x,y)\) by evaluating x and y for a number of differently overlapping trapezoidal membership functions;

  3. Using the extension principle and \(R_{Q,XP}(x,y)\), we project the fuzzy sets \(\nu^{d_{j,i}}_{XP}(y)\) from the universe of discourse y to the universe of discourse x, thus obtaining the corresponding \(\mu^{d_{j,i}}_Q(x)\). More formally, we have:

    $$ \mu^{d_{j,i}}_Q(x)=\sup_{y\in [0,1]}\min(\nu^{d_{j,i}}_{XP}(y),R_{Q,XP}(x,y)), $$
    (12)

    where \(d_{j,i} = 1,\ldots,N-1\).
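The three steps can be sketched on discretized universes. The breakpoints of \(\nu^{d}_{XP}\) follow step 1 directly; the relation \(R_{Q,XP}\), in contrast, is identified empirically, so `relation_from_samples` is only a hypothetical crisp estimate built by binning sample pairs (x, y) with NumPy's `histogram2d`. The sup-min composition of Eq. (12) then reduces to a maximum over the y axis of an element-wise minimum. All function names are hypothetical.

```python
import numpy as np


def triangular(y, sl, c, su):
    """Triangular membership (sl, c, su) evaluated on an array; allows sl == c."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.zeros_like(y)
    if c > sl:
        rising = (y >= sl) & (y < c)
        out[rising] = (y[rising] - sl) / (c - sl)
    falling = (y >= c) & (y <= su)
    out[falling] = (su - y[falling]) / (su - c)
    return out


def nu_xp(d, y):
    """nu^d_XP(y) with the breakpoints of step 1."""
    if d == 1:
        return triangular(y, 0.0, 0.5, 1.0)
    return triangular(y, 0.0, 0.0, 2.0 / d)


def relation_from_samples(xs, ys, n_bins):
    """Hypothetical crisp estimate of R_{Q,XP}: 1 where some sample (x, y) falls."""
    counts, _, _ = np.histogram2d(xs, ys, bins=n_bins, range=[[0, 1], [0, 1]])
    return (counts > 0).astype(float)          # rows: x bins, columns: y bins


def project(nu_values, relation):
    """Sup-min composition of Eq. (12) on discretized universes."""
    return np.max(np.minimum(relation, nu_values[np.newaxis, :]), axis=1)


# Sanity check only: if Q coincided with XP, samples would lie on the diagonal.
n = 20
centers = (np.arange(n) + 0.5) / n
R = relation_from_samples(centers, centers, n)
mu1 = project(nu_xp(1, centers), R)
```

The usage lines do not reproduce the relation identified in the paper; they only confirm that, with a diagonal relation, the projection returns \(\nu^{1}_{XP}\) unchanged.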

We illustrate the above process by detailing the sets \(\mu^{d_{j,i}}_Y(x)\) corresponding to Yuan’s fuzzy ordering index. We recall the formula to compute Yuan’s index (Yuan 1991; Wang and Kerre 2001):

$$ Y_{\le}(A_1,A_2) = \Updelta_{A_2,A_1}/\left(\Updelta_{A_1,A_2}+\Updelta_{A_2,A_1}\right), $$
(13)

where \(\Updelta_{A_1,A_2}\) is defined as:

$$ \Updelta_{A_1,A_2} = \int\limits_{\alpha|u_{A_{1\alpha}} > l_{A_{2\alpha}}} (u_{A_{1\alpha}}-l_{A_{2\alpha}})d\alpha + \int\limits_{{\alpha|l_{A_{1\alpha}} > u_{A_{2\alpha}}}}(l_{A_{1\alpha}}-u_{A_{2\alpha}})d\alpha, $$

α ∈ [0,1] is an α-cut value, \(A_{i\alpha}\) is the crisp set obtained by α-cutting \(A_i\), i ∈ {1,2}, and \(l_{A_{i\alpha}}\) and \(u_{A_{i\alpha}}\) are the lower and upper bounds of \(A_{i\alpha}\), respectively. Figure 4b shows the projections \(\mu^{d_{j,i}}_Y(x)\) obtained by applying the overall process. This family of fuzzy sets can then be employed in the corresponding interpretability index \(\Upphi_Y(P)\). In Fig. 5, we show the evaluation of the proposed index on four sample partitions characterized by different degrees of coverage, distinguishability and ordering of fuzzy sets.
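For trapezoidal fuzzy sets, the α-cut bounds are linear in α, namely \(l_{A_\alpha} = sl + \alpha(cl - sl)\) and \(u_{A_\alpha} = su - \alpha(su - cu)\), so Eq. (13) is easy to evaluate numerically. The sketch below implements Δ exactly as written above and approximates the integrals with the midpoint rule; the function names, the number of α samples and the value returned in the fully degenerate case are our own assumptions.

```python
import numpy as np


def alpha_cut(t, alphas):
    """Lower and upper bounds of the alpha-cuts of a trapezoid (sl, cl, cu, su)."""
    sl, cl, cu, su = t
    return sl + alphas * (cl - sl), su - alphas * (su - cu)


def delta(a1, a2, n=2000):
    """Delta_{A1,A2} as written above, integrated over alpha with the midpoint rule."""
    alphas = (np.arange(n) + 0.5) / n
    l1, u1 = alpha_cut(a1, alphas)
    l2, u2 = alpha_cut(a2, alphas)
    first = np.where(u1 > l2, u1 - l2, 0.0)    # integrand of the first integral
    second = np.where(l1 > u2, l1 - u2, 0.0)   # integrand of the second integral
    return float(np.mean(first + second))      # mean over [0, 1] approximates the integral


def yuan_leq(a1, a2, n=2000):
    """Y_<=(A1, A2) of Eq. (13)."""
    d12, d21 = delta(a1, a2, n), delta(a2, a1, n)
    if d12 + d21 == 0.0:
        return 0.5   # degenerate case (e.g. identical singletons): assumed tie
    return d21 / (d12 + d21)


a, b = (0.0, 0.1, 0.2, 0.3), (0.5, 0.6, 0.7, 0.8)   # disjoint, a to the left of b
c, e = (0.0, 0.25, 0.5, 1.0), (0.25, 0.5, 0.75, 1.25)   # overlapping, e shifted right
print(yuan_leq(a, b), yuan_leq(c, e))
```

By construction, \(Y_\le(A_1,A_2) + Y_\le(A_2,A_1) = 1\), i.e., the relation is reciprocal.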

Fig. 5

Evaluation of Φ Y on sample partitions with different degrees of coverage, distinguishability and ordering of fuzzy sets

5 The MOEA

Thanks to their modeling capabilities, the scaling function and the fuzzy modifiers introduced in Sect. 3 allow the normalized partitions to be adapted to a wide range of contexts. In particular, we apply a MOEA to perform the CA described in Sect. 1. To this aim, we need to determine which operators should be applied and the values of their parameters. These choices are typically based on maximizing the accuracy of the FRBS on real-world examples that describe the behavior of the context-adapted system. Furthermore, as stated in Sects. 1 and 4, the partition generated by the application of the operators should satisfy the ordering and interpretability constraints. Our objectives are therefore accuracy, measured by the MSE, and interpretability, measured by the average \(\bar{\Upphi}_Y\) of the values of the index \(\Upphi_Y\) computed over all input and output partitions of the FRBS. These conflicting objectives are balanced by adopting NSGA-II (Deb et al. 2002).

NSGA-II is a fast and elitist MOEA based on non-dominated sorting for rank assignment and on an ad hoc density-estimation metric (the crowding distance). In our CA approach based on scaling functions and fuzzy modifiers, individuals represent the values of the parameters that define the operators used to adapt a Mamdani-type FRBS.
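For readers unfamiliar with NSGA-II, its two ranking ingredients, fast non-dominated sorting and crowding distance (Deb et al. 2002), can be sketched as below. We assume both objectives are minimized by encoding each individual as (MSE, \(-\bar{\Upphi}_Y\)); the population values are made up for illustration, and selection, crossover and mutation are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]   # objective vector, every component to be minimized


def dominates(p: Point, q: Point) -> bool:
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def fast_non_dominated_sort(points: Sequence[Point]) -> List[List[int]]:
    """Return the indices of the points grouped into successive Pareto fronts."""
    n = len(points)
    dominated = [[] for _ in range(n)]   # indices dominated by each point
    counts = [0] * n                     # how many points dominate each point
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                   # drop the trailing empty front


def crowding_distance(points: Sequence[Point], front: List[int]) -> Dict[int, float]:
    """Density estimate used by NSGA-II to prefer isolated solutions in a front."""
    dist = {i: 0.0 for i in front}
    if not front:
        return dist
    for k in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")   # keep the extremes
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (hi - lo)
    return dist


# Hypothetical individuals described by (MSE, -mean_phi), so both are minimized.
population = [(0.10, -0.90), (0.20, -0.95), (0.30, -0.80), (0.05, -0.50)]
fronts = fast_non_dominated_sort(population)
print(fronts)                                  # -> [[0, 1, 3], [2]]
print(crowding_distance(population, fronts[0]))
```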

Assume that the FRBS has V−1 input variables and one output. Let \(S = \{(\hat{\vec{x}}_j,\hat{y}_j)\}\) be a set of M real-world examples, where \(\hat{\vec{x}}_j\) is a vector of V−1 input values and \(\hat{y}_j\) is the corresponding output. Each individual in the population is represented by a chromosome composed of V strings of 77 bits, where each string codes the five control genes and the nine 8-bit parameters of the tuning operators (5 + 9 × 8 = 77; the first V−1 strings determine the parameters for the input variables and the last string those for the output variable). Each string has a hierarchical structure: the first five bits, one for each operator, control whether the corresponding operator is applied to the fuzzy partition. The remaining 72 bits are organized in sub-strings of 8 bits, each of which determines the value of a different parameter.

Instead of using a mixed binary-real coding (with five binary genes for the choice of the tuning operators and nine real genes for the parameter values), we decided to adopt a binary-coded GA based on the following considerations. We observed that the quantization performed by binary coding does not affect the precision of the choice of the parameter values, owing to the small ranges of these parameters. Furthermore, binary coding discretizes the search space, thus allowing the solution space to be explored with lower computational effort. However, binary coding suffers from the following problem: mating can generate descendants that inherit no characteristics of the parents. To solve this problem, as is usual with binary chromosomes, we adopted Gray decoding to generate individuals from chromosomes (Michalewicz 1999). The chromosome has the following structure:

$$ \begin{aligned} (&C_{\rm SF}^1,C_{\rm CP}^1,C_{\rm CW}^1,C_{\rm SW}^1,C_{\rm GP}^1, \\ &a^1,b^1,\lambda^1,k_{\rm SF}^1,k_{\rm CP}^1,k_{\rm CW}^1,k_{\rm SW}^1,\theta^1,k_{\rm GP}^1, \\ &\qquad \qquad\quad\ldots \\ &C_{\rm SF}^V,C_{\rm CP}^V,C_{\rm CW}^V,C_{\rm SW}^V,C_{\rm GP}^V, \\ &a^V,b^V,\lambda^V,k_{\rm SF}^V,k_{\rm CP}^V,k_{\rm CW}^V,k_{\rm SW}^V,\theta^V,k_{\rm GP}^V), \\ \end{aligned} $$

where \(C_{\rm SF}^v\), \(C_{\rm CP}^v\), \(C_{\rm CW}^v\), \(C_{\rm SW}^v\), and \(C_{\rm GP}^v\), with v = 1,...,V, are the control genes that determine whether, respectively, the non-linear scaling function of Eq. 1, the core-position modifier, the core-width modifier, the support-width modifier and the generalized positively modifier have to be applied to the partition. The genes \(a^v,\ldots,k_{\rm SF}^v\) and \(k_{\rm CP}^v,\ldots,k_{\rm GP}^v\) are, respectively, the values of the parameters of the scaling function and of the four fuzzy modifiers for the vth partition.
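To make the chromosome layout concrete, the following sketch decodes one 77-bit string into five control flags and nine parameter values, using Gray decoding and a linear mapping of the 256 levels onto a parameter range. The parameter ranges are not given in this section, so `example_ranges` is purely hypothetical, as are all function names.

```python
from typing import Dict, List, Sequence, Tuple

N_CONTROL = 5                 # one control gene per operator
N_PARAMS = 9                  # a, b, lambda, k_SF, k_CP, k_CW, k_SW, theta, k_GP
BITS_PER_PARAM = 8
STRING_LEN = N_CONTROL + N_PARAMS * BITS_PER_PARAM   # 77 bits per variable

CONTROL_NAMES = ("SF", "CP", "CW", "SW", "GP")
PARAM_NAMES = ("a", "b", "lambda", "k_SF", "k_CP", "k_CW", "k_SW", "theta", "k_GP")
Ranges = Dict[str, Tuple[float, float]]


def gray_to_int(bits: Sequence[int]) -> int:
    """Decode a Gray-coded bit string (most significant bit first)."""
    value, binary_bit = 0, 0
    for bit in bits:
        binary_bit ^= bit     # each binary bit is the XOR of the Gray bits so far
        value = (value << 1) | binary_bit
    return value


def decode_string(bits: Sequence[int], ranges: Ranges):
    """Decode one 77-bit string into control flags and parameter values."""
    if len(bits) != STRING_LEN:
        raise ValueError(f"expected {STRING_LEN} bits, got {len(bits)}")
    controls = {name: bool(b) for name, b in zip(CONTROL_NAMES, bits[:N_CONTROL])}
    params = {}
    for idx, name in enumerate(PARAM_NAMES):
        start = N_CONTROL + idx * BITS_PER_PARAM
        level = gray_to_int(bits[start:start + BITS_PER_PARAM])   # 0..255
        lo, hi = ranges[name]
        params[name] = lo + (hi - lo) * level / (2 ** BITS_PER_PARAM - 1)
    return controls, params


def decode_chromosome(bits: Sequence[int], ranges_per_variable: List[Ranges]):
    """Split a chromosome of V strings (inputs first, output last) and decode each."""
    v_count = len(ranges_per_variable)
    if len(bits) != v_count * STRING_LEN:
        raise ValueError("chromosome length does not match the number of variables")
    return [decode_string(bits[v * STRING_LEN:(v + 1) * STRING_LEN], ranges_per_variable[v])
            for v in range(v_count)]


# Hypothetical parameter ranges, chosen only for illustration.
example_ranges: Ranges = {
    "a": (-2.0, -1.0), "b": (0.0, 1.0), "lambda": (0.0, 1.0), "k_SF": (0.25, 4.0),
    "k_CP": (-1.0, 1.0), "k_CW": (-1.0, 1.0), "k_SW": (0.5, 1.5),
    "theta": (0.05, 0.95), "k_GP": (0.25, 4.0),
}
```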

To cover the overall universe of discourse, even if \(C_{\rm SF}^v = 0\), we perform a linear scaling from the normalized universe [0,1], in which the initial uniform partition is defined, to the context-adapted universe \([a^v,b^v]\). Some problems with the normality of the fuzzy partition may arise when both the core-position and the core-width modifiers are applied, because these modifiers might move the core of the first and/or the last fuzzy set outside the bounds of the universe of discourse, thus leaving some subnormal fuzzy sets in the partition. To avoid this problem, once we have applied the modifiers, we adjust the bounds of the universe of discourse so as to include the upper bound of the core of the first fuzzy set and the lower bound of the core of the last fuzzy set.

We start with an initial population of randomly generated individuals. At each generation, the uniform crossover and the uniform mutation operators (Michalewicz 1999) are applied with probabilities 0.8 and 0.05, respectively. Chromosomes to be mated are chosen by the standard binary tournament selection used by Deb et al. (2002) in the original version of NSGA-II.
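The genetic operators can be sketched in Python as below. The binary tournament follows the crowded-comparison rule of NSGA-II (lower non-domination rank wins, ties broken by larger crowding distance); whether the mutation probability 0.05 is applied per gene or per chromosome is not stated here, so the per-gene interpretation used below is an assumption, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]


def uniform_crossover(p1: Sequence[int], p2: Sequence[int], rng: random.Random,
                      p_cross: float = 0.8) -> Tuple[Chromosome, Chromosome]:
    """With probability p_cross, swap each gene between the parents with probability 0.5."""
    if rng.random() >= p_cross:
        return list(p1), list(p2)
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2


def uniform_mutation(chrom: Sequence[int], rng: random.Random,
                     p_mut: float = 0.05) -> Chromosome:
    """Flip each bit independently with probability p_mut (assumed per-gene rate)."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]


def crowded_binary_tournament(rank: Sequence[int], crowding: Sequence[float],
                              rng) -> int:
    """Pick two random individuals; prefer lower rank, then larger crowding distance."""
    n = len(rank)
    i, j = rng.randrange(n), rng.randrange(n)
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    return i if crowding[i] >= crowding[j] else j
```

Passing the random generator explicitly keeps runs reproducible, which matters when the experiments are repeated over several folds.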

6 Experimental results

To evaluate the effectiveness of our approach, we applied the proposed CA method to a regression problem and a data modeling problem. In the first experiment, contexts are modeled by similar curves generated from a parametric function. In the second experiment, we determine the fuel efficiency of a set of vehicles in the contexts of city and highway traffic conditions.

6.1 Parametric function

Let us consider the following parametric function:

$$ \begin{aligned} g(x_1,x_2) = &\kappa + e^{-(\kappa\cdot x_1)^{2\cdot\kappa}-(1+x_2)^{2\cdot\kappa}} \\ &- e^{-x_1^{2\cdot\kappa}-x_2^{2\cdot\kappa}} - e^{-(1+x_1)^{2\cdot\kappa}-(\kappa\cdot x_2)^{2\cdot\kappa}}, \end{aligned} \tag{14} $$
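Eq. (14) translates directly into code; the following Python sketch (with a hypothetical function name `g`) is useful for generating the three context curves. As a quick sanity check, at the origin the first and third exponentials both equal \(e^{-1}\) and the second equals 1, so \(g(0,0) = \kappa - 1\) for every \(\kappa\).

```python
import math


def g(x1: float, x2: float, kappa: float) -> float:
    """Parametric function of Eq. (14); kappa selects the context (e.g. 2, 5 or 7)."""
    p = 2 * kappa  # exponent 2*kappa; even for integer kappa, so negative bases are fine
    return (
        kappa
        + math.exp(-((kappa * x1) ** p) - (1 + x2) ** p)
        - math.exp(-(x1 ** p) - x2 ** p)
        - math.exp(-((1 + x1) ** p) - (kappa * x2) ** p)
    )
```

For large \(\kappa\) the arguments of the exponentials become very negative away from the "bumps", and `math.exp` simply underflows to 0.0 there, so no special handling is needed on \([-1.5, 0.5]^2\).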

where \(x_1, x_2 \in [-1.5, 0.5]\). We evaluated the function with κ ∈ {2, 5, 7}. For different values of κ, the resulting curves have a similar output range and a similar smoothness. Hence, we can define a set of rules that linguistically describe these common features, and consider each of the three curves as a different instantiation of the same generic shape in a different context, determined by the value of κ. We use the intuitive set of rules shown in Table 2. In the RB, four linguistic labels are used for each variable, namely L (low), ML (medium–low), MH (medium–high) and H (high). For each of the three instances, we applied the CA approach introduced in the previous section. We evaluated the curves on a grid of 120 equally spaced points in the region [−1.5, 0.5] × [−1.5, 0.5]. Since the RB uses four linguistic labels per variable, each normalized universe of discourse was uniformly partitioned into four trapezoidal fuzzy sets.

To evaluate the robustness of MOEA across different runs, we adopted fivefold cross-validation. Figure 6 shows the Pareto fronts obtained by MOEA on each of the five folds for the context κ = 2, on both the training and the test sets. The five fronts are quite wide and well distributed. Furthermore, on the training set the fronts are close to each other, which indicates that they do not depend on the particular execution of MOEA. Due to lack of space, we do not show the Pareto fronts obtained for the other contexts; however, similar trends can be observed there.
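The paper does not detail how the 120 samples are assigned to the five folds. The sketch below assumes a random shuffle followed by a round-robin split; the function name and the seed are hypothetical. Each sample appears in exactly one test fold, and the remaining four folds form the training set.

```python
import random
from typing import List, Tuple


def kfold_splits(n_samples: int, k: int = 5,
                 seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; each sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for j, fold in enumerate(folds) if j != f for i in fold)
        splits.append((train, test))
    return splits
```

With 120 samples and \(k = 5\), every test fold holds 24 samples and every training set 96.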

Table 2 RB for the parametric function data set
Fig. 6

Pareto fronts obtained on the training set (top) and on the test set (bottom) for the parametric function data set on context κ = 2

To assess the quality of the Pareto fronts, we also applied the following CA methods for comparison:

  • The non-linear scaling function proposed by Cordón et al. (2001), with its parameters optimized by a single-objective genetic algorithm (SOGA) using a binary representation of the parameters in the chromosome, uniform mutation, uniform crossover and binary tournament selection. We refer to this approach as SOGA1. Note that we use only the scaling function and not the overall methodology proposed by Cordón et al. (2001), which, unlike our approach, does not rely on a universally valid RB, but instead identifies the rules by means of a quick ad-hoc learn-by-example method.

  • The non-linear scaling function and the four fuzzy modifiers introduced in Sect. 3, with their parameters optimized by the same SOGA as in the previous item. We refer to this method as SOGA2.

In both SOGAs, we adopted the same crossover and mutation probabilities as in NSGA-II. Further, to guarantee a fair comparison, we used the same chromosome coding and genetic operators for all the approaches. Finally, we employed the MSE as the single-objective fitness function. All the approaches were tested with a population of \(N_{\rm pop} = 50\) individuals and a maximum number of generations \(I_{\max} = 200\). The experiments on SOGA1 and SOGA2 were repeated five times for each context and for each fold to obtain statistically meaningful results.

Table 3 compares the results, expressed as average ± standard deviation, achieved by the three techniques. For MOEA, we selected two FRBSs: the one with the lowest MSE on the test set, denoted MOEA\(_a\), and the one with the lowest MSE among the solutions that dominate the FRBSs determined by SOGA1 on the test set, denoted MOEA\(_b\).
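The selection of MOEA\(_a\) and MOEA\(_b\) can be sketched as follows, assuming each FRBS is summarized by a pair (test-set MSE, interpretability index), with the MSE to be minimized and the index to be maximized, and assuming a single (MSE, index) pair as the SOGA1 reference. The type alias and function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (test-set MSE, interpretability index): lower MSE and higher index are better.
Solution = Tuple[float, float]


def dominates(s: Solution, t: Solution) -> bool:
    """True if s is no worse than t in both objectives and strictly better in one."""
    mse_s, phi_s = s
    mse_t, phi_t = t
    return mse_s <= mse_t and phi_s >= phi_t and (mse_s < mse_t or phi_s > phi_t)


def select_moea_a(front: List[Solution]) -> Solution:
    """FRBS with the lowest test-set MSE."""
    return min(front, key=lambda s: s[0])


def select_moea_b(front: List[Solution], soga1: Solution) -> Optional[Solution]:
    """Lowest-MSE FRBS among those dominating the SOGA1 solution (None if there is none)."""
    dominating = [s for s in front if dominates(s, soga1)]
    return min(dominating, key=lambda s: s[0]) if dominating else None
```

For example, with a front of (0.10, 0.5), (0.20, 0.8) and (0.30, 0.9) and a SOGA1 solution at (0.25, 0.7), MOEA\(_a\) is (0.10, 0.5), whereas only (0.20, 0.8) dominates SOGA1 and is therefore MOEA\(_b\).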

Table 3 Results for the parametric function data set

As expected, on the test set the context-adapted FRBSs generated by SOGA2 achieve a low MSE, which is, however, comparable to the best MSEs obtained by the FRBSs on the Pareto fronts. Their interpretability is nevertheless poor, since the only objective of SOGA2 is to minimize the MSE. SOGA1 achieves higher MSE values than SOGA2, but generates more interpretable FRBSs: being based on a scaling function only, it introduces less distortion into the fuzzy partitions than SOGA2.

MOEA provides the decision maker with a set of FRBSs with different trade-offs between accuracy and interpretability. In particular, on the test set MOEA\(_a\) and MOEA\(_b\) achieve an MSE equal to or lower than those of SOGA1 and SOGA2, and are characterized by higher values of \(\bar{\Phi}_Y\). Although SOGA1 performs better than MOEA on the training set, MOEA outperforms it on the test set. Indeed, owing to its simplicity, SOGA1 can explore its search space more deeply than MOEA, but it can also easily overfit. We recall that we used the same number of generations for both SOGAs and MOEA. MOEA, on the other hand, balances MSE against interpretability and is therefore less prone to overfitting than SOGA1. Finally, we observe that, as expected, the solutions generated by SOGA2 lie on a hypothetical extension of the Pareto front, in a region of low interpretability.

Figure 7 shows, for the context corresponding to κ = 2, an example of the fuzzy partitions of the input and output variables of the FRBSs generated by SOGA1 (Fig. 7a) and SOGA2 (Fig. 7b), together with the fuzzy partitions of MOEA\(_a\), which in this context coincides with MOEA\(_b\) (Fig. 7c). We note that the FRBS generated by SOGA2 lacks coverage on \(y\), whereas MOEA\(_a\) shows both a high degree of interpretability and a low MSE.

Fig. 7

Partitions generated for the parametric function data set on context κ = 2

6.2 Fuel efficiency

The 2004 new car and truck data set (Johnson 2004) contains the features of 428 car and truck models, such as engine size (ES), horsepower (HP), retail price (RP) and fuel efficiency (FE) in city and highway traffic conditions. We preprocessed the data set by selecting the 387 vehicles for which all 19 features are available. Our aim was to model the effect of the traffic conditions on FE with respect to the other features. By computing the correlation between FE and each of the other features, we found that only ES, HP, RP and the ratio AW between base area and weight are strongly correlated with FE (AW was purposely generated by combining three features, namely width, length and weight). Hence, we used only ES, HP, AW and RP as input variables and FE as the output variable of our model. City and highway traffic conditions were treated as two different contexts. ES, HP, AW, RP and FE are measured in liters, horsepower, square inches per pound (in²/lb), dollars and miles per gallon (MPG), respectively.

The rules of the FRBS, shown in Table 4, were derived from the following intuitive, experience-based considerations: FE decreases as ES, HP and RP increase, and increases as AW increases. Furthermore, we did not generate rules for meaningless or incompatible cases, such as high ES with low HP, or low ES with high RP. We uniformly partitioned the normalized input variables and the output variable into three fuzzy sets, namely L (low), M (medium) and H (high). The number of fuzzy sets was chosen by interviewing a pool of experts and asking them for a meaningful partition of the universes. Again, fivefold cross-validation was performed, with \(N_{\rm pop} = 50\) and \(I_{\max} = 200\). Figure 8 shows the Pareto fronts obtained for each fold on both the training and test sets in the city context (similar fronts were observed in the highway context). Table 5 summarizes the comparison among MOEA, SOGA1 and SOGA2, carried out as described in Sect. 6.1.

Table 4 RB for the fuel efficiency data set
Table 5 Results for the fuel efficiency data set
Fig. 8

Pareto fronts obtained on the training set (top) and on the test set (bottom) for the fuel efficiency data set in the city context

Figure 8 and Table 5 confirm the trend observed on the other data set. The FRBSs determined by SOGA2 are characterized by a low MSE and poor interpretability, while SOGA1 generates FRBSs with good trade-offs between accuracy and interpretability that are, however, Pareto-dominated by some solutions found by our MOEA. Furthermore, on this data set the FRBSs identified by NSGA-II generalize much better than those generated by SOGA1 and SOGA2, as they achieve similar MSEs on the training and test sets. This behavior can be explained by the set of CA operators adopted in our approach, which provides a higher modeling capability than the scaling function used in SOGA1.

Figure 9 shows, for the city context, an example of the fuzzy partitions of the input and output variables of the FRBSs selected as in Sect. 6.1. The partitions of the FRBS generated by SOGA2, which outperforms the others in terms of accuracy on the training set, are difficult to interpret: in particular, the fuzzy sets on the ES and RP inputs are hardly distinguishable. In contrast, MOEA\(_a\) achieves the lowest MSE on the test set while preserving its interpretability, even though the fuzzy sets on the RP input are not completely distinguishable. Finally, MOEA\(_b\) (Fig. 9d) outperforms all the other FRBSs in terms of interpretability and, thanks to the modeling power of the fuzzy modifiers, achieves a lower MSE than SOGA1.

Fig. 9

Partitions generated for the fuel efficiency data set on the city context

7 Conclusion

In this paper, we have proposed a MOEA-based approach to context adaptation of Mamdani-type FRBSs. MOEA generates a Pareto front composed of context-adapted FRBSs with different trade-offs between accuracy and interpretability. Accuracy is measured in terms of the mean square error, while interpretability is evaluated by a novel index based on a fuzzy ordering relation. This index can be employed in any constrained tuning of an FRBS in which ordering and interpretability of the fuzzy partitions are required. The results obtained by our approach with a Yuan-based interpretability index on synthetic and real data sets have shown that MOEA-based CA can determine solutions whose error is equal to or lower than that of accuracy-oriented approaches while preserving the interpretability of the fuzzy partitions.