1 Introduction

The modeling of fuzzy systems has achieved remarkable success over the last fifty years. Fuzzy systems have proved their capability in different application areas, such as pattern recognition (Boutleux and Dubuisson 1996), classification (Jarraya et al. 2015; Jahromi and Moosavi 2011), control problems (Tanaka and Sano 1994; Singhala et al. 2014), data mining problems (Hüllermeier 2005) and time series forecasting (Jarraya et al. 2013, 2014).

In recent years, type-2 fuzzy modeling (Zadeh 1975) has become a growing and active research topic for several researchers. The main reason for the migration from type-1 to type-2 fuzzy systems is essentially the nature of the knowledge used to produce fuzzy rules, which is usually uncertain and contains incomplete or inaccurate information. In fact, type-1 fuzzy sets are precise and unable to handle high levels of uncertainty. In this sense, several research studies have shown that type-2 fuzzy approaches have more potential than their type-1 counterparts in coping with uncertainty.

However, as the complexity and dimensionality of application problems increase, the number of fuzzy rules of a standard fuzzy system (type-1 or type-2) also increases, which badly affects the interpretability of the resulting rules. This is why standard fuzzy systems usually suffer from the “curse of dimensionality”, or the “combinatorial rule explosion” problem, when applied to complex problems. As a solution, hierarchical fuzzy modeling was proposed by Raju and Zhou (1993) as an effective alternative that arranges the inputs in a hierarchical architecture. In this case, instead of a standard, flat high-dimensional fuzzy system, a number of lower-dimensional sub-fuzzy models are linked hierarchically. Consequently, the number of rules is reduced and the approximation abilities of the system are improved.

Since Raju’s paper on hierarchical fuzzy systems (HFSs) (Raju and Zhou 1993), several works have appeared in that area applying machine learning and optimization techniques to the process of building type-1 HFSs. Indeed, the search for an optimal hierarchical structure is as important as the search for the optimal set of parameters. Therefore, hierarchical fuzzy modeling has been treated as a search problem and as an optimization task in both the structure and parameter spaces, and several research works have been proposed in the literature to construct or learn these systems. For example, the authors of Shimojima et al. (1995) used the genetic algorithm (GA) to optimize the HFS structure and the gradient descent algorithm to adjust the parameters. In Lin and Lee (1999), the authors proposed, for a low-speed control problem, a GA-based method to optimize the parameters and structure of a five-input hierarchical fuzzy controller. Moreover, an approach based on two structure-evolving algorithms, genetic programming and bacterial programming, was introduced in Balazs et al. (2010) to build a multi-level rule-based system. Salgado (2008) proposed a hierarchical collaborative structure (HCS) with three phases of structure building, parameter identification and data division. In Cheong and Lai (2007), the authors applied the differential evolution (DE) algorithm to the automatic design of a hierarchical fuzzy logic controller. In addition, Chen et al. (2004) proposed a method that interleaves ant programming (AP) and particle swarm optimization (PSO) for, respectively, the structure learning and the rule parameter tuning of a TSK HFS. Chen et al. improved this approach in Chen et al. (2007), using the probabilistic incremental program evolution (PIPE) algorithm for structure learning and the evolutionary programming (EP) algorithm for adjusting the rule parameters of the hierarchical system. In Fernández et al. (2009), the authors suggested a learning process for a hierarchical fuzzy rule base classification system (HFRBCS), where the knowledge base was created by means of a linguistic rule generation (LRG) method and the best rules were selected from the hierarchical rule base using the GA.

Nevertheless, all of these HFS proposals employed type-1 fuzzy models in the hierarchical design and focused on improving only the system’s accuracy using mono-objective hybrid optimization techniques. As far as we know, very few publications exploit multi-objective evolutionary algorithms (MOEAs) for evolving HFSs and/or address type-2 fuzzy systems in the hierarchical design (Hagras 2004; Benítez and Casillas 2013; Ojha et al. 2017). Hagras (2004) suggested a type-2 hierarchical fuzzy logic controller (type-2 HFLC) for autonomous mobile robots. In Benítez and Casillas (2013), a multi-objective genetic algorithm was proposed for learning an incremental HFS with the aim of minimizing two objectives, the error and the number of rules. In Ojha et al. (2017), the authors used a multi-objective genetic programming (MOGP) algorithm to train the structure of a proposed type-2 hierarchical fuzzy inference tree (T2HFIT), while the differential evolution (DE) algorithm was used to fit the parameters of the model. In this approach, both the accuracy and the complexity of the system were used as objectives.

In this work, a type-2 hierarchical flexible beta fuzzy system (T2HFBFS) is proposed. For the design process, the T2HFBFS is presented through a tree encoding method, and its membership functions (MFs) are modeled based on interval type-2 beta fuzzy sets. The use of type-2 beta fuzzy sets enables the system to handle uncertain information more efficiently than its type-1 counterpart. Regarding the optimization process, two main phases of structure learning and parameter optimization are introduced. The structure learning phase is performed based on the multi-objective extended immune programming (MOEIP) algorithm. This step aims to learn a population of T2HFBFS structures taking into account the optimization of both the accuracy and the interpretability objectives. For the tuning phase, the PSO-based update memory for improved harmony search algorithm (PSOUM-IHS) (Ammar et al. 2013) is applied to update the parameters of the antecedent and consequent parts of fuzzy rules. By interleaving the two optimization phases, a high-performance T2HFBFS is finally generated.

The rest of the paper is organized as follows: In Sect. 2, the type-2 hierarchical flexible beta fuzzy system is proposed. Section 3 introduces the initialization phase based on a clustering technique. Next, the multi-objective structure learning and the parameter optimization phases are explained in Sects. 4 and 5, respectively. A global description of the hybrid evolving algorithm is then given in Sect. 6. Simulation studies are presented in Sect. 7. Finally, in Sect. 8, some concluding remarks are given.

2 The type-2 hierarchical flexible beta fuzzy system

2.1 Type-1 beta membership function (T1 BMF)

The type-1 beta fuzzy set was proposed by Alimi (1997). In the one-dimensional case, the type-1 beta membership function (T1 BMF) is expressed by:

$$ \beta \left( x;\,c,\,\sigma,\,p,\,q \right) = \left\{ \begin{array}{ll} \left[ 1 + \dfrac{(p + q)(x - c)}{\sigma p} \right]^{p} \left[ 1 - \dfrac{(p + q)(x - c)}{\sigma q} \right]^{q} & \text{if}\; x \in \left] c - \dfrac{\sigma p}{p + q},\; c + \dfrac{\sigma q}{p + q} \right[ \\ 0 & \text{elsewhere} \end{array} \right. $$
(1)

where c is the beta function center, \( \sigma \) is the width, and \( p \) and \( q \) are the form parameters, with \( p, q > 0 \).

The beta function has a universal approximation characteristic and can approximate other functions such as triangular or Gaussian functions (Alimi 2003). Indeed, in comparison with the Gaussian function, which relies only on center and width parameters, the beta function relies on two additional form parameters (\( p \) and \( q \)), which allow greater flexibility in the modeling of fuzzy sets. Alimi (2003) demonstrated that the beta function can approximate the Gaussian function, while the reverse is not true. Moreover, this function presents other advantages such as its high flexibility and its ability to generate richer shapes (asymmetry, linearity, etc.) (Alimi 1997, 2000). Figure 1 presents some examples of type-1 beta MFs with different shapes.
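
As an illustration, the following minimal sketch evaluates Eq. (1) numerically; the function name, the NumPy vectorization and the sample parameter values are our own illustrative choices, not taken from the paper.

```python
# A sketch of the T1 BMF of Eq. (1); zero outside its open support.
import numpy as np

def beta_mf(x, c, sigma, p, q):
    """Type-1 beta MF with center c, width sigma and form parameters p, q > 0."""
    x = np.asarray(x, dtype=float)
    left = 1.0 + (p + q) * (x - c) / (sigma * p)
    right = 1.0 - (p + q) * (x - c) / (sigma * q)
    inside = (left > 0) & (right > 0)  # support ]c - sigma*p/(p+q), c + sigma*q/(p+q)[
    out = np.zeros_like(x)
    out[inside] = left[inside] ** p * right[inside] ** q
    return out

# Example: an asymmetric shape (p != q); the grade equals 1 at the center c.
print(beta_mf(np.linspace(-2, 2, 9), c=0.0, sigma=2.0, p=2.0, q=4.0))
```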

Fig. 1 T1 BMFs with different shapes

2.2 Interval type-2 beta membership function (IT2 BMF)

A type-2 fuzzy set \( \tilde{A} \) (T2 FS) in the universe of discourse U is represented by a type-2 membership function \( \mu_{{\tilde{A}}} \), expressed as follows (Mendel and John 2002):

$$ \tilde{A} = \mathop \smallint \limits_{x \in U} \mathop \smallint \limits_{{u \in J_{x} }} \mu_{{\tilde{A}}} \left( {x, u} \right)/\left( {x,u} \right) $$
(2)

where \( \smallint \smallint \) denotes the union over all admissible x and u, \( J_{x} \) denotes an interval in [0, 1], and \( \mu_{{\tilde{A}}} \left( {x, u} \right) \) is the secondary membership grade, with \( 0 \le \mu_{{\tilde{A}}} \left( {x, u} \right) \le 1 \). When all \( \mu_{{\tilde{A}}} \left( {x, u} \right) = 1 \), \( \tilde{A} \) is a special case of a type-2 fuzzy set called an interval type-2 fuzzy set (IT2 FS), expressed as follows:

$$ \tilde{A} = \mathop \smallint \limits_{x \in U} \mathop \smallint \limits_{{u \in J_{x} }} 1/\left( {x,u} \right) $$
(3)

where \( J_{x} \subset \left[ {0,1} \right] \). Note that an IT2 FS is characterized by a bounded region called the footprint of uncertainty (FOU). This region is delimited by two type-1 fuzzy sets, the upper MF (UMF) \( \bar{\mu }_{{\tilde{A}}} \left( x \right) \) and the lower MF (LMF) \( \underline{\mu }_{{\tilde{A}}} \left( x \right) \).

In this study, instead of employing the Gaussian or triangular interval type-2 MFs frequently used in the literature, we adopt the beta function for its greater flexibility compared with these functions. Therefore, the interval type-2 beta membership function (IT2 BMF) is proposed and employed in this work. The IT2 BMF is a beta primary MF with a fixed center c and uncertain parameters \( \sigma \), \( p \) and \( q \). The IT2 BMF is expressed by:

$$ \left\{ \begin{array}{l} \beta \left( x;c,\sigma, p, q \right) = \left[ 1 + \dfrac{(p + q)(x - c)}{\sigma p} \right]^{p} \left[ 1 - \dfrac{(p + q)(x - c)}{\sigma q} \right]^{q} \\ \sigma \in \left[ \sigma_{L}, \sigma_{U} \right],\; p \in \left[ p_{L}, p_{U} \right] \;\text{and}\; q \in \left[ q_{L}, q_{U} \right] \end{array} \right. $$
(4)

where \( \sigma_{L} \), \( \sigma_{U} \), \( p_{L} \), \( p_{U} \), \( q_{L} \) and \( q_{U} \) are positive real values. Figure 2 shows two examples of IT2 BMFs with uncertain \( \sigma \), \( p \) and \( q \). The upper and lower beta MFs are expressed, respectively, by the following equations:

Fig. 2 IT2 BMFs with uncertain \( \sigma \), p and q having different FOUs

$$ \bar{\mu }_{{\tilde{A}}} \left( x \right) = \beta \left( {x;c,\sigma_{U} , p_{U} , q_{U} } \right) $$
(5)
$$ \underline{\mu }_{{\tilde{A}}} \left( x \right) = \beta \left( {x;c,\sigma_{L} , p_{L} , q_{L} } \right) $$
(6)
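
A minimal sketch of Eqs. (5)–(6) follows, reusing the beta_mf function sketched in Sect. 2.1; bundling the two bounds into one helper is our own choice.

```python
# The FOU of an IT2 BMF is delimited by two T1 BMFs sharing the center c.
def it2_beta_mf(x, c, sigma_l, sigma_u, p_l, p_u, q_l, q_u):
    """Return the (lower, upper) membership grades of an IT2 BMF."""
    upper = beta_mf(x, c, sigma_u, p_u, q_u)   # UMF, Eq. (5)
    lower = beta_mf(x, c, sigma_l, p_l, q_l)   # LMF, Eq. (6)
    return lower, upper
```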

2.3 Type-2 hierarchical flexible beta fuzzy system (T2HFBFS)

In general, building a hierarchical fuzzy model is a difficult task: one has to define the number of hierarchical levels, the rule base of each module (sub-fuzzy model), the number of modules in the intermediate levels, how to arrange the original input variables across the levels, and so on.

In this study, the design of an accurate type-2 hierarchical fuzzy system with a simple structure is treated as a search problem in both the architecture and parameter spaces. Firstly, the multi-level aspect is represented through a tree-based encoding scheme. The tree representation provides more adjustable and modifiable structures thanks to its natural and flexible hierarchical form. This hierarchical tree design is proposed for type-2 beta fuzzy systems, and the resulting system is termed the type-2 hierarchical flexible beta fuzzy system (T2HFBFS). The proposed system is characterized by the following node set S:

$$ S = N \cup T = \left\{ {\text{BFII}}_{kj}^{l} \;/\; k \in \left\{ 2, \ldots, K \right\},\; j \in \left\{ 1, \ldots, J \right\},\; l \in \left\{ 1, \ldots, L - 1 \right\} \right\} \cup \left\{ x_{1}, \ldots, x_{M} \right\} $$
(7)

where:

  • \( N \) is the non-terminal node set formed by a number of type-2 beta sub-fuzzy models \( {\text{BFII}}_{kj}^{l} \). Each \( {\text{BFII}}_{kj}^{l} \) is an interval A2-C1 TSK fuzzy model (Liang and Mendel 1999) with one output and a number of fuzzy rules. These rules have an antecedent part based on interval type-2 beta membership functions (IT2 BMFs) and a consequent part based on interval type-1 fuzzy sets.

    For a \( {\text{BFII}}_{kj}^{l} \), k is the number of inputs, with K the maximal degree of the tree; j is the index of the module among those with k inputs, with J the number of modules having k inputs; and l is the index of the level, with L the number of tree levels;

  • T is the terminal node set formed by the input variables \( x_{1} \), \( x_{2} \),…,\( x_{M} \);

The evaluation of a T2HFBFS is executed recursively from the lowest level to the final one. Each level can contain one or several sub-fuzzy models. The inputs of a level can be the outputs of the level below, or a combination of some original inputs with the outputs of the level below. To clarify the evaluation process, Fig. 3 shows a simple example of a T2HFBFS (Fig. 3b) obtained from a tree structural representation (Fig. 3a). The evaluation process for this example proceeds as follows:

Fig. 3 a An example of a tree representation, b the corresponding T2HFBFS: N = {\( BFII_{21}^{1} \), \( BFII_{22}^{2} \)}, T = {\( x_{1} \), \( x_{3} \), \( x_{4} \)}

First, the \( {\text{BFII}}_{22}^{2} \) module of the second level is evaluated. Its fuzzy rules have the following format:

$$ R_{i}^{l = 2} :{\text{If}}\; x_{3} \;{\text{is}}\; \tilde{A}_{1i}^{2} \;{\text{and}}\;x_{4} \;{\text{is}}\;\tilde{A}_{2i}^{2} \;{\text{then}} $$
$$ Y_{i}^{2} = C_{0i}^{2} + C_{1i}^{2} x_{3} + C_{2i}^{2} x_{4} $$
(8)

where \( x_{3} \) and \( x_{4} \) are the original inputs and \( Y_{i}^{2} \) is the output of rule i. \( \tilde{A}_{1i}^{2} \) and \( \tilde{A}_{2i}^{2} \) are, respectively, the antecedent fuzzy sets of variables \( x_{3} \) and \( x_{4} \), modeled by IT2 BMFs. \( C_{0i}^{2} ,\;C_{1i}^{2} \;{\text{and}}\;C_{2i}^{2} \) are the consequent elements, represented by interval type-1 fuzzy sets.

Next, the output of the whole T2HFBFS is obtained by computing the output of the root node \( {\text{BFII}}_{21}^{1} \). This module has two inputs: y1 (the output of the lower-level module \( {\text{BFII}}_{22}^{2} \)) and \( x_{1} \). The rules of \( {\text{BFII}}_{21}^{1} \) have the following format:

$$ R_{j}^{l = 1} :{\text{If}}\;y1\;{\text{is}}\;\tilde{A}_{1j}^{1} \;{\text{and}}\;x_{1} \;{\text{is}}\;\tilde{A}_{2j}^{1} \;{\text{then}} $$
$$ Y_{j}^{1} = C_{0j}^{1} + C_{1j}^{1} y1 + C_{2j}^{1} x_{1} $$
(9)

where \( Y_{j}^{1} \) is the output of rule j. \( \tilde{A}_{1j}^{1} \) and \( \tilde{A}_{2j}^{1} \) are, respectively, the antecedent fuzzy sets modeled by IT2 BMFs, and \( C_{0j}^{1} ,\;C_{1j}^{1} \;{\text{and}}\;C_{2j}^{1} \) are the interval type-1 consequent elements. The final output of the whole system is given by y2 (the output of \( {\text{BFII}}_{21}^{1} \)). Note that, for each \( {\text{BFII}}_{kj}^{l} \), the IT2 BMF parameters and rule consequents will be further optimized in the parameter tuning phase.
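
The bottom-up evaluation can be sketched as a small recursion over the tree; the Node class and the model callables are assumed interfaces, and the interval TSK inference inside each BFII module (type reduction included) is abstracted away.

```python
# A sketch of the recursive evaluation of a T2HFBFS tree.
class Node:
    def __init__(self, model=None, children=(), input_index=None):
        self.model = model              # None for terminal nodes
        self.children = children        # child nodes feeding this module
        self.input_index = input_index  # index into x for terminal nodes

def evaluate(node, x):
    if node.model is None:              # terminal: original input variable
        return x[node.input_index]
    outputs = [evaluate(child, x) for child in node.children]
    return node.model(outputs)          # BFII inference on the child outputs

# For the example of Fig. 3 (x = [x1, x2, x3, x4]):
# root = Node(bfii_21, children=(
#     Node(bfii_22, children=(Node(input_index=2), Node(input_index=3))),
#     Node(input_index=0)))
# y2 = evaluate(root, x)
```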

3 Initialization step by a clustering method

The first optimization step learns a set of T2HFBFS structures in order to obtain the best structure. However, optimizing the structures of a set of T2HFBFSs having totally random rules and parameters would not lead to good results and would slow down the optimization process. Therefore, a T2HFBFS initialization step is required before starting the optimization. In this sense, the subtractive clustering (SC) algorithm is implemented in this work as a first step to automatically extract the initial rule bases of the hierarchy and the initial beta membership function shapes.

The subtractive clustering algorithm proposed by Chiu (1994) is an unsupervised algorithm that divides a given data set into meaningful subgroups called clusters. The generated cluster centers correspond to the membership function centers, and each cluster center is converted into a fuzzy rule. Based on this concept, an initialization mechanism using the SC algorithm is introduced in this study and described as follows:

After generating a random population of trees, the terminal nodes (input variables) of each tree are clustered in order to create the different \( {\text{BFII}}_{kj}^{l} \) modules of the hierarchy. These sub-fuzzy models are automatically generated with initial values of the beta MF parameters and with rule bases extracted from the data. Consequently, the extracted rules fit the input data better than rules generated at random, and such an initialization step speeds up the whole evolutionary search process. Readers may refer to Chiu (1994) for the details of the SC algorithm; a compact sketch follows.
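
The sketch below shows the potential-based selection at the heart of Chiu's method; the radii and the stopping ratio are illustrative defaults rather than values prescribed by the paper.

```python
# A sketch of subtractive clustering: each selected center seeds one
# fuzzy rule and one MF center.
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, stop_ratio=0.15):
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)           # initial potentials
    first_peak, centers = potential.max(), []
    while potential.max() >= stop_ratio * first_peak:
        k = int(potential.argmax())
        centers.append(X[k])
        # subtract the influence of the chosen center from all potentials
        potential = np.maximum(potential - potential[k] * np.exp(-beta * d2[k]), 0.0)
    return np.array(centers)
```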

4 Multi-objective structure learning by MOEIP algorithm

4.1 Basic concepts of EIP for single optimization problem

Musilek et al. (2006) showed that the immune programming (IP) algorithm is more effective than the genetic programming (GP) algorithm: successful solutions are obtained with a smaller population and fewer generations. In addition, the IP algorithm has a strong ability to evolve tree structures or programs and offers high flexibility to create adjustable architectures. We therefore employ a modified version of the IP algorithm, termed the extended immune programming (EIP) algorithm, in the structure learning phase of T2HFBFSs. It is described by the following steps:

  (a)

    Initialization: An initial population of T2HFBFSs (antibodies) is created with random structures (a random number of levels and random nodes for each level). For each T2HFBFS, the IT2 BMF parameters of the \( {\text{BFII}}_{kj}^{l} \) modules and the fuzzy rules are initialized by the SC algorithm, except for the beta form parameters (p and q), which are generated randomly.

    EIP_Itr = 0, where EIP_Itr is the generation counter.

  (b)

    Evaluation: An antigen representing the problem to be addressed is introduced. The antigen is compared to all NA antibodies, and their affinity fitness Fit(i) with respect to the antigen is determined.

  (c)

    Cloning: An antibody \( {\text{Ab}}_{i} \) is selected from the current population, and a random number forming the cloning probability Pc is generated. If the affinity of \( {\text{Ab}}_{i} \) is higher than Pc, then \( {\text{Ab}}_{i} \) is cloned and introduced into the next population.

  (d)

    Mutation: If the previously selected high-affinity antibody \( {\text{Ab}}_{i} \) was not cloned, owing to the stochastic nature of the cloning step, it undergoes hypermutation. As shown in Fig. 4, four mutation operators are used in this study:

    Fig. 4 Examples of the EIP mutation operators: a original T2HFBFS, b modify one terminal node, c modify all terminal nodes, d growing, e pruning

Modifying one terminal node: choose a terminal node at random and replace it with another terminal node;

Modifying all terminal nodes: select all terminal nodes of \( {\text{Ab}}_{i} \) and replace them with other random terminal nodes;

Growing: choose a terminal node at random and replace it with a randomly created sub-fuzzy model;

Pruning: select a non-terminal node (sub-fuzzy model) at random and replace it with a randomly generated terminal node.

The mutation operators are applied following the method of Chellapilla (1998), described as follows: (i) draw a number N at random from a Poisson distribution, (ii) choose N operators at random from the mutation operators above and (iii) apply the N operators consecutively to the parent to obtain the offspring (a condensed sketch of one EIP generation is given after these steps).

The mutation step is varied in these ways so that a population with high genetic diversity is obtained. This diversity helps the search escape local optima.

  (e)

    Replacement: If \( {\text{Ab}}_{i} \) is selected for neither the cloning nor the mutation step, a new antibody is generated for the next population (using a replacement probability Pr). Consequently, candidate solutions with low affinity fitness are implicitly replaced.

  (f)

    Population iteration: Steps (c)–(e), presenting the EIP operators, are repeated until a complete new population is created.

  (g)

    Algorithm iteration: The generation counter is incremented after the creation of the new population, EIP_Itr = EIP_Itr + 1. The EIP is executed iteratively (steps (b)–(f)) until the stopping criteria are met.
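
The sketch below condenses one EIP generation (steps (c)–(e)); clone, random_antibody and the four mutation operators are assumed interfaces, and the Poisson mean of 1 is an illustrative choice.

```python
import random
import numpy as np

def next_generation(population, affinity, mutation_ops, clone, random_antibody, pr=0.1):
    new_pop = []
    for ab in population:
        pc = random.random()                     # stochastic cloning threshold Pc
        if affinity(ab) > pc:
            new_pop.append(clone(ab))            # high-affinity antibody is cloned
        elif random.random() > pr:
            n = max(1, np.random.poisson(1.0))   # Chellapilla: N ~ Poisson
            child = ab
            for op in random.choices(mutation_ops, k=n):
                child = op(child)                # apply the N operators consecutively
            new_pop.append(child)                # hypermutated offspring
        else:
            new_pop.append(random_antibody())    # replacement of weak antibodies
    return new_pop
```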

This section has introduced the EIP algorithm as a single-objective optimization algorithm whose fitness function reflects only the accuracy of the system. The next subsections present an extended version of this algorithm, showing how to improve not only the system's accuracy but also its interpretability. In this case, the problem is called multi-objective.

4.2 Multi-objective optimization problem

A minimization multi-objective problem may be formulated as:

$$ {\text{Min}}\;f\left( x \right) = \left[ {f_{1} \left( x \right), \ldots ,f_{k} \left( x \right)} \right] $$
(10)

subject to:

$$ g_{j} \left( x \right) \ge 0 \quad j = 1, \ldots p $$
(11)
$$ h_{j} \left( x \right) = 0\quad j = 1, \ldots q $$
(12)

where \( x = \left( {x_{1} , \ldots ,x_{n} } \right) \in {\mathbb{R}}^{n} \) is a vector of solutions defined on the space of decision variables, and \( g_{j} \left( x \right) \) and \( h_{j} \left( x \right) \) are the constraint functions of the problem. \( k \) is the number of objective functions, and \( p \) and \( q \) are, respectively, the numbers of inequality and equality constraints.

Unlike single-objective optimization problems, where a single optimal solution is produced, multi-objective optimization algorithms produce a set of optimal solutions named Pareto optimal solutions or non-dominated solutions. The concept of dominance is defined as follows: a solution \( x \) dominates a solution \( y \) (written \( x \prec y \)) if and only if \( x \) is strictly better than \( y \) in at least one objective function and not worse than \( y \) in any objective function. \( x \) is Pareto optimal if it is not dominated by any other solution of the current population. In the objective space, the set of all Pareto optimal solutions is called the Pareto optimal front.
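
The dominance test translates directly into code; the tuple-of-objectives interface is our own convention.

```python
# Minimization dominance: no worse everywhere, strictly better somewhere.
def dominates(fx, fy):
    """True iff solution x (objectives fx) dominates solution y (objectives fy)."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

# Example with (RMSE, number of rules):
# dominates((0.10, 5), (0.12, 5)) -> True; dominates((0.10, 5), (0.12, 4)) -> False
```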

4.3 Objective functions

Multi-objective fuzzy design requires the consideration of different objective functions in the search process. In this study, both accuracy and interpretability metrics are treated in the multi-objective structure learning step. While accuracy is measured by the root mean squared error (RMSE), the system's interpretability is defined by the number of fuzzy rules used:

$$ {\text{Objective 1}}: {\text{RMSE}} = \sqrt {\frac{1}{m}\mathop \sum \limits_{j = 1}^{m} \left( {y_{t}^{j} - y_{\text{out}}^{j} } \right)^{2} } $$
(13)

where m defines the number of samples, and \( y_{t}^{j} \) and \( y_{\text{out}}^{j} \) are, respectively, the actual and the calculated outputs.

$$ {\text{Objective}} 2:{\text{Interpretability}} = {\text{NR}} $$
(14)

where NR is the total number of fuzzy rules.
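
Scoring one candidate therefore reduces to the pair below; y_true, y_pred and the rule count are assumed to come from evaluating the tree on the training data.

```python
# The two objective values of Eqs. (13)-(14), both minimized by MOEIP.
import numpy as np

def objectives(y_true, y_pred, n_rules):
    rmse = float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
    return rmse, n_rules
```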

4.4 Multi-objective extended immune programming algorithm (MOEIP)

In this section, a multi-objective version of the EIP algorithm, called the multi-objective extended immune programming (MOEIP) algorithm, is proposed. This algorithm is based on an elitist strategy and uses the EIP operators combined with a dominance concept to search for elite solutions. For that purpose, an archive \( A \) formed by a number of non-dominated antibodies is used; this archive forms a secondary population of Pareto optimal solutions. In the MOEIP strategy, each antibody of the population is a T2HFBFS, and as the search progresses, the dominance criterion together with the EIP operators is applied to evolve the population toward an optimal Pareto front.

Moreover, in the single-objective evolutionary case, a child is selected over its parent if it has a better fitness value. In the proposed MOEIP, superiority is measured as a dominance relationship, and a child is selected over its parent only if the child dominates the parent. Consequently, as the search progresses, the solutions move closer to the Pareto front.

On the other hand, to generate a well-distributed Pareto front with good diversity, a pruning method using the crowding distance measure (as applied in the non-dominated sorting genetic algorithm II (NSGA-II) (Deb et al. 2002)) is implemented. For each solution of the front, we compute a crowding distance value, which estimates the distance between this solution and its neighbors in the objective space. In each generation, the solutions of the archive \( A \) are sorted according to their crowding distance values: solutions with the highest crowding distance values (best diversity) are maintained, and the worst are discarded from the archive.

Suppose that \( x_{i} \) is a Pareto front solution. The crowding distance \( {\text{cd}}\left( {x_{i} } \right) \) of \( x_{i} \) is calculated as follows (a sketch is given after these steps):

  (i)

    The crowding distance of \( x_{i} \) is initialized: \( {\text{cd}}\left( {x_{i} } \right) = 0 \);

  (ii)

    For each objective function \( f_{j} \) do:

    • Sort the non-dominated solutions along the objective function \( f_{j} \);

    • \( {\text{cd}}\left( {x_{i} } \right) = {\text{cd}}\left( {x_{i} } \right) + \) |\( f_{j} \)(the solution following \( x_{i} \) in the ordered sequence) − \( f_{j} \)(the solution preceding \( x_{i} \) in the ordered sequence)|;
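
A sketch of the computation over a whole front is given below; the infinite boundary distances and the normalization by the objective span follow the NSGA-II convention (Deb et al. 2002) rather than being spelled out in the steps above.

```python
# Crowding distances for a front given as an (n, k) array of objectives.
import numpy as np

def crowding_distances(front_objectives):
    F = np.asarray(front_objectives, dtype=float)
    n, k = F.shape
    cd = np.zeros(n)
    for j in range(k):
        order = np.argsort(F[:, j])              # sort along objective j
        cd[order[0]] = cd[order[-1]] = np.inf    # boundary solutions are always kept
        span = F[order[-1], j] - F[order[0], j]
        if span == 0:
            continue
        for i in range(1, n - 1):
            cd[order[i]] += (F[order[i + 1], j] - F[order[i - 1], j]) / span
    return cd
```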

The pseudo-code of the MOEIP is given in Algorithm 1. To summarize, the basic ideas of this algorithm are:

  • Learn the structures of a population of T2HFBFSs while optimizing both accuracy and interpretability.

  • Guide the search process through an optimal Pareto front of non-dominated elite solutions.

  • Keep a diverse set of well-spaced non-dominated solutions using the crowding distance measure.

  • In the mutation step, antibodies are replaced by their dominating offspring.

It should be noted that the MOEIP structure learning step terminates when the maximum number of iterations is reached. As a result, a Pareto optimal front of T2HFBFS solutions is generated, and the most suitable structure (the one with a good balance between accuracy and interpretability) is selected to undergo a second step of parameter optimization.

Algorithm 1 Pseudo-code of the MOEIP algorithm

5 Parameter optimization using the hybrid PSOUM-IHS

The harmony search (HS) proposed by Geem et al. (2001) is a well-known evolutionary music-inspired meta-heuristic algorithm. The harmony search was inspired by jazz improvisation, in which a musician (decision variable) plays (creates) a note (value) to reach a better state of harmony (global optimum). The HS explores the solution space to attain the optimal value. The main steps of the HS are briefly described as follows:

  • Step 1 Formulate the problem and initialize the parameters.

  • Step 2 Initialize at random the harmony memory.

  • Step 3 Improvise a novel harmony.

  • Step 4 Update the harmony memory.

  • Step 5 Repeat steps 3 and 4 until a stopping criterion is reached.

Mahdavi et al. (2007) suggested an optimized version of the original HS named the improved harmony search (IHS) algorithm. The authors proposed adjusting and evolving the parameters of the HS over the iterations instead of using fixed values. Consequently, during the execution, the pitch adjustment rate (PAR) increases linearly and the bandwidth (the fret width, FW) decreases exponentially according to the following expressions:

$$ {\text{PAR}} = \frac{{{\text{PAR}}_{ \hbox{max} } - {\text{PAR}}_{ \hbox{min} } }}{\text{MaxItr}}*{\text{currentIteration}} + {\text{PAR}}_{ \hbox{min} } $$
(15)
$$ {\text{FW}} = {\text{FW}}_{ \hbox{max} } * {\text{exp}} \left( {{\text{coef}}* {\text{currentIteration}}} \right) $$
(16)
$$ {\text{coef}} = \frac{{{ \log }\left( {\frac{{{\text{FW}}_{ \hbox{min} } }}{{{\text{FW}}_{ \hbox{max} } }}} \right)}}{\text{NI}} $$
(17)
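
The two schedules are straightforward to implement; in this sketch MaxItr and NI are read as the same total number of improvisations, which is our interpretation of Eqs. (15)–(17).

```python
import math

def pitch_adjustment_rate(itr, max_itr, par_min, par_max):
    return (par_max - par_min) / max_itr * itr + par_min   # Eq. (15): linear growth

def fret_width(itr, n_itr, fw_min, fw_max):
    coef = math.log(fw_min / fw_max) / n_itr               # Eq. (17), negative
    return fw_max * math.exp(coef * itr)                   # Eq. (16): exponential decay
```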

Although the HS was improved, it still has some weaknesses. In fact, most of the decision variables stored in a new harmony are taken from the elements of the harmony memory. Moreover, most of the time the harmony memory is stable and does not offer a wide variety of values to the improvisation. Thus, the harmony search algorithm generally has a small probability of producing good new harmony vectors, which may affect its convergence speed. To tackle this problem, Ammar et al. (2013) proposed to include the idea of the particle swarm optimization (PSO) velocity in the search process of the IHS algorithm. Such hybridization creates a wide variety of solutions in memory (within their allowable ranges) and helps convergence to the optimal solution. In fact, the PSO mechanism iteratively creates a fully updated population that is closer to the optimal solution. Over the iterations, the particles in PSO converge around an optimum through a combination of exploration and exploitation of the search space. The resulting hybrid algorithm is named PSO-based update memory for improved harmony search (PSOUM-IHS) (Ammar et al. 2013); its flowchart is shown in Fig. 5. The main idea is to exploit the stochastic and dynamic aspects of particle velocities, which help guide the search to the right areas of the search space. To realize this hybridization, the memory vectors of the IHS are considered as particles of the swarm, and the new memory values for the next improvisation are considered as the new positions reached by these particles. For each particle, the velocity \( v_{j} \) and the position \( x_{j} \) are computed according to the following equations:

$$ v_{j} \left( {t + 1} \right) = \varPsi \left( t \right)v_{j} \left( t \right) + c_{1} \varphi_{1} \left( {p_{j} \left( t \right) - x_{j} \left( t \right)} \right) + c_{2} \varphi_{2} \left( {p_{g} \left( t \right) - x_{j} \left( t \right)} \right) $$
(18)
$$ x_{j} \left( {t + 1} \right) = x_{j} \left( t \right) + \left( {1 - \varPsi \left( t \right)} \right)v_{j} \left( {t + 1 } \right) $$
(19)

where:

Fig. 5 Flowchart of PSOUM-IHS

\( c_{1} \) and \( c_{2} \): acceleration/weighting factors.

\( \varphi_{1} \) and \( \varphi_{2} \): random numbers in [0,1].

\( \varPsi \): inertia factor.

\( p_{j} \): local best position (found by the jth particle).

\( p_{g} \): global best position (found by the swarm).

Readers can find more details concerning this hybrid algorithm in (Ammar et al. 2013).
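
The per-particle update of Eqs. (18)–(19) can be sketched as follows; the acceleration factors shown are illustrative, and note the (1 − Ψ(t)) damping on the position step, which distinguishes this hybrid from the canonical PSO update.

```python
import numpy as np

def psoum_step(x, v, p_best, g_best, psi, c1=2.0, c2=2.0):
    """One velocity/position update for a memory vector treated as a particle."""
    phi1, phi2 = np.random.random(x.shape), np.random.random(x.shape)
    v_new = psi * v + c1 * phi1 * (p_best - x) + c2 * phi2 * (g_best - x)  # Eq. (18)
    x_new = x + (1.0 - psi) * v_new                                        # Eq. (19)
    return x_new, v_new
```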

In this study, the PSOUM-IHS is employed in the parameter adjustment phase of the obtained T2HFBFS. The parameters of the best T2HFBFS (generated by the MOEIP structure learning phase) constitute an element of the memory to be evolved by the PSOUM-IHS. Each element of the memory is thus represented by an NParm × Size matrix, where NParm is the number of parameters and Size is the number of \( {\text{BFII}}_{kj}^{l} \) modules of the best T2HFBFS found. The parameters encoded in the matrix are the IT2 BMF parameters (\( c, \sigma_{L} , \sigma_{U} , p_{L} , p_{U} , q_{L} , q_{U} \)) and the linear weights of the rule consequent parts. The objective function used in this phase is the RMSE (defined previously).

6 Hybrid algorithm to evolve the T2HFBFS

In general, the modeling of high-dimensional problems is preceded by a data preparation phase known as feature selection. Instead of working with all features of the original data set, feature selection algorithms reduce the feature space, which enhances the prediction performance of the system and speeds up the search process. In this paper, a feature scoring technique called the statistical dependency (SD) algorithm (Pohjalainen et al. 2015) is exploited as a data preparation phase when the tested data set is a high-dimensional classification data set. Next, the training process starts: the MOEIP and the PSOUM-IHS are applied alternately to find an optimal or near-optimal T2HFBFS with good precision and a minimum number of rules. The following steps briefly describe the hybrid evolving algorithm (a high-level sketch is given after the steps):

  (a)

    Data preparation step: The most useful input features are selected by the SD feature selection algorithm. This step is performed only when testing a high-dimensional classification problem.

  (b)

    Initialization step: An initial population of trees with random architectures is generated. A population of T2HFBFSs is built by executing the subtractive clustering algorithm on the trees (Sect. 3).

    Global_Gn = 0; (the global generation number)

    Global_Itr = 0; (the global iteration number)

  (c)

    Multi-objective structure learning phase: The MOEIP algorithm is applied to evolve the T2HFBFS population toward a Pareto front of non-dominated solutions. Each solution of the Pareto front corresponds to a T2HFBFS tree. The objective functions used are defined by (13) and (14).

  (d)

    If the maximum number of MOEIP iterations (MOEIP_Itr) is reached, set Global_Itr = Global_Itr + MOEIP_Itr and go to step (e); otherwise, go back to step (c);

  (e)

    The most suitable solution (the T2HFBFS with a good trade-off between accuracy and interpretability) is selected as the output of the multi-objective structure learning phase.

  (f)

    Parameter optimization phase: The PSOUM-IHS is applied to evolve the parameters of the best T2HFBFS. These parameters form an element of the memory. The objective function used here is defined by (13).

  (g)

    If the maximum number of parameter iterations (Parm_Itr) is reached, or no better parameter matrix has been found after a fixed time, set Global_Itr = Global_Itr + Parm_Itr and go to step (h); otherwise, return to step (f);

  (h)

    Stop if a satisfactory T2HFBFS has been generated or a maximum number of global iterations is reached. Otherwise, set Global_Gn = Global_Gn + 1 and go back to step (c).
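
A bird's-eye sketch of the alternation is given below; every helper is an assumed interface standing in for the corresponding phase, not code from the paper.

```python
def evolve_t2hfbfs(data, max_global_gn):
    population = initialize_with_sc(random_trees(), data)   # steps (a)-(b)
    best = None
    for _ in range(max_global_gn):
        front = moeip_structure_search(population)          # steps (c)-(d)
        best = pick_best_tradeoff(front)                    # step (e)
        best = psoum_ihs_tune(best, data)                   # steps (f)-(g)
        if satisfactory(best):                              # step (h)
            break
        population = front                                  # loop back to step (c)
    return best
```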

7 Experimental results

The performance of the T2HFBFS is evaluated on six data sets of different natures: four time series forecasting problems and two high-dimensional classification problems. For all experiments, the minimum and maximum numbers of levels (depth of the trees) are 2 and 4, respectively, and the minimum and maximum numbers of nodes (degree of the trees) are 2 and 5, respectively. Table 1 gives the best-suited list of parameters used in the experiments.

Table 1 Parameter initialization

7.1 T2HFBFS for time series forecasting

For this kind of problems, four types of nonlinear benchmark prediction time series are tested, including the Mackey–Glass chaotic time series, the Box and Jenkins’ gas furnace time series, the monthly NAO climatic index time series and the sunspot number time series.

For a meaningful comparison with other state-of-the-art learning systems, all the data sets are divided into the same numbers of training and testing patterns. Each simulation is run 10 times, and the results are averaged.

7.1.1 Predicting the Mackey–Glass chaotic time series

The Mackey–Glass (MG) time series (Mackey and Glass 1977) is one of the most widely known benchmark problems and has been intensively studied in several works. The chaotic system is the signal \( x\left( t \right) \) produced by the numerical solution of the following time delay differential equation:

$$ \frac{{{\text{d}}\left( {x\left( t \right)} \right)}}{{{\text{d}}t}} = \frac{{ax\left( {t - \tau } \right)}}{{1 + x^{c} \left( {t - \tau } \right)}} - bx\left( t \right) $$
(20)

The chaotic behavior of this series depends on the parameter \( \tau \): for \( \tau \) > 16.8, MG behaves chaotically. In our case, the parameters are set as follows: \( \tau \) = 17, a = 0.2, b = 0.1 and c = 10, the same initial conditions used by the comparison systems. Four past values, \( x\left( t \right) \), \( x\left( {t - 6} \right) \), \( x\left( {t - 12} \right) \) and \( x\left( {t - 18} \right) \), are used to predict \( x\left( {t + 6} \right) \). A total of 1000 observations were generated; the first 500 samples are used for training and the last 500 for testing.
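
For reproducibility, the series can be generated by a simple Euler integration of Eq. (20); the step size and the constant initial history are illustrative choices (higher-order integrators are often used).

```python
import numpy as np

def mackey_glass(n_samples, tau=17, a=0.2, b=0.1, c=10, dt=0.1, x0=1.2):
    delay = int(round(tau / dt))
    total = int(n_samples / dt) + delay + 1
    x = np.empty(total)
    x[:delay + 1] = x0                         # constant history for t <= tau
    for t in range(delay, total - 1):
        x_tau = x[t - delay]                   # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** c) - b * x[t])
    step = int(round(1.0 / dt))
    return x[delay::step][:n_samples]          # one sample per unit time

series = mackey_glass(1000)                    # 1000 observations, as above
```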

After 13 global iterations, 2 global generations and 520 function evaluations (FEs), a near-optimum T2HFBFS was obtained with 7 rules. The training and testing RMSE values of the generated T2HFBFS are 9.2112e−16 and 9.1012e−16, respectively. Figure 6 shows the prediction results of the T2HFBFS for training and testing data.

Fig. 6 Results of MG prediction for training and testing data

Table 2 compares the results of the proposed T2HFBFS with other state-of-the-art type-1/type-2 fuzzy systems and neural network approaches. As can be observed in this table, the proposed T2HFBFS shows better performance than the other works, achieving high accuracy with a simpler fuzzy rule base.

Table 2 Comparison results of Mackey–Glass time series

7.1.2 Predicting the Box–Jenkins chaotic time series

The proposed T2HFBFS is applied to the Box and Jenkins gas furnace prediction problem. The data set was recorded from a combustion process of a methane–air mixture and is a well-known real-world benchmark frequently used for testing prediction methods. It consists of 296 input–output measurement pairs collected from a laboratory furnace (Box and Jenkins 1976). The first 200 data points were used for training and the remaining 96 for testing. The input \( u\left( t \right) \) is the gas flow into the furnace, and the output \( y\left( t \right) \) is the CO2 concentration in the outlet gas. To enable meaningful comparisons with previous works, \( y\left( {t - 1} \right) \) and \( u\left( {t - 4} \right) \) are used as inputs to predict \( y\left( t \right) \).

After 6 global iterations, 1 global generation and 240 FEs, the obtained RMSE values for the training and testing data are, respectively, 0.0044 and 0.0108. The optimal T2HFBFS is generated with 4 fuzzy rules. Figure 7 illustrates the prediction results, including the actual and predicted outputs for training and testing data. These results show that the proposed system can attain good results after only a few hundred FEs and with a minimal number of rules.

Fig. 7 Results of Box and Jenkins prediction for training and testing data

The proposed approach is compared in Table 3 with several learning techniques, such as the genetic and memetic fuzzy design methods (Wadhawan et al. 2013), the belief rule-based (BRB) system (Chen et al. 2013), the adaptive neuro-fuzzy inference system (ANFIS) (Samanta 2011), the beta basis function neural network (BBFNN) system (Dhahri et al. 2013), the single multiplicative neuron trained with cooperative sub-swarms of PSO (SMN-COPSO) (Samanta 2011), the flexible neural tree (FNT) system (Chen et al. 2005) and the fuzzy granular evolving modeling (FBeM) system (Leite et al. 2011). Our system is also compared with the interval type-2 intuitionistic fuzzy logic system (IT2 IFLS) (Eyoh et al. 2018). As can be observed from the table, the T2HFBFS achieves the best training and testing errors with a small number of fuzzy rules.

Table 3 Comparison results of Box and Jenkins time series

7.1.3 Predicting the monthly NAO index time series

The North Atlantic Oscillation (NAO) index (Hurrell and van Loon 1997) is defined as the difference in sea level pressure between two stations located near the centers of the Azores High and the Icelandic Low. Positive values of the NAO index indicate wet and warm winters in northern Canada and Greenland, Western Europe and the Eastern US. Negative values indicate cold winters in these regions, with storms heading south toward the Mediterranean Sea, which increases rainfall and storm activity in North Africa and southern Europe.

To make a faithful comparison with other existing models, the data set used in this experiment consists of 712 mean monthly NAO index values derived from daily measures of the NAO index from January 1950 to April 2009. Six values are used to predict \( y\left( {t + 1} \right) \), and the input–output data format is defined by: [inputs; output] = \( \left[ { y\left( {t - 5} \right), y\left( {t - 4} \right), y\left( {t - 3} \right), y\left( {t - 2} \right), y\left( {t - 1} \right),y\left( t \right); y\left( {t + 1} \right)} \right] \), where t identifies the month/year of measurement and \( y\left( t \right) \) is the monthly NAO index. Of the 712 data points, the first 500 are used for training and the remaining 212 for testing. The data set is available from: http://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/nao.shtml.

After 12 global iterations of learning, 2 global generations and 480 FEs, a near-optimum T2HFBFS was generated with 7 rules. The training and testing RMSEs of the T2HFBFS are 1.6870e−07 and 1.3021e−07, respectively. Figure 8 shows the actual and the predicted outputs. These results confirm again the capacity of the proposed system to achieve high accuracy with a reduced number of rules. This is due to the multi-objective nature of the learning process, which considers the two performance measures of accuracy and interpretability at the same time. The proposed hierarchical fuzzy design also improves the system's approximation capacity by generating more accurate and interpretable solutions.

Fig. 8 Results of NAO index prediction for training and testing data

As given in Table 4, the performance of the T2HFBFS is compared with that of the F-transform method (Di Martino et al. 2011), the Wang–Mendel method (Wang and Mendel 1992) and the local linear wavelet neural network (LLWNN) system (Chen et al. 2006). The RMSE testing values presented in this table indicate that the T2HFBFS gives considerably smaller error values than its competitors.

Table 4 Comparison results of NAO index time series

7.1.4 Predicting the sunspot number time series

The sunspot number series is a real-world, non-stationary and highly complex time series. This data set presents the yearly average relative number of observed sunspots (Izeman et al. 1985), recorded for the years 1700–1979. For a meaningful comparison with other existing studies, samples between 1700 and 1920 are used as training data, and two additional sets are used for testing: the first from 1921 to 1955 and the second from 1956 to 1979. The inputs are \( y\left( {t - 4} \right), y\left( {t - 3} \right), y\left( {t - 2} \right) \) and \( y\left( {t - 1} \right) \), and the predicted output is \( y\left( t \right) \).

After 12 global iterations, 2 global generations and 480 FEs, an optimal T2HFBFS was generated with a training RMSE of 2.8301e−16 and 6 fuzzy rules. The testing RMSE values for the first and second testing sets are, respectively, 1.2741e−16 and 1.6433e−16. Figure 9 presents the actual and the predicted outputs for training and testing data.

Fig. 9 Results of sunspot number prediction for training and testing data

Table 5 compares the T2HFBFS with existing studies, such as the beta basis function neural network trained by the artificial bee colony (BBFNN-ABC) (Dhahri et al. 2012), the fuzzy wavelet neural network (FWNN) systems (Yilmaz and Oysal 2010), the recurrent fuzzy neural network (RFNN) (Aliev et al. 2009) and the flexible beta basis function neural tree trained by a hybrid evolutionary algorithm (FBBFNT_EIP & HBFOA) (Bouaziz et al. 2013). As shown in the table, the T2HFBFS again proves its superiority over the other fuzzy and neural techniques by reaching lower training and testing errors.

Table 5 Comparison results of sunspot number time series

7.2 T2HFBFS for high-dimensional biomedical classification problems

To further analyze the efficiency of the proposed system, the T2HFBFS is applied to two high-dimensional classification problems, the lung cancer and the prostate cancer data sets. These classification data sets were taken from the Kent Ridge Bio-medical Data Set Repository (available at http://leo.ugr.es/elvira/DBCRepository/). For this kind of problem with high input dimension, the SD feature selection algorithm is applied as an initial filtering step to select the most useful inputs. In addition, to better estimate the efficiency of the model, fivefold cross-validation is applied to the data in order to give an honest assessment of the true classification accuracy. In general, k-fold cross-validation involves randomly dividing the data set into k folds, or groups, of approximately equal size. One group is used as a validation set, and the remaining k − 1 groups are used for training. This process is repeated k times, each time with a different group as the validation set; the results are then obtained by averaging the model skill scores.
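
The splitting procedure amounts to the following sketch (k = 5 in our experiments); train_and_score is an assumed callable that trains a T2HFBFS on the training folds and returns its score on the validation fold.

```python
import numpy as np

def kfold_score(X, y, train_and_score, k=5, seed=0):
    idx = np.random.RandomState(seed).permutation(len(X))
    folds = np.array_split(idx, k)             # k groups of roughly equal size
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[val], y[val]))
    return float(np.mean(scores))              # mean of the k skill scores
```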

7.2.1 Lung cancer data set

This classification data set is used to discriminate adenocarcinoma (ADCA) from malignant pleural mesothelioma (MPM) of the lung. It is composed of 181 data points: 150 tissues are ADCA and the remaining 31 are MPM. Each data point is described by 12,533 features (genes).

After applying the SD feature selection algorithm to the data set, the number of features used is reduced from 12,533 to 12. The fivefold cross-validation classification rates are 98.87% on training data and 97.82% on testing data. An optimal T2HFBFS with 6 fuzzy rules is generated after 5 global iterations, 1 global generation and 200 FEs. The evolved T2HFBFS is shown in Fig. 10.

Fig. 10 Evolved T2HFBFS in the lung cancer case

As given in Table 6, the performance of the T2HFBFS is compared with other existing approaches, including a bootstrapping gene selection method (Pang et al. 2007), a linear discriminant analysis-based genetic algorithm (LDA-GA) (Huerta et al. 2008), a multi-objective evolutionary algorithm based on interpretable fuzzy (MOEAIF) method (Wang and Palade 2011) and a fuzzy logic method combined with the ant colony optimization (ACO) algorithm and principal components analysis (PCA) (Ayat and Rahi 2014). As can be seen from the table, the T2HFBFS outperforms all the other models and achieves satisfactory classification performance with a small fuzzy rule base.

Table 6 Performance comparison in the case of lung cancer data set

7.2.2 Prostate cancer data set

The prostate cancer data set is used to further evaluate the classification performance of the system. It is employed for normal versus tumor classification and is composed of 12,600 genes (features) and 136 data points (77 tumor samples and 59 normal samples). The features are normalized gene expression values taken from the microarray image.

After applying the SD feature selection algorithm to the data, the number of features is reduced from 12,600 to 20. After 16 global iterations, 2 global generations and 640 FEs, an optimal T2HFBFS is generated with a rule base of 7 rules. The average classification rates on training and testing data are, respectively, 97.42% and 96.22%. The best evolved T2HFBFS is shown in Fig. 11.

Fig. 11 Evolved T2HFBFS in the prostate cancer case

Table 7 compares the T2HFBFS with other related works, such as the granular support vector machines–recursive feature elimination (GSVM-RFE) algorithm (Tang et al. 2005), the transductive support vector machine (TSVM) method (Innocent and Kurian 2014) and the classification method based on association rules and information gain ratio on fuzzy rough set (AR-FRS-GR) theory (Innocent and Kurian 2014). The classification results given in Table 7 again prove the superiority of the T2HFBFS over the other competing systems.

Table 7 Performance comparison in the case of prostate cancer data set

8 Conclusion

In this paper, the hierarchical design of type-2 fuzzy systems is considered using a tree representation. The proposed T2HFBFS is formed by a set of low-dimensional rule bases arranged in a multi-level structure. The sub-fuzzy models of the hierarchy are based on interval type-2 beta fuzzy sets in the premise parts, while the consequent parts use TSK-type fuzzy reasoning. Regarding the learning process, two hybrid tasks of structure learning and parameter update are performed iteratively. The structure learning task relies on the MOEIP as a multi-objective algorithm to learn a population of T2HFBFSs; this step generates an optimal T2HFBFS structure with a good balance between interpretability and accuracy. The approximation ability of the obtained system is further enhanced by a second parameter tuning step using the PSOUM-IHS algorithm, which optimizes the IT2 BMF parameters and the consequent elements of the rules. Simulations on time series forecasting problems and classification data sets show that the T2HFBFS outperforms other competing type-1/type-2 fuzzy approaches and neural network methods. In most cases, the proposed system achieves higher accuracy with a smaller rule base than its competitors.