1 Introduction

Evolvability is one of the properties that brings fuzzy models, and models in general, closer to reality. The concept itself exhibits many facets and is quite often linked with the idea of adaptive systems, architectures that have been around for a long time. Systems (models) have to evolve because the real world is often nonstationary. The perception of the world changes depending upon a given observer (user) and his or her preferences, and in this way the model evolves to adhere to the needs of the users.

From the algorithmic perspective, treated in a narrow sense, an evolvable system should possess the ability to realize parametric adaptation: it has to be endowed with a suite of parameters (so that the required flexibility is present in the system) along with mechanisms of adaptation, so that the required adjustments can be realized in an effective fashion. Neural networks are typical representative examples of adaptive systems.

In general terms, evolvable systems are characterized by the ability to adjust their structure as well as their parameters to the varying characteristics of the environment (with the term environment embracing the processes/phenomena with which the system has to interact as well as the users of the system). Human-centric systems, including recommender systems (recommenders), are usually evolvable systems, as they have to cope with continuously changing preferences of the users and address the need to deal with adjustable requirements. The relevance mechanisms have to be carefully accommodated by building the evolvability capabilities into the system.

The notion of evolvable systems, along with the associated development strategies, has undergone a number of refinements; in this regard one can refer to a series of representative studies carried out so far (see Angelov 2004a, b; Angelov et al. 2008; Huang et al. 2007; Jain et al. 1999). The concept of evolvability is perceived in different ways, and in some cases it concerns the underlying optimization processes (which exhibit a great deal of adaptive mechanisms). The category of genetic-based optimization of fuzzy models has assumed a visible position here (the reader may refer to Akbarzadeh-T et al. 2008; Al-Razgan and Domeniconi 2006; Aliev et al. 2009; Antonelli et al. 2009; Molina et al. 2006, 2007; Park et al. 2007; Pham and Castellani 2006).

The objective of this study is to elaborate on the facets of conceptual and algorithmic diversity of evolvable systems. We discuss the two essential and general directions present in the realm of evolvable systems, that is, the concept of adjustable knowledge representation resources and the idea of evolvability of perception perspectives. These are the two dominant dimensions of the concept of evolvability. While this taxonomy is of a general nature, for illustrative purposes our investigations are focused on fuzzy rule-based systems, as this architecture of fuzzy models helps exemplify the main ideas in a clear and convincing manner. While detailed algorithmic studies are presented as well, some aspects of a general development character are only highlighted.

The material is structured into six sections. We start with a brief introduction to the functional components of rule-based models and discuss the main ways in which they are constructed. The perspective of adjustable knowledge representation is presented in Sect. 3, which is followed by a discussion of the evolvability of perception perspectives, where a certain descriptive fuzzy model is cast in a suitable setting by adjusting the nonlinear mapping of the input spaces (Sect. 4). Section 5 is devoted to the detailed algorithmic developments of evolvable systems, where the underlying design is discussed in detail. Conclusions are covered in Sect. 6.

As already mentioned, to focus our investigations, we consider a broad category of fuzzy systems (models) structured as a collection of rules of the form

$$ \text{if } \mathbf{x} \text{ is } R_i \quad \text{then } y = f_i(\mathbf{x}, \mathbf{a}_i) $$
(1)

In a more direct form of fuzzy models (in which all input variables are articulated explicitly), we encounter rules of the following form:

$$ \text{if } x_1 \text{ is } A_i \text{ and } x_2 \text{ is } B_i \text{ and } x_3 \text{ is } C_i \ldots \quad \text{then } y = f_i(\mathbf{x}, \mathbf{a}_i) $$
(2)

where the inputs (x) come from R^n and R_i is a fuzzy set defined in the Cartesian product of the input variables. The vector of parameters of the local function f_i standing in the conclusion part of the rules is denoted by a_i. Fuzzy sets formed for the individual variables are denoted by A_i, B_i, C_i, etc. The number of rules is equal to "c".

2 Fuzzy models: a generic functionality and design

In spite of the genuine diversity of fuzzy model architectures and the ensuing design algorithms, there are a few common generic components of rule-based fuzzy models and ways in which they are put together to construct the fuzzy model.

2.1 Information granules

The information granules are usually realized through clustering (Bodenhofer and Bauer 2003) or fuzzy clustering, say Fuzzy C-Means (FCM) (Bezdek 1981; Pedrycz 2005; Pedrycz and Hirota 2007). FCM operates on numeric data and produces a family of "c" clusters (information granules) R_1, R_2, …, R_c with membership functions conveyed by the partition matrix. The prototypes are the other component of the granular structure of data. The geometry of the clusters (fuzzy sets) is implied by the form of the distance function used in the generation of the clusters (typically a Euclidean distance is considered) and the value of the fuzzification coefficient (m). The higher the number of information granules, the larger the number of rules and potentially the higher the accuracy of the resulting model, albeit its interpretability (transparency) becomes lower.
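As a point of reference, the following is a minimal sketch of the FCM iteration (prototype and membership updates) under the Euclidean distance; the function name, defaults, and random initialization are illustrative assumptions rather than part of the original study:

import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """X: (N, n) data; returns partition matrix U (c, N) and prototypes V (c, n)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # columns of U sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)            # prototype update
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1)))
        U /= U.sum(axis=0)                   # standard FCM membership update
    return U, V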

2.2 Local models

These are descriptors of input–output relations whose relevance is limited to the region of the input space specified by the information granules standing in the condition part of the rule. A high diversity of local models is possible here: they could be linear models, polynomials, neural networks, or differential equations, to list just a few examples. Local models set up a tradeoff between approximation capabilities and interpretability. For instance, local models coming in the form of neural networks exhibit high accuracy but low interpretability. Obviously, along with the nature of the model comes a certain level of computational effort and the choice of optimization procedures. These could involve either gradient-based learning or simple pseudo-inverse matrix computations yielding a global minimum of the performance index guiding the estimation of the parameters of the local linear models; see the sketch below.
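A hedged sketch of the pseudo-inverse (weighted least squares) estimation of the parameters of local linear models follows; U is the partition matrix and X, y the training data, with all names illustrative:

import numpy as np

def local_linear_models(X, y, U, m=2.0):
    """Returns a list of coefficient vectors a_i, one per rule (cluster)."""
    N, n = X.shape
    Xe = np.hstack([np.ones((N, 1)), X])     # augment the inputs with a bias term
    models = []
    for u_i in U:                            # activation levels of the i-th rule
        W = np.diag(u_i ** m)                # weight data by membership degrees
        a_i = np.linalg.pinv(Xe.T @ W @ Xe) @ Xe.T @ W @ y
        models.append(a_i)
    return models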

2.3 Aggregation

The rules are aggregated, giving rise to the output of the overall model. The common aggregation mechanism uses a weighted sum of the local models, where the weights are the activation levels of the condition parts of the corresponding rules. The typical expression reads as follows

$$ \hat{y} = \sum_{i=1}^{c} \phi_i(R_1, R_2, \ldots, R_c)\, L_i(\mathbf{x}) $$
(3)

with ϕ_i being a function of the degrees of activation of the conditions (R_1, R_2, …, R_c) of the consecutive rules present in the system. Here L_i stands for the local model present in the conclusion part of the i-th rule.
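To illustrate (3), a brief sketch of the aggregation step is given below; it assumes normalized FCM-style activation levels as the weights ϕ_i and the linear local models estimated in the previous sketch (both are assumptions made for the example, not prescriptions of the original text):

import numpy as np

def model_output(x, V, models, m=2.0):
    d = np.linalg.norm(V - x, axis=1) + 1e-12
    act = 1.0 / (d ** (2.0 / (m - 1)))
    phi = act / act.sum()                    # phi_i: normalized activation levels
    xe = np.concatenate(([1.0], x))          # bias-augmented input
    return sum(p * (a @ xe) for p, a in zip(phi, models))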

3 Evolvable systems: a perspective of adjustable knowledge representation resources

The observations about the architecture of fuzzy rule-based models presented in the previous section, especially when cast in the context of human-centric and interpretable systems (Pedrycz 2005), entail that from a more abstract perspective such evolvable models can be viewed in terms of their knowledge representation resources. More specifically, we can talk here about several important components such as:

  (a) the number of rules, which is implied by the number of information granules used to granulate input variables or collections of input variables,

  (b) the form of the information granules, and

  (c) the dimensionality of the input space.

These components are usually considered and adjusted when optimizing multidimensional systems (see Molina et al. 2006). Let us assume that these knowledge representation resources (capabilities) can be quantified in the form of a single index Γ. The index itself can be treated as a function of the number of clusters, the dimensionality of the input space, and other factors in addition to those listed above. Of course, the assumption of having a single index is quite strong, and one should be cognizant that in some cases forming this index might not be feasible and may call for additional effort in which domain knowledge has to be engaged.

In a simple version, the index Γ can be formed by taking a weighted sum of the individual knowledge representation resources, say

$$ \Gamma = w_1 \Gamma_1 + w_2 \Gamma_2 + \cdots + w_p \Gamma_p $$
(4)

with w_1, w_2, …, w_p being the weights of the individual components of the index, while Γ_1, Γ_2, …, Γ_p represent the indexes quantifying the nature of the individual resources available for the realization of the system. For instance, the quantification of some of them could be quite straightforward by being linked directly to the corresponding resources, say Γ_1 = c (the number of rules) and Γ_2 = p (the dimensionality of the input space). Both of them can vary: the granularity of the model (the number of rules) changes, and the dimensionality of the input space can be modified by invoking mechanisms of feature reduction.

In this context, it is worth recalling a wealth of studies reported in the existing literature (see, for instance, Bodenhofer and Bauer 2000, 2003, 2005). With regard to the readability of rule bases, one can allude to components such as the number of rules, the total rule length, and the average rule length. The intuitively appealing idea could be quantified through the multiplicative expression Comp * Part * Cov, where Comp represents the complexity of the classifier (as the rule-based model is focused on classification problems), Part is the average normalized partition index (computed as the inverse of the number of information granules decreased by 1), and Cov is the normalized coverage level of the fuzzy partition. The study reported in Alonso et al. (2009) takes into account a more elaborate index in which a larger number of factors are involved. More specifically, there are in total six variables: the total number of rules, the total number of conditions, the number of rules which use only a single input, the number of rules which use two variables, the number of rules using three or more variables, and the total number of labels (information granules) defined per input.
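As an illustration, the multiplicative index could be computed along the following lines; the normalization of the complexity term Comp is an assumption introduced here for the sake of the example:

def readability_index(n_rules, n_granules, coverage):
    # Comp: complexity of the classifier (assumed here as the inverse of the
    # number of rules, so fewer rules means higher readability)
    comp = 1.0 / n_rules
    # Part: average normalized partition index, the inverse of the number of
    # information granules decreased by 1 (as stated in the text)
    part = 1.0 / (n_granules - 1)
    # Cov: normalized coverage level of the fuzzy partition, in [0, 1]
    return comp * part * coverage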

Some other indexes may require more attention when proceeding with the quantification of the underlying effect, say, the form of information granules, in which case we may quantify the transparency of the information granules (in terms of the interaction occurring between them, and an explicit quantification of individual variables versus the use of multidimensional fuzzy sets defined over all input variables). The values of the weights are estimated on the basis of input provided by the user/expert.

The well-known indexes of complexity used in system modeling, including the minimum description length (MDL) principle, the Bayesian information criterion (BIC), and the Akaike information criterion (AIC), might be worth considering, albeit one should be cognizant that they apply to numeric models and do not distinguish between the different semantics of the individual parameters of granular (rule-based) models.

The multifaceted nature of knowledge representation becomes more apparent considering that, in addition to the interpretability of the produced source of knowledge, the issue of "stability" of knowledge is of interest as well. In a nutshell, a collection of information granules, being an abstract view of the experimental evidence, can be constructed in different ways, resulting in similar albeit not identical information granules. The variability of the information granules quantifies the stability of the acquired knowledge. More specifically, let us denote by G = {G_i} and F = {F_j}, i = 1, 2, …, c_1, j = 1, 2, …, c_2, information granules built on the basis of the same source of experimental evidence X. These information granules are expressed through the corresponding rows of the partition matrices resulting from running some algorithm of fuzzy clustering, say FCM. Note that in general c_1 is not equal to c_2. The stability of the obtained information granules can be expressed in the form of the following stability index S, expressing the distance ||·||^2 between the corresponding proximity matrices Prox(G) and Prox(F),

$$ S = \| \mathrm{Prox}(\mathbf{G}) - \mathrm{Prox}(\mathbf{F}) \|^2 $$
(5)

where the entries of the matrix Prox(G) are determined on the basis of the partition matrix U obtained as a result of the granulation of X, that is

$$ \mathrm{Prox}(\mathbf{G}) = U^{\mathrm{T}} \circ U $$
(6)

with its (k_1, k_2) entry being computed by summing the minima of the corresponding columns of the partition matrix, that is

$$ \sum_{i=1}^{c} \min(u_{ik_1}, u_{ik_2}) $$
(7)

The facet of stability of information granules (forming the condition part of the rules) can be taken into consideration in the realization of the index Γ.
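To make (5)-(7) concrete, here is a brief sketch computing the proximity matrices and the stability index from two partition matrices (rows: clusters, columns: patterns); the names are illustrative:

import numpy as np

def prox(U):
    """Prox with (k1, k2) entry: sum_i min(u_ik1, u_ik2); U is (c, N)."""
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

def stability(U_G, U_F):
    P, Q = prox(U_G), prox(U_F)
    return np.sum((P - Q) ** 2)              # squared distance of (5)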

The evolvability of the model is guided by the values of Γ, which are used as a high-level objective function governing all modeling activities. For instance, the task can be formulated as follows:

  • design a fuzzy model such that the associated index of knowledge representation is Γ_0 and the accuracy of the model is maximized.

The evolvability aspect comes into play because, even though the value of Γ_0 is given, the distribution of the resources can evolve (change): clearly, we might have a large number of modeling scenarios characterized by different allocations of knowledge representation resources, from which those offering the highest level of accuracy are selected.

One may envision scenarios in which the evolvability of the model is governed by the increasing values of Γ:

  • over the course of modeling (in successive releases of the model), keep increasing the level of knowledge representation resources so that consecutive releases of the model exhibit increasing values of Γ along with increasing accuracy of modeling.

This strategy is concerned with the creation of a series of models that become more specific. The opposite strategy may promote evolvability towards the use of more compact systems by assigning fewer knowledge representation resources:

  • over the course of modeling (in successive releases of the model), keep decreasing the level of knowledge representation resources so that consecutive releases of the model exhibit decreasing values of Γ while potentially retaining the accuracy of modeling.

Given the complexity of the underlying problem, the optimization of fuzzy models characterized in this manner calls for evolutionary optimization as one of the feasible development alternatives. If we confine ourselves to Genetic Algorithms, the main task is to capture the structure in a suitable format of the chromosome.
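A hypothetical chromosome layout for such a GA could look as follows; the fields, bounds, and the soft-constraint fitness are illustrative assumptions only, not the encoding used in the cited studies:

import numpy as np

rng = np.random.default_rng(1)

def random_chromosome(n_features, c_max=10):
    # one individual encoding an allocation of knowledge representation resources
    return {
        "c": int(rng.integers(2, c_max + 1)),          # number of rules (granules)
        "features": rng.random(n_features) > 0.5,      # input-space selection mask
        "m": float(rng.uniform(1.5, 3.0)),             # fuzzification coefficient
    }

def fitness(accuracy, gamma, gamma_0, alpha=1.0):
    # reward accuracy while keeping the resource index Gamma close to the
    # prescribed Gamma_0 (an assumed soft-constraint formulation)
    return accuracy - alpha * abs(gamma - gamma_0)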

4 Evolvability of perception perspectives

A fuzzy model, regarded as a collection of rules interacting among themselves, can be treated to a significant extent as a construct of a prescriptive nature, where knowledge about a system or phenomenon is encapsulated in the form of rules. The knowledge could be of a commonsense nature, which is quite general, not necessarily capturing the specificity and minute details of the system but instead focusing on the very essence of the phenomenon. The same system may be perceived from different points of view depending upon the observers and their objectives. The same model (which captured the prescriptive nature of the phenomenon) is still relevant but needs calibration. Fuzzy sets offer this possibility: in contrast to symbols (whose semantics cannot be easily adjusted), they come with numeric membership functions, which can be made evolvable. This important aspect becomes a crux of the evolvability expressed in terms of the associated perception perspective. Figure 1 highlights the essence of this concept.

Fig. 1 The concept of evolvability of perception perspectives impacting the development of the model

The fuzzy model positioned at the center of this figure is of a prescriptive nature and usually specified in advance by capturing the underlying domain knowledge. For instance, it can be composed of a collection of fairly commonsense-driven rules whose relevance is self-explanatory but for which we might anticipate some need for further calibration or adjustment. In this regard, information granules and information granulation play an important role (cf. Zadeh 1997).

The adjustment of perception can be conveniently realized by a nonlinear transformation of the input space in which the fuzzy sets are formed. Consider a family of fuzzy sets A_1, A_2, …, A_c defined in R, A_i: R → [0,1]. The nonlinear transformation of the universe of discourse described as F: x → z results in the collection of transformed information granules, that is, A_i(F(x)), i = 1, 2, …, c. They produce an individual perception to which the general model adheres. The flexibility of the nonlinear transformation is provided by expressing it through a collection of piecewise linear functions, see Fig. 2, or a sum of translated sigmoid functions. In both cases we note a nonlinear transformation of the original input space. For instance, as shown in Fig. 2, the original segment [a_1, a_2] is "stretched" while some other, say [a_i, a_{i+1}], is very much "suppressed" to a very limited region in the transformed space. The realization of focus of attention (where a region is stretched) or reduction of attention (where it is compressed) is accomplished through linear segments of the transformation function having different slopes. A similar effect is realized by a series of sigmoid functions having different slopes.

Fig. 2 Example of nonlinear (piecewise linear) transformation F of the input space x
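A minimal sketch of such a piecewise linear transformation is given below; the knot vectors a (input) and b (output) are assumed parameters serving as the adjustable degrees of freedom:

import numpy as np

def piecewise_transform(x, a, b):
    # monotone piecewise linear map F; the segment [a_i, a_{i+1}] is mapped
    # onto [b_i, b_{i+1}], so steep segments "stretch" a region (focus of
    # attention) while flat segments "suppress" it
    return np.interp(x, a, b)

# usage: [0, 0.2] is stretched onto [0, 0.5]; [0.2, 0.8] is compressed
a = [0.0, 0.2, 0.8, 1.0]
b = [0.0, 0.5, 0.7, 1.0]
z = piecewise_transform(np.linspace(0.0, 1.0, 5), a, b)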

The realization of the transformation is formulated as an optimization problem of the best representation of the prescriptive model. Denote this prescriptive model by FM(A_1, A_2, …, A_c). The incorporation of the perception through the nonlinear transformation results in the fuzzy model expressed in the form FM(x; A_1, A_2, …, A_c, f_1, f_2, …, f_n), where the individual transformation functions play a pivotal role. The experimental data associated with the specific perception of the system are given as input–output pairs (x(k), target(k)), k = 1, 2, …, N. The performance index Q to be minimized is defined as a sum of squared errors

$$ Q = \sum_{k=1}^{N} [\mathrm{FM}(\mathbf{x}_k; A_1, A_2, \ldots, A_c, f_1, f_2, \ldots, f_n) - \mathrm{target}_k]^2 $$
(8)

Its minimization is realized with respect to the transformation functions f_1, f_2, …, f_n (and their parameters, such as the endpoints of the segments of the piecewise approximation or the translations and slopes of the sigmoid functions)

$$ \min_{f_i \in \mathbf{F}} Q $$
(9)

where F is a family of all admissible transformation functions under consideration. So far we have discussed only two classes of such functions, which are intuitively appealing. In general, we can think of families of functions satisfying the conditions of continuity and monotonicity. The nonlinear functions could be either monotonically increasing (these were the functions already discussed) or monotonically decreasing. In the latter case, we may think of a reversal (antonyms) of the semantics of the original linguistic terms used in the prescriptive model. The overall schematic view of the realization of the perception-based evolvability concept via nonlinear transformations of the input spaces is illustrated in Fig. 3.

Fig. 3 Nonlinear transformations of input spaces feeding original inputs to the prescriptive fuzzy model

Given the nature of the optimization task, instead of considering gradient-based algorithms (which could be of limited relevance here), one can envision the use of techniques of evolutionary optimization or population-based methods.
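As one feasible alternative, a simple (1+1) evolution strategy can stand in for the population-based optimization of (9); FM_error is an assumed wrapper returning Q of (8) for a given vector of output knots of the piecewise transformation:

import numpy as np

def optimize_knots(FM_error, b0, sigma=0.05, n_iter=500, seed=0):
    # (1+1) evolution strategy: mutate the knots, keep the candidate if it
    # lowers the performance index Q; np.sort preserves monotonicity of F
    rng = np.random.default_rng(seed)
    b, q = np.array(b0, dtype=float), FM_error(b0)
    for _ in range(n_iter):
        cand = np.sort(b + rng.normal(0.0, sigma, b.shape))
        qc = FM_error(cand)
        if qc < q:                            # greedy acceptance
            b, q = cand, qc
    return b, q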

5 Development of evolvable systems

In this section, we present a more detailed scheme of the design of evolvable systems. Before proceeding, let us show two representative practical scenarios.

  (a) In processing temporal slices of multivariable data, such as sensor network readings, stock market data, or currency exchange rates, the readings are analyzed over some time slices and a fuzzy model is constructed for each time slice.

    Apparently the phenomenon varies over time, and its complexity and variability could fluctuate, moving up and down in consecutive time segments (say, months, quarters, etc.). To accommodate this component of inherent variability, the models constructed in successive steps can change their granularity, as schematically displayed in Fig. 4.

    Fig. 4 Fuzzy models built for successive time slices of multivariable temporal data

  (b) In forming models for successive slices of volume (3D) data, such as those present in 3D biomedical data, atmospheric data, and the like, the models constructed for successive slices have to exhibit some continuity but at the same time have to accommodate the variability of the phenomenon observed in successive volumes (see Fig. 5).

    Fig. 5 Development of models for successive slices of volume data; shown is an evolution of the underlying information granules (position in the input space, splitting and merging)

In both cases, there is an obvious necessity to build an evolvable architecture of the fuzzy model. The integral component of the structure that evolves is concerned with the information granules and their variability over neighboring models (with these models being either spatial or temporal neighbors). Once the granules have been established, building the local models becomes predominantly a matter of parametric optimization.

In the presence of dynamically collected data, information granules move around and become more refined or more general; all of these effects are driven by the nature of the data being collected and their continuous changes. Similarly, it is intuitive to anticipate some structural continuity, that is, the clusters (information granules) evolve in a "smooth", continuous manner when moving from one model to the next. In other words, in the next time slice, the clustering is run again starting from the initial configuration "inherited" from the previous slice. As stressed, the number of clusters is not fixed in advance but could fluctuate from one data snapshot to another depending upon the structure of the data themselves and the complexity of the underlying dynamics (that is, the underlying phenomenon generating the data). With reference to the evolvability of information granules and updates of the level of granularity, it is worth recalling some related studies pertaining to clustering data streams, see, e.g., Beringer and Hullermeier (2006); Crespo and Weber (2005); Guha et al. (2003); Huang et al. (2007); Park and Lee (2007); Strehl et al. (2002); Bandyopadhyay et al. (2006); Tasoulis and Vrahatis (2005).

Referring to Fig. 6, let us elaborate in more detail on the process of clustering. We start with the first data snapshot and run the FCM on the data present there, producing a collection of prototypes v_1[1], v_2[1], …, v_c[1] (the number in the square brackets denotes that these prototypes are built on the basis of the data coming from the first data segment X[1]). The number of clusters is determined based on the acceptable level of the predetermined optimization criterion; we elaborate on the computational details later on. When running the FCM on the second data snapshot X[2], we proceed with the prototypes that were formed for X[1] and use them as the starting point. Inevitably, when iterating the FCM, the prototypes change their location as a result of revealing the topology of the actually processed data. Once done, it could well be that a certain clustering quality indicator (performance index) exceeds some required threshold or assumes values that are far lower than this threshold. In both cases, the number of clusters needs to be dynamically adjusted. In the first case, it must be increased: we could do that by splitting the most heterogeneous (diversified) cluster. In the second case, the number of clusters could be too high (capturing too many unnecessary details) and some merging process could be of interest.

Fig. 6 The concept of dynamic clustering: progression of the FCM constructs (prototypes and partition matrices) in successive data snapshots and dynamic adjustments of the number of clusters (prototypes) through the mechanisms of cluster merging and cluster splitting
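The overall control loop can be sketched as follows; fcm_warm (an FCM variant accepting initial prototypes), split_worst_cluster, and merge_closest are hypothetical helpers corresponding to the mechanisms detailed in Sects. 5.1 and 5.2, and e_hi, e_lo are assumed thresholds on the reconstruction criterion:

def process_snapshots(snapshots, c0, e_hi, e_lo, m=2.0):
    # dynamic clustering over successive data snapshots (cf. Fig. 6)
    V = None
    for X in snapshots:
        c = c0 if V is None else len(V)
        U, V = fcm_warm(X, c=c, m=m, init=V)    # warm start from the previous slice
        _, V_max = reconstruction_errors(X, U, V, m)
        if V_max > e_hi:                         # structure too coarse: split
            U, V = split_worst_cluster(X, U, V, m)
        elif V_max < e_lo:                       # too many details: merge
            U, V = merge_closest(X, U, V, m)
        yield U, V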

In what follows, we are concerned with (a) splitting and merging criteria, (b) assessment of quality of clusters which could be directly used when controlling the dynamics of the clusters and deciding upon the change of the number of clusters themselves, and (c) optimization capabilities of fuzzy clustering (FCM) which could be exploited upfront when running the algorithm.

5.1 Reconstruction criterion in the assessment of quality of clusters

The number of clusters is another important design aspect of any clustering algorithm. A number of so-called cluster validity indexes have been proposed (Bezdek 1981; Jain et al. 1999), geared to assess the most "suitable" number of clusters, supposed to correspond to the "true" structure present in the data. The inherent problem with them is that quite often one receives several conflicting suggestions, depending on which cluster validity index has been selected. In this study, we follow a different path, which could be referred to as a granulation-degranulation strategy (cf. Pedrycz and Hirota 2007; Pedrycz and Gomide 2007). In essence, once the information granules have been formed by the FCM, we use the prototypes to represent each pattern in the form of information granules by computing the corresponding membership degrees. At the next step, we complete a de-granulation, that is, we estimate the pattern on the basis of the prototypes and the membership degrees. Ideally, we would expect that the result of such de-granulation returns the original pattern. While this is quite unlikely, we may consider the distance between the original pattern and its de-granulated version to constitute a viable measure of the quality of the obtained reconstruction. Let us move on to the technical details, which provide the required formal description of the strategy.

Denote by X_i the data whose membership to the i-th cluster is the highest, that is

$$ \mathbf{X}_i = \{ \mathbf{x}_k \in \mathbf{R}^n \mid i = \arg\max_j u_j(\mathbf{x}_k) \} $$
(10)

The reconstruction error for the i-th cluster is computed as the following sum of distances between x_k and its de-granulated version \( \hat{\mathbf{x}}_k \)

$$ V_i = \sum_{\mathbf{x}_k \in \mathbf{X}_i} \| \mathbf{x}_k - \hat{\mathbf{x}}_k \|^2 $$
(11)

where

$$ \hat{\mathbf{x}}_k = \frac{\sum_{i=1}^{c} u_{ik}^m \mathbf{v}_i}{\sum_{i=1}^{c} u_{ik}^m} $$
(12)

The maximal value of V_i, V_max = max_i V_i, can serve as a suitable measure describing the reconstruction capabilities of the information granules. By varying the number of clusters (c) we can affect the values of V_i and in this way focus on selecting the most suitable structure in the data set.
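A sketch of the reconstruction computation (10)-(12) follows, with U being the (c, N) partition matrix and V the prototypes produced by FCM; all names are illustrative:

import numpy as np

def reconstruction_errors(X, U, V, m=2.0):
    Um = U ** m
    # de-granulation (12): reconstruct each pattern from prototypes and memberships
    X_hat = (Um.T @ V) / Um.sum(axis=0)[:, None]
    # (10): assign each pattern to the cluster of its highest membership
    labels = U.argmax(axis=0)
    err = np.sum((X - X_hat) ** 2, axis=1)
    # (11): per-cluster reconstruction error V_i, and its maximum V_max
    V_i = np.array([err[labels == i].sum() for i in range(U.shape[0])])
    return V_i, V_i.max()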

5.2 Dynamic formation of fuzzy clusters—splitting and merging mechanisms

In what follows, we discuss the mechanisms of splitting and merging clusters, which become a crux of dynamic information granulation. Moving into the details, the objective function minimized by the FCM for the first data snapshot exploits the standard sum of distances

$$ Q[1] = \sum_{i=1}^{c[1]} \sum_{\mathbf{x}_k \in \mathbf{X}[1]} u_{ik}^m \| \mathbf{x}_k - \mathbf{v}_i[1] \|^2 $$
(13)

(here Q[1] concerns the first data snapshot). Let us also note that the number of clusters, say c[1], may be selected on the basis of an assumed threshold level on V_max. For the first data snapshot, we choose such c[1] for which the reconstruction error does not exceed some predefined threshold, that is, V_max < e, with the value of e, e > 0, given in advance. Here we may adjust the number of clusters so that the required inequality is satisfied. The choice of the threshold level can be realized by observing how much the values of the reconstructability criterion are affected by the number of clusters; it might well be that V exhibits a visible knee-like relationship when varying the number of clusters. We select the minimal value of "c" at which a radical reduction in the values of the performance index under consideration is observed; subsequently, this particular value of the threshold "e" becomes associated with the corresponding value of V. With respect to the number of clusters, one may also consider the use of any of the so-called cluster validity indexes (partition coefficient, partition entropy, etc.), which can navigate our search for the number of clusters, viz. the granularity of the findings. A certain drawback associated with these validity indexes is that each of them could point at a different number of clusters (which might be quite confusing), or some indexes could exhibit a fairly monotonic character, which prevents us from using them as a reliable indicator of the preferred level of granularity of the structure. Some cross-validation techniques could be more suitable. Alternatively, one can envision running several clustering methods and then forming a consensus, as encountered in consensus clustering (cf. Pedrycz and Hirota 2008; Strehl et al. 2002).

5.2.1 Cluster splitting

If the clustering completed for some data snapshot does not meet the reconstruction criterion, that is, the reconstruction error exceeds the assumed threshold e, then some clusters need to be split. Intuitively, this split could involve the cluster characterized by the highest level of variability, that is, the one for which the index V_i attains its maximum among all the clusters. Let this cluster be denoted by i_0. The split is realized by conditional clustering (Pedrycz 1998) of the data falling under the realm of the i_0-th cluster, that is, \( \mathbf{X}_{i_0} \). Denote the set of indexes of these patterns by I_0. The formulation of the clustering problem is given in the form

$$ \min \sum_{i=1}^{2} \sum_{k \in \mathbf{I}_0} f_{ik}^m \| \mathbf{x}_k - \mathbf{z}_i \|^2 $$
(14)

subject to the constraint

$$ \sum_{j=1}^{2} f_{jk} = u_{i_0 k} $$
(15)

In essence, by minimizing (14) we invoke a computing process in which we split the i_0-th cluster into two clusters while maintaining a balance of the membership grades of the original cluster. Here the z_i are the prototypes and F = [f_ik] denotes a partition matrix that satisfies constraint (15), meaning that the split is driven by the data and membership grades of the most diversified cluster. The term conditional clustering comes from the fact that we require the resulting membership degrees f_ik to be distributed according to the membership values available for the cluster being split. The detailed calculations of the partition matrix F and the two prototypes z_1 and z_2 are carried out iteratively according to the two expressions

$$ f_{ik} = \frac{u_{i_0 k}}{\sum_{j=1}^{2} \left( \frac{\|\mathbf{x}_k - \mathbf{z}_i\|}{\|\mathbf{x}_k - \mathbf{z}_j\|} \right)^{2/(m-1)}} $$
(16)

and

$$ \mathbf{z}_i = \frac{\sum_{k \in \mathbf{I}_0} f_{ik}^m \mathbf{x}_k}{\sum_{k \in \mathbf{I}_0} f_{ik}^m}, \quad i = 1, 2; \ \mathbf{x}_k \in \mathbf{X}_{i_0}. $$
(17)
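A minimal sketch of this conditional-clustering split could read as follows; X0 collects the patterns of the cluster being split and u0 their memberships u_{i_0 k} (the names and the random initialization are assumptions):

import numpy as np

def split_cluster(X0, u0, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    Z = X0[rng.choice(len(X0), size=2, replace=False)]     # two initial prototypes
    for _ in range(n_iter):
        d = np.linalg.norm(X0[None, :, :] - Z[:, None, :], axis=2) + 1e-12
        inv = 1.0 / d ** (2.0 / (m - 1))
        F = u0 * inv / inv.sum(axis=0)      # (16): columns of F sum to u0, cf. (15)
        Fm = F ** m
        Z = (Fm @ X0) / Fm.sum(axis=1, keepdims=True)       # (17): prototype update
    return F, Z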

5.2.2 Cluster merging

If the reconstruction criterion for the current number of clusters is lower than the predefined level e, then the number of clusters could be reduced, meaning that some of the original clusters could be combined (merged). The way of doing that would be to merge the two prototypes that are closest to each other. Consider that these two are indexed by "s" and "l", that is, v_s and v_l. The new prototype v~ results from the minimization of the following performance index with respect to it

$$ Q = \sum_{k=1}^{N} (u_{sk} + u_{lk})^m \| \mathbf{x}_k - \mathbf{v}^{\sim} \|^2 \to \min_{\mathbf{v}^{\sim}} Q $$
(18)

If, for the reduced number of clusters, the resulting reconstruction criterion is still lower than the critical value, we can move on with another reduction of the number of prototypes. The process could be repeated, reducing the number of clusters by one at a time.

Let us rewrite the performance index (18) in a coordinate-wise fashion

$$ Q = \sum_{k=1}^{N} (u_{sk} + u_{lk})^m \| \mathbf{x}_k - \mathbf{v}^{\sim} \|^2 = \sum_{k=1}^{N} \sum_{j=1}^{n} (u_{sk} + u_{lk})^m (x_{kj} - v_j^{\sim})^2 / \sigma_j^2 $$
(19)

Setting the gradient of Q to zero, \( \nabla_{\mathbf{v}^{\sim}} Q = 0 \), leads to the new prototype

$$ \mathbf{v}^{\sim} = \frac{\sum_{k=1}^{N} (u_{sk} + u_{lk})^m \mathbf{x}_k}{\sum_{k=1}^{N} (u_{sk} + u_{lk})^m} $$
(20)
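In code, the merge of (20) amounts to a single weighted average; s and l index the two closest prototypes:

import numpy as np

def merge_prototypes(X, U, s, l, m=2.0):
    # (20): replace v_s and v_l by one prototype weighted by (u_sk + u_lk)^m
    w = (U[s] + U[l]) ** m
    return (w[:, None] * X).sum(axis=0) / w.sum()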

5.3 Characterization of structural dynamics of the families of clusters

The structure of clusters changes from one data slice to another. It is of interest to characterize these changes at some general level so we could learn about the dynamics of the structural changes of the evolvable model. There are several possible alternatives we can consider in describing the evolvable structure.

5.3.1 Granularity of structure

The number of clusters c[1], c[2], … reported in the consecutive data slices can be viewed as the simplest and most concise structural characteristic one could think of. An increasing sequence of these numbers is indicative of growing structural complexity (which manifests itself in the increasing number of clusters one has to construct to deal with it). This does not tell us anything, however, about the relationships between the prototypes produced in two neighboring data slices.

5.3.2 Relationships between information granules

To capture more detailed structural relationships, we take a different, more detailed path whose crux is the following. Given the prototype v_i[ii + 1], we express it in terms of the prototypes v_k[ii] available at the ii-th data snapshot. More specifically, we obtain the degrees of membership w_ik of the i-th prototype with respect to the prototypes v_k[ii], determined in a "standard" manner

$$ w_{ik} = \frac{1}{\sum_{j=1}^{c[\mathrm{ii}]} \left( \frac{\|\mathbf{v}_i[\mathrm{ii}+1] - \mathbf{v}_k[\mathrm{ii}]\|}{\|\mathbf{v}_i[\mathrm{ii}+1] - \mathbf{v}_j[\mathrm{ii}]\|} \right)^{2/(m-1)}} $$
(21)

where i = 1, 2,…, c[ii + 1] and k = 1, 2,…, c[ii]. This representation is schematically captured in Fig. 7.

Fig. 7 Expressing relationships between information granules by describing information granules in terms of the granules obtained for the previous data slice

If all these values are either close to 0.0 or close to 1.0, this is indicative of a sound representation of the prototypes in terms of the structure we have revealed so far. On the other hand, if the values of w_i tend to be close to 1/c[ii], we may talk about a high level of uncertainty when expressing the prototypes in the language of the already available structure. More formally, we can quantify the structural diversity by introducing a functional φ defined on the set of membership degrees such that φ(0) = φ(1) = 0, which increases over [0, 1/c[ii]), attains its maximal value at 1/c[ii], and then monotonically decreases to zero. The sum of the following form

$$ T = \sum_{i=1}^{c[\mathrm{ii}+1]} \varphi(w_i) $$
(22)

serves as a concise indicator of the uncertainty of structural differences observed from one data slice to another.
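A sketch of (21)-(22) is given below; the tent-shaped φ peaking at 1/c[ii], applied entry-wise to the membership degrees, is one admissible choice satisfying the stated conditions, not the only one:

import numpy as np

def memberships(V_new, V_prev, m=2.0):
    # (21): membership of each new prototype in the previous structure
    d = np.linalg.norm(V_new[:, None, :] - V_prev[None, :, :], axis=2) + 1e-12
    inv = 1.0 / d ** (2.0 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)    # rows sum to 1

def structural_diversity(W):
    # (22): T summed over the membership degrees, with a tent-shaped phi
    peak = 1.0 / W.shape[1]                         # 1/c[ii]
    phi = np.where(W <= peak, W / peak, (1.0 - W) / (1.0 - peak))
    return phi.sum()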

The collection of connections [w_ij] formed by the elements computed by (21) can be used to construct a relational model of structural relationships. Consider the i-th element (prototype) observed in the (ii + 1)-th data slice. It is described by the prototypes present in the ii-th data slice, whose activation levels are w_i1, w_i2, …, w_ic[ii]. The same holds for any other prototype available in the (ii + 1)-th data slice. This gives rise to the following collection of input–output pairs of data

$$ \text{1st prototype:}\quad \mathbf{y}(1) = [w_{11}\ w_{12}\ \ldots\ w_{1c[\mathrm{ii}]}] \quad \mathbf{z}(1) = [1\ 0\ 0\ \ldots\ 0] $$
$$ \text{2nd prototype:}\quad \mathbf{y}(2) = [w_{21}\ w_{22}\ \ldots\ w_{2c[\mathrm{ii}]}] \quad \mathbf{z}(2) = [0\ 1\ 0\ \ldots\ 0] $$
$$ \vdots $$
$$ c[\mathrm{ii}+1]\text{-th prototype:}\quad \mathbf{y}(c[\mathrm{ii}+1]) = [w_{c[\mathrm{ii}+1]\,1}\ \ldots\ w_{c[\mathrm{ii}+1]\,c[\mathrm{ii}]}] \quad \mathbf{z}(c[\mathrm{ii}+1]) = [0\ 0\ \ldots\ 1] $$

Given this structure of the data, we arrive at an estimation problem in which the output is expressed by means of the activation levels w_ij. In general, we can formalize the relationship as a certain set of fuzzy relational equations

$$ \mathbf{z}(k) = \mathbf{y}(k) \cdot R $$
(23)

k = 1, 2, …, c[ii + 1], where · stands for a certain relational operator, such as max–min, min–max, or their generalized versions max-t and min-s, with "t" and "s" being the corresponding t-norm and t-conorm. The fuzzy relation R is to be determined, and one could consider using one of the existing estimation methods (which might lead to exact or approximate solutions to the problem). One can also describe the i-th prototype in terms of the prototypes obtained for the ii-th data slice by considering the following aggregation of the weighted prototypes available at the previous data slice

$$ \hat{\mathbf{v}}_i[\mathrm{ii}+1] = \sum_{j=1}^{c[\mathrm{ii}]} w_{ij}^m \mathbf{v}_j[\mathrm{ii}]. $$
(24)
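For completeness, (24) reduces to a single weighted combination of the previous prototypes; the function name is illustrative:

import numpy as np

def reconstruct_prototype(w_i, V_prev, m=2.0):
    # (24): express a prototype of the new slice through the previous ones
    return (w_i ** m) @ V_prev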

6 Conclusions

Evolvable systems enrich the landscape of fuzzy systems and make them well suited for human-centric systems, in which the mechanisms of adaptability play a crucial role. We have stressed the need for a broad definition of such systems in which the general nature of evolvability has to be embraced. While parametric adjustments are part of the overall design agenda, the structural aspects of modification are of far higher relevance. We introduced two general categories of evolvability, which emphasize the need for managing and securing a suitable allocation of knowledge representation resources (including the variable granularity of rules, the order of polynomial local models, and the like). Interestingly, the allocation process is predominantly concerned with the structure of the model, and one has to proceed prudently with the optimization of the structure so that the accuracy of the model is also retained. Here, a design framework supporting global optimization (including evolutionary and population-based optimization) becomes indispensable. The variability of perception perspectives (associated with the human-centricity of systems embracing a realization of effective relevance feedback mechanisms) gives rise to another category of evolvable systems. We showed that one of the alternative approaches exercised here can focus on the adjustment of contexts via a nonlinear transformation of the individual input variables, in this way emphasizing the details of some ranges of the variables while generalizing others.

While in this study we elaborated on the formulation of the pertinent problems and discussed some selected scenarios at the algorithmic level, the detailed optimization cases have not been tackled and are left for further comprehensive studies.