1 Introduction

Granular computing is concerned with how information is grouped together and how these groups can be used to make decisions [1, 2]. It is inspired by how human cognition manages information. Granular computing improves the final representation of information models by forming information granules that better adapt to the known information. Information granules can be expressed through a variety of representations, such as rough sets [3], quotient spaces [4], shadowed sets [5], and fuzzy sets [6, 33–35].

Information granules are representations of similar information that can be used for a purpose, typically to model a portion of the information. Forming fuzzy information granules is not new; because many representations can be used, many approaches have been proposed: via their relationships [7], optimization of time granularity [8], information granulation [9], RBF neural networks [10], Interval Type-2 Fuzzy granules [11], and non-homogeneous General Type-2 Fuzzy granules [12], among others.

This chapter proposes an approach to information granule formation that captures evaluations of uncertainty through samples; the difference between these evaluations is a direct measure of uncertainty, which is used to form Interval Type-2 (IT2) Fuzzy information granules [25–32].

This chapter is organized as follows: Sect. 2 describes the proposed approach as well as its motivation. Section 3 shows benchmark results alongside the discussion. Finally, Sect. 4 concludes the document.

2 Uncertainty-Based Information Granule Formation Methodology Description

To understand the main methodology, its motivation must first be reviewed, as it provides the basis for the proposed approach. First, the theory of uncertainty-based information [13, 14] is described; then, evaluations of uncertainty [15, 16], which define functions that represent uncertainty measures, are described.

2.1 Uncertainty-Based Information

The concept of uncertainty is closely related to the concept of information. The fundamental characteristic of this relation is that the uncertainty involved in any problem-solving situation is a result of information deficiency pertaining to the system within which the situation is conceptualized. This information could be incomplete, imprecise, fragmentary, unreliable, vague, or contradictory.

Assuming that a certain amount of uncertainty can be measured in a problem-solving situation, a mathematical theory can be formed.

Assuming further that this amount of uncertainty is reduced by obtaining relevant information as a result of some action (e.g. obtaining experimental results, observing new data, etc.), the amount of information obtained by the action can be measured by the amount of uncertainty reduced. That is, the amount of information related to a given problem-solving situation that is obtained through some action is measured by the difference between the a priori uncertainty and the a posteriori uncertainty.

The diagram in Fig. 1 represents the general behavior of uncertainty-based information, where a reduction of uncertainty can be obtained from the difference between two uncertain models of the same information. That is, the a priori uncertainty model is obtained with a first sample of information, whereas the a posteriori uncertainty model is obtained with a second sample of information related to the same problem-solving situation.
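The relation in the preceding paragraph can be written compactly. The symbols are introduced here for illustration only and are not taken from the cited theory: $U_{\mathrm{prior}}$ denotes the a priori uncertainty, $U_{\mathrm{post}}(a)$ the uncertainty remaining after an action $a$, and $I(a)$ the information obtained through that action. The inequality states the assumption that the action does not increase uncertainty.

```latex
I(a) = U_{\mathrm{prior}} - U_{\mathrm{post}}(a),
\qquad U_{\mathrm{post}}(a) \le U_{\mathrm{prior}}
```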

Fig. 1 Diagram of the behavior of uncertainty-based information, where uncertainty is reduced by the difference between two uncertain models of the same information

2.2 Evaluations of Uncertainty

To capture uncertainty, there are two fundamental types of evaluations: Type A and Type B.

Through repeated measurements, an average measured value and a standard deviation can be inferred, which together form a Gaussian distribution function; this function is a Type A evaluation of uncertainty.

Type B evaluations of uncertainty are represented by a rectangular probability distribution, in other words, a specified interval within which the measurements are known to lie.
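As a minimal sketch of a Type A evaluation, the following Python code estimates a mean and a standard deviation from repeated measurements and evaluates the resulting Gaussian. The function names and the readings are hypothetical, and the Gaussian is left unnormalized (peak value 1) on the assumption that it will later serve as a membership function.

```python
import math
import statistics


def type_a_evaluation(measurements):
    """Return (mean, sample standard deviation) of repeated measurements."""
    mean = statistics.fmean(measurements)
    sigma = statistics.stdev(measurements)  # sample std, n - 1 denominator
    return mean, sigma


def gaussian(x, mean, sigma):
    """Unnormalized Gaussian with value 1 at x == mean (assumed form)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


# Hypothetical repeated readings of one quantity (illustrative values only).
readings = [9.8, 10.1, 10.0, 10.2, 9.9]
mean, sigma = type_a_evaluation(readings)
value_at_one_sigma = gaussian(mean + sigma, mean, sigma)
```

The sample standard deviation is used here because the text speaks of a standard deviation inferred from the measurements; a different estimator could be substituted without changing the structure.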
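A Type B evaluation can be sketched in the same spirit. The code below assumes the interval bounds are given and models the quantity as uniformly distributed between them; the function names are hypothetical. The standard deviation of a uniform distribution on an interval of half-width a is a/√3.

```python
import math


def type_b_rectangular(lower, upper):
    """Return (midpoint, half_width, std) for a uniform distribution on [lower, upper]."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    std = half_width / math.sqrt(3)  # std of a uniform distribution
    return midpoint, half_width, std


def rectangular_pdf(x, lower, upper):
    """Density of the uniform distribution on [lower, upper]."""
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0


# Hypothetical interval in which the measurements are known to lie.
midpoint, half_width, std = type_b_rectangular(9.5, 10.5)
```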

2.3 Uncertainty-Based Information Granule Formation

Taking inspiration from uncertainty-based information, the theory can be interpreted in a manner that forms higher-type information granules: uncertainty is captured and measured, and then used to build Interval Type-2 Fuzzy information granules.

A sample of information can build a model with uncertainty from the complete source of information; that is, since it is impossible to know the complete truth of any given situation, uncertainty will always exist in any sample of information taken from it.

Through a first sample of information (D1), an uncertain model (evaluation of uncertainty) can be created. Through a second sample of information (D2), another similar uncertain model can also be created. These two models of uncertainty are analogous to the a priori and a posteriori uncertainty models in the theory of uncertainty-based information.

In direct comparison with the theory of uncertainty-based information, the proposed approach does not reduce the uncertainty in the model; instead, it measures and defines it so that it can be used in an information granule, giving an improved representation of the information. The proposed approach is shown in Fig. 2, where a first sample of information yields an evaluation of uncertainty in the form of a Gaussian function, or Type-1 (T1) Gaussian membership function, and a second sample of information yields another evaluation of uncertainty of the same form. The difference between these two Gaussian membership functions defines the Footprint of Uncertainty (FOU), thus yielding an IT2 Fuzzy information granule. There are three possibilities: (1) the first Gaussian membership function has a σ larger than the second; (2) the second Gaussian membership function has a σ larger than the first; and (3) both Gaussian membership functions have the same σ. In cases 1 and 2, the resulting FOU defines uncertainty that has been measured and can now be used by the IT2 Fuzzy System; in case 3, since no uncertainty was measured, a T1 Fuzzy Set is created.

Fig. 2 Explanatory diagram of how the proposed approach measures and defines the uncertainty, and forms an IT2 Fuzzy set with such uncertainty
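The three cases described above can be sketched as follows, assuming a shared center and unnormalized Gaussian membership functions. The dictionary layout and function names are hypothetical and are not the chapter's implementation. For a fixed center, the Gaussian with the larger σ is greater than or equal to the one with the smaller σ everywhere, so it serves as the upper bound of the FOU.

```python
import math


def gaussian_mf(x, center, sigma):
    """Type-1 Gaussian membership function with peak value 1 at the center."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def form_it2_granule(center, sigma_1, sigma_2):
    """Combine two evaluations of uncertainty into one granule description."""
    return {
        "center": center,
        "sigma_lower": min(sigma_1, sigma_2),  # narrower curve: lower bound
        "sigma_upper": max(sigma_1, sigma_2),  # wider curve: upper bound
        # Case 3: equal sigmas, no measured uncertainty, reduces to a T1 set.
        "is_type1": math.isclose(sigma_1, sigma_2),
    }


def membership_interval(x, granule):
    """Return (lower, upper) membership of x; the gap is the FOU at x."""
    lower = gaussian_mf(x, granule["center"], granule["sigma_lower"])
    upper = gaussian_mf(x, granule["center"], granule["sigma_upper"])
    return lower, upper


# Hypothetical sigmas from two samples (illustrative values only).
granule = form_it2_granule(center=0.0, sigma_1=1.0, sigma_2=2.0)
lower, upper = membership_interval(1.0, granule)
```

Cases 1 and 2 give the same granule because only the smaller and larger σ matter, not which sample produced them.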

To show that the proposed approach is viable, in that it captures uncertainty and forms IT2 Fuzzy information granules, an algorithm was created so that results could be obtained. The following steps define the algorithm:

  1. Obtain rules and centers. These can be obtained through any clustering algorithm; for the experimental case in this chapter, the subcluster algorithm [17] was used.

  2. Through a first sample of information (D1), calculate σ1 for all centers. The Euclidean n-space distance between each data point and every center is calculated, and each point is assigned to the center at the shortest distance. Once each cluster has its set of data points, a standard deviation is calculated to form an evaluation of uncertainty in the form of a Gaussian membership function. For testing, a random sample comprising 40 % of the dataset was used.

  3. Through a second sample of information (D2), calculate σ2 for all centers in the same manner as the previous step. A random sample comprising another 40 % of the dataset was used for this step.

  4. Form the IT2 Fuzzy Gaussian information granules as proposed. This only builds the antecedents of a complete IT2 Fuzzy System.

  5. Optimize the consequents via an evolutionary algorithm, obtaining a complete IT2 Fuzzy System which can be used to acquire results. For this chapter, Interval Takagi-Sugeno-Kang (TSK) [18, 19] consequents were used, optimized via a Cuckoo Search algorithm [20].
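Steps 2 to 4 above can be sketched as follows. This is an illustration under stated assumptions, not the chapter's implementation: the centers are taken as given (step 1 is not shown), the standard deviation is computed per input feature about the mean of the assigned points, the two 40 % samples are drawn as disjoint subsets, and all function names and data are hypothetical. Consequent optimization (step 5) is out of scope.

```python
import math
import random
import statistics


def nearest_center(point, centers):
    """Index of the center at the shortest Euclidean distance from point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def cluster_sigmas(sample, centers):
    """Per-center, per-feature standard deviation of the assigned points."""
    members = [[] for _ in centers]
    for point in sample:
        members[nearest_center(point, centers)].append(point)
    sigmas = []
    for points in members:
        if len(points) < 2:
            sigmas.append(None)  # too few points to estimate a deviation
            continue
        n_features = len(points[0])
        sigmas.append(
            [statistics.stdev([p[j] for p in points]) for j in range(n_features)]
        )
    return sigmas


def form_antecedents(data, centers, fraction=0.4, seed=0):
    """Build one (center, sigma_lower, sigma_upper) entry per center and feature."""
    n = round(fraction * len(data))
    if 2 * n > len(data):
        raise ValueError("fraction too large for two disjoint samples")
    shuffled = random.Random(seed).sample(data, len(data))
    d1, d2 = shuffled[:n], shuffled[n:2 * n]  # assumption: disjoint samples
    granules = []
    for center, s1, s2 in zip(centers, cluster_sigmas(d1, centers),
                              cluster_sigmas(d2, centers)):
        if s1 is None or s2 is None:
            granules.append(None)
            continue
        granules.append([
            {"center": c, "sigma_lower": min(a, b), "sigma_upper": max(a, b)}
            for c, a, b in zip(center, s1, s2)
        ])
    return granules


# Hypothetical two-cluster data in two dimensions (synthetic, for illustration).
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
data += [(rng.gauss(10, 2), rng.gauss(10, 2)) for _ in range(50)]
centers = [(0.0, 0.0), (10.0, 10.0)]
granules = form_antecedents(data, centers)
```

Each entry holds the two σ values that bound the FOU for one input feature of one rule. A cluster that receives fewer than two points in either sample is returned as `None`, since no standard deviation can be estimated for it.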

The next section uses this algorithm to obtain results.

3 Experimental Results and Discussion

For experimental tests, four datasets were used: iris, wine, and glass, available from the UCI dataset repository [21], and a 5th order polynomial curve. The iris dataset has 4 input features (petal length, petal width, sepal length, and sepal width) and 3 outputs (iris setosa, iris virginica, and iris versicolor), with 50 samples of each flower type for a total of 150 elements. The wine dataset has 13 input features of different constituents (alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline) and identifies 3 distinct Italian locations where the wine came from, with 59, 71, and 48 elements in each class respectively, for a total of 178 elements. The glass identification dataset has 9 input variables (refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium, and iron) and 7 defined classes, six of which contain samples (building windows float processed, building windows non float processed, vehicle windows float processed, containers, tableware, and headlamps), with 70, 76, 17, 13, 9, and 29 elements respectively, for a total of 214 elements.

3.1 Experimental Results

Table 1 shows the obtained results; 30 execution runs were made for each dataset to obtain the minimum, maximum, mean, and standard deviation.

Table 1 Obtained results for the chosen datasets

Figures 3, 4, 5, and 6 show one sample of the formed IT2 Fuzzy information granules for each dataset: iris, wine, glass, and 5th order polynomial, respectively.

Fig. 3 Sample of the formed IT2 Fuzzy information granules for the Iris dataset

Fig. 4 Sample of the formed IT2 Fuzzy information granules for the Wine dataset

Fig. 5 Sample of the formed IT2 Fuzzy information granules for the Glass dataset

Fig. 6 Sample of the formed IT2 Fuzzy information granules for the 5th order polynomial dataset

3.2 Results Discussion

The values obtained for classification accuracy and RMSE are not the best reported in general, yet they are comparable to current algorithms in terms of mean results [22–24]. They are by no means the best results this approach can obtain; this is largely due to the clustering algorithm and the evolutionary algorithm that were chosen. A better combination, together with tuning, should yield better results.

As shown in the formed IT2 Fuzzy information granules, some granules captured more uncertainty than others. In many cases the uncertainty is so small that no measurable uncertainty appears when forming the Gaussian evaluation of uncertainty.

Because IT2 Fuzzy Gaussian membership functions were chosen to represent the higher-type information granules, the granules share one characteristic: the center value is the same, and only two values of σ form the FOU. Although the results are acceptable, other variations can be used to yield different results as well as different interpretations, for example, one where the center is offset and two values of σ are used. Other types of IT2 Fuzzy membership functions could also be used, each with its own interpretation of the information and with varying results when the IT2 Fuzzy System is formed and optimized.

4 Conclusion and Future Work

4.1 Conclusions

Taking inspiration from the uncertainty-based information theory, higher type information granules can be formed which better conceptualize the uncertainty in the information.

The proposed approach captures the uncertainty in the information model by measuring it as the difference between two evaluations of uncertainty, each created from a distinct sample of the information.

By choosing Interval Type-2 Fuzzy sets as the representation of information granules, the proposed approach directly takes the obtained uncertainty measurement and builds higher type information granules.

Any other form of granule representation which can express the uncertainty in the information can be used [36].

4.2 Future Work

Find the optimal sample size for each model-building step. Although 40 % was used, what is the minimum size that can be used to obtain acceptable results?

The number of samples taken could be explored; this chapter took only two samples to form the final information granule. Could taking more samples yield a better result?

Other information granule representations could be used which also support uncertainty. Even though Type A Gaussian evaluations of uncertainty were used, there are other types of functions which could also directly capture uncertainty.