1 Introduction

Granular computing (Pedrycz and Chen 2014; Pedrycz 2013; Han and Lin 2010; Bargiela and Pedrycz 2008) is typically portrayed as a research context in which various modeling and computational approaches for dealing with uncertainty converge. The modeling side is essentially rooted in formal constructs called information granules (IG) (Pedrycz et al. 2015, 2008). Information granules are mathematical models describing data aggregates; the data in such aggregates are related to each other by, for instance, functional or structural similarity criteria. Well-known formal settings for implementing information granules include (higher-order) fuzzy sets (Wagner et al. 2015), rough sets (Qian et al. 2010; Ali et al. 2013), and intuitionistic sets (Huang et al. 2013). Information granules can be obtained in a data-driven fashion in different ways (Yao et al. 2013; Salehi et al. 2015). A prominent example comes from partition-based approaches, typically implemented by means of partitive clustering algorithms. However, the formation of information granules is not limited to partition-based techniques (Salehi et al. 2015; Qian et al. 2014, 2015). Uncertainty is a pivotal concept in granular computing. In fact, one of the main goals of information granules is to convey the uncertainty of aggregated data in a synthetic yet effective way. Uncertainty is a powerful concept, which has been widely exploited in many mathematical settings (Klir 2006). Information theory, rooted in the classical probability framework, is certainly the most prominent example. However, information-theoretic concepts have been extended to modern models of information granules, such as imprecise probabilities (Bronevich and Klir 2010), fuzzy sets (Zhai and Mendel 2011), intuitionistic fuzzy sets (Montes et al. 2015), and rough sets (Zhu and Wen 2012; Chen et al. 2014; Dai and Tian 2013).

The computational aspects of granular computing are intimately related to the research context called computational intelligence (Livi et al. 2015), which includes nature-inspired techniques for performing recognition, control, and optimization tasks (Engelbrecht 2007). Related methodologies are typically data-driven, in the sense that such techniques rely on experimental evidence (data, patterns) in order to perform inductive inference (or generalization). In this setting, information granules are used as computational components of data-driven inference systems. Granular neural networks (Ding et al. 2014) are only one of the many pertinent examples available in the literature (Pedrycz 2013).

In this paper, we comment on the importance of data granulation in computational intelligence methods. Section 2 introduces the computational intelligence context, placing the emphasis on pattern recognition aspects. We also discuss issues related to the analysis of so-called non-geometric patterns, which have recently attracted considerable attention from researchers (Livi et al. 2015). Section 3 offers a glimpse of the rapidly-changing granular computing domain. We highlight two specific aspects in this paper: the (novel) interpretation of information granules as patterns (Sect. 3.1) and the challenge of designing a criterion for synthesizing information granules from data (Sect. 3.2). The design of such criteria is deeply connected with the fundamental, conceptual problems underlying the process of data granulation, which drive the quest for a sound theory of granular computing bridging both model-based and data-driven perspectives. Finally, Sect. 4 concludes the paper.

2 Computational intelligence methods and the challenge for processing non-geometric input spaces

The research context called computational intelligence (CI) (Engelbrecht 2007) unifies several nature-inspired computational methods under a data-driven paradigm. Well-known instances of such methods include neural networks, fuzzy systems, and evolutionary and swarm intelligence optimization techniques. Typical problems faced by means of CI methods include recognition problems (e.g., classification, clustering, and function approximation) and problems of adaptive control (e.g., fuzzy control and data-driven optimization via neural networks). CI is closely related to the soft computing discipline. Quoting Bonissone (1997): “The term soft computing (SC) represents the combination of emerging problem-solving technologies such as fuzzy logic (FL), probabilistic reasoning (PR), neural networks (NNs), and genetic algorithms (GAs). Each of these technologies provides us with complementary reasoning and searching methods to solve complex, real-world problems”. Data-driven inductive inference systems can be implemented in terms of soft computing methodologies by departing from the assumption of Boolean truth values and Boolean membership of elements to classes. This goal was first achieved by means of Zadeh's celebrated fuzzy logic (Zadeh 1965). Several other many-valued logics (and corresponding set-theoretic frameworks) have since been defined, such as rough sets (Pawlak 1982), intuitionistic fuzzy sets (Atanassov 1986), and the three-valued logic underlying shadowed sets (Pedrycz 1998). Well-known applications of fuzzy logic include rule-based (adaptive) fuzzy inference systems (Nauck et al. 1997), modern evolutions of fuzzy neural networks (Wu et al. 2014; Liu and Li 2004), and higher-order fuzzy systems (Zhou et al. 2009; Pagola et al. 2013; Wagner and Hagras 2010; Melin and Castillo 2013; Oh et al. 2014).
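
To make the departure from Boolean membership concrete, the short sketch below (with purely illustrative membership parameters, not taken from the cited works) contrasts a crisp indicator function with a type-1 fuzzy membership function in the sense of Zadeh (1965).

```python
# Minimal illustration (hypothetical example): contrasting Boolean (crisp) membership
# with a graded, type-1 fuzzy membership.

def crisp_tall(height_cm: float) -> int:
    """Boolean membership: a person either is or is not 'tall'."""
    return 1 if height_cm >= 180 else 0

def fuzzy_tall(height_cm: float) -> float:
    """Graded membership in the fuzzy set 'tall' (piecewise-linear, illustrative parameters)."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

for h in (155, 170, 178, 185, 195):
    print(h, crisp_tall(h), round(fuzzy_tall(h), 2))
```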

Focusing on recognition problems (Theodoridis and Koutroumbas 2008; Haykin 2007), the concept of pattern plays an important role. Patterns are everywhere, such as in climate physics, series of seismic events, complex biochemical and biophysical processes, brain science, financial markets and economic trends, large-scale power systems, and so on. Human knowledge and reasoning are both founded on searching for such patterns and on their effective aggregation to define meaningful concepts and decision rules (Pedrycz 2013). More formally, a pattern is essentially an experimental instance of a data generating process, P. A process can be described as a mapping \(P: \mathcal {X}\rightarrow \mathcal {Y}\), idealizing a system (either abstract or physical) that generates outputs according to inputs. \(\mathcal {X}\) is referred to as the input space (or domain, representation space), where the patterns are effectively represented according to some suitable formalism. \(\mathcal {Y}\), instead, is the output space. In pattern recognition, the closed-form expression of P is not known. Nonetheless, it is possible to observe such a process through a finite dataset, \(\mathcal {S}\). The problem typically boils down to reconstructing a mathematical model of P, say M, by analyzing \(\mathcal {S}\). To be useful in practice, a mathematical model M, once established, must be adapted to the problem at hand. In practice, learning or synthesizing a model M from \(\mathcal {S}\) consists in optimizing some criterion, i.e., a performance measure that allows one to tune the model parameters to the data instance at hand. Such a model is then evaluated (used) by considering its generalization capability on unseen test patterns. Pattern recognition techniques can be grouped into two mainstream approaches: discriminative, such as support vector machines (Schölkopf and Smola 2002) and adaptive fuzzy inference systems (Sadeghian and Lavers 2011), and generative, like hidden Markov models (Bicego et al. 2004) and the recently-developed deep convolutional neural networks (Sainath et al. 2014). Of course, hybrid formulations also exist.
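
The following minimal sketch illustrates this protocol on synthetic data; the simulated process P, the polynomial model playing the role of M, and the squared-error criterion are all assumptions made for illustration, not methods prescribed by the cited references.

```python
# A minimal sketch (hypothetical data and model) of the inductive-inference protocol:
# a finite sample S observed from an unknown process P: X -> Y is used to fit a model M
# by minimizing a performance criterion, and M is then evaluated on unseen test patterns
# to estimate its generalization capability.
import numpy as np

rng = np.random.default_rng(0)

def P(x):
    # The "unknown" data generating process, simulated here for illustration only.
    return np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)

x_train = rng.uniform(0, 1, 50); y_train = P(x_train)   # training set (Tr)
x_test  = rng.uniform(0, 1, 20); y_test  = P(x_test)    # unseen test set (Ts)

# Model M: polynomial regression; learning = optimizing a squared-error criterion.
coeffs = np.polyfit(x_train, y_train, deg=5)
M = np.poly1d(coeffs)

mse_train = np.mean((M(x_train) - y_train) ** 2)
mse_test  = np.mean((M(x_test)  - y_test) ** 2)          # generalization estimate
print(f"train MSE = {mse_train:.4f}, test MSE = {mse_test:.4f}")
```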

Computational intelligence methods are typically designed under the assumption that the input space, \(\mathcal {X}\), is essentially a subset of \(\mathbb {R}^d\). When departing from the \(\mathbb {R}^d\) pattern representation, theoretical and practical problems arise, mostly due to the absence of an intuitive geometric interpretation of the data. However, many important applications can be tackled by representing patterns as “non-geometric” entities. For instance, it is possible to cite applications in document analysis (Bunke and Riesen 2011), analysis of the solubility of the E. coli proteome (Livi et al. 2015), bio-molecule recognition (Ceroni et al. 2007; Rupp and Schneider 2010), chemical structure generation (White and Wilson 2010), image analysis (Serratosa et al. 2013; Morales-González et al. 2014), and scene understanding (Brun et al. 2014). The availability of interesting datasets containing non-geometric data has motivated the development of pattern recognition and soft computing techniques on such domains (Livi et al. 2014, 2015; Rossi et al. 2015; Fischer et al. 2015; Lange et al. 2015; Schleif 2014; Bianchi et al. 2015). Non-geometric patterns include data characterized by pairwise dissimilarities that are not metric and that therefore cannot be straightforwardly represented in a Euclidean space (Pȩkalska and Duin 2005). A particularly interesting instance of such non-geometric data is constituted by structured patterns. A labeled graph is the most general structured pattern conceivable, since it allows one to characterize a pattern by describing the topological structure of its constituting elements (the vertices) through their relations (the edges) (Livi and Rizzi 2013). Both vertices and edges can be equipped with suitable labels, i.e., the specific attributes characterizing the elements and their relations. Sequences of generic objects, trees, and automata, for instance, can be thought of as particular instances of labeled graphs.
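
As a concrete (and deliberately simplified) illustration, the sketch below encodes a labeled graph and compares two such patterns through a crude, non-metric dissimilarity; the data structure and the dissimilarity are toy assumptions, whereas in practice graph edit distances or graph kernels would be employed (Livi and Rizzi 2013).

```python
# A toy sketch (assumptions: labels are plain strings, graphs are small) of a labeled
# graph as a non-geometric pattern: vertices and edges carry labels, and patterns are
# compared through a dissimilarity measure rather than through coordinates in R^d.
from collections import Counter

class LabeledGraph:
    def __init__(self, vertex_labels, edges):
        self.vertex_labels = vertex_labels   # e.g. {0: "C", 1: "N", 2: "C"}
        self.edges = edges                   # e.g. {(0, 1): "single", (1, 2): "double"}

def label_dissimilarity(g1, g2):
    """Crude, non-metric dissimilarity: mismatch between vertex-label histograms
    plus the difference in edge counts. Illustrative only."""
    h1, h2 = Counter(g1.vertex_labels.values()), Counter(g2.vertex_labels.values())
    label_cost = sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))
    return label_cost + abs(len(g1.edges) - len(g2.edges))

g_a = LabeledGraph({0: "C", 1: "N", 2: "C"}, {(0, 1): "single", (1, 2): "double"})
g_b = LabeledGraph({0: "C", 1: "O"}, {(0, 1): "single"})
print(label_dissimilarity(g_a, g_b))
```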

Figure 1 offers a schematic, visual representation of the typical stages involved in the use of CI methods for recognition purposes. By means of a training set (Tr in the figure), a model is synthesized and then used during the so-called testing stage by processing unseen data (denoted as Ts). Such a schematic organization is valid both for \(\mathbb {R}^d\) (Fig. 1a) and for non-geometric (Fig. 1b) data. However, in order to properly use standard CI methods in the case of non-geometric data, the input space must be processed with particular care. Notably, three mainstream approaches can be pursued (Livi et al. 2015): (i) using a suitable dissimilarity measure operating in the input space, (ii) using positive definite (PD) kernel functions, and (iii) embedding the input space in \(\mathbb {R}^n\). The choice depends on the particular data-driven system adopted for the task at hand and on other factors, such as the computational complexity and the specific application setting. The first case is the most straightforward one, but its use is legitimate only if the data-driven system does not require a specific geometrical structure of the input space; in fact, a dissimilarity measure might not be metric, and therefore not Euclidean. The second case is a typical choice for kernel methods—such as support vector machines. Positive definite kernels can be obtained only if the underlying geometry is Euclidean; if such a requirement does not hold, correction techniques can be used to rectify the data (Livi et al. 2015). The last approach consists in mapping (i.e., embedding) the input data into a vector space, typically \(\mathbb {R}^n\). In this way, conventional CI methods can be used without alterations.
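
A minimal sketch of approach (iii) is given below: non-geometric patterns (here, plain strings compared with a hypothetical non-Euclidean dissimilarity) are embedded into \(\mathbb {R}^n\) through their dissimilarities to a set of prototypes, in the spirit of the dissimilarity-space representation (Pȩkalska and Duin 2005); the dissimilarity function and the prototype selection are assumptions of this example.

```python
# A minimal sketch of embedding non-geometric patterns into R^n via their dissimilarities
# to a set of prototypes, after which conventional CI methods can operate without alterations.
import numpy as np

def embed(patterns, prototypes, dissimilarity):
    """Each pattern x is mapped to the vector [d(x, p_1), ..., d(x, p_n)]."""
    return np.array([[dissimilarity(x, p) for p in prototypes] for x in patterns])

def hamming_like(a, b):
    """Toy non-Euclidean dissimilarity between strings (illustrative only)."""
    m = min(len(a), len(b))
    return sum(c1 != c2 for c1, c2 in zip(a[:m], b[:m])) + abs(len(a) - len(b))

patterns   = ["abca", "abcd", "xbcd", "xxxx"]
prototypes = ["abcd", "xxxx"]                  # e.g., chosen by some selection heuristic
X = embed(patterns, prototypes, hamming_like)  # X lives in R^2 and can feed any CI model
print(X)
```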

Fig. 1 Schematic representations of a data-driven inference system operating in \(\mathbb {R}^d\) (a) and on non-geometric data (b), respectively

3 Granular computing as a general data analysis framework

Granular computing (GrC) can be pictured as a general data analysis framework (or data analysis paradigm) founded on IGs. Information granulation is at the basis of GrC: from the formation of sound IGs to their use in intelligent systems. IGs are the main mathematical constructs involved in the process of GrC. Several formalisms are available to implement IGs, such as fuzzy and rough sets (Yao et al. 2013). Such mathematical frameworks offer solid bases on which to design IG models and their operations. However, an issue arises when we project this perspective into a data-driven context, that is, when we try to extract (or synthesize) IGs from empirically observed data. A criterion to synthesize IGs from data is thus of utmost importance, since (i) it provides a way to formalize ideas under a common umbrella, facilitating their practical and formal evaluation, and (ii) it lies at the basis of any consistent, formal theory. We suggest that a criterion for synthesizing IGs should play the same role as the one played by error functions in learning from data. A sound criterion for synthesizing IGs would lay the groundwork for conceiving a formal theory of GrC. Nonetheless, a general, unified, and consolidated theory of GrC, bridging model-based and data-driven perspectives in information granulation, is currently missing.

The integration of IG constructs and CI systems, such as pattern recognition and control systems, is nowadays well established. For instance, granular neural networks offer an interesting example (Ding et al. 2014; Zhang et al. 2008; Ganivada et al. 2011; Song and Pedrycz 2013). Granular neural networks are basically extensions of typical artificial neural network architectures that incorporate a mechanism of information granulation at the level of the weights or within the neuron model. In the first case, the numerical weights modeling the synaptic connections of the network are realized in terms of IGs—e.g., intervals, fuzzy sets, and rough sets. A particularly interesting consequence of this design choice is that a granular neural network typically produces a granular output, hence consistent with the framework chosen for the IGs. Another interesting application of IGs is found in higher-order fuzzy inference systems (Biglarbegian et al. 2010; Gaxiola et al. 2014; Soto et al. 2014; Mendel 2014). Fuzzy inference systems and their extensions have played a pivotal role in many applications over the last few decades. Higher-order fuzzy inference systems employ fuzzy sets of higher type instead of the original (type-1) fuzzy sets for handling the uncertainty of the input–output mapping and for performing the inference (e.g., rule composition). Clustering is another important research endeavor in which the GrC paradigm plays an important role (Tang and Zhu 2013; Linda and Manic 2012; Izakian et al. 2015). In fact, clustering algorithms are among the most prominent techniques for generating IGs—technically, via the generation of a partition of the input data. Here, IGs are in one-to-one correspondence with clusters, which are typically endowed with some mathematical construct in order to offer a synthetic description of the data together with its characteristic uncertainty. IG constructs (mostly fuzzy sets) have also been used in problems of optimization and decision-making (Liang and Liao 2007; Kahraman et al. 2006; Pedrycz 2014; Wang et al. 2014a, b; Pedrycz and Bargiela 2012). In fact, both problems are typically affected by uncertainty at different levels: in the problem definition (e.g., constraints) or in the output (e.g., decision variables). It is also worth citing the use of higher-order fuzzy constructs in time series analysis (Chen and Tanuwijaya 2011; Huarng and Yu 2005), where both the time and the amplitude (i.e., the time series realizations) domains are subjected to proper granulation. Finally, rough set theory has found considerable application in many data analysis contexts. The rough set construct can be used to identify a reduced version of the original set of attributes pertaining to a decision system. As a consequence, rough sets have been widely applied in feature selection and classification systems (Thangavel and Pethalakshmi 2009; Swiniarski and Skowron 2003; Foithong et al. 2012).
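
To make the partition-based route concrete, the sketch below turns the clusters found in a toy dataset into simple hyperbox IGs (per-feature minima and maxima of each cluster); the tiny k-means routine and the hyperbox model are assumptions of this illustration and do not reproduce any of the cited granular architectures.

```python
# A hedged illustration of partition-based granulation: clusters found in the data are
# turned into simple interval IGs (one hyperbox per cluster, spanned by the per-feature
# minima and maxima of its members).
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.5, (30, 2))])  # toy data

def kmeans(X, k, iters=20):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(X, k=2)
granules = [(X[labels == j].min(axis=0), X[labels == j].max(axis=0)) for j in range(2)]
for lo, hi in granules:
    print("hyperbox IG:", np.round(lo, 2), "->", np.round(hi, 2))
```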

A founding prerequisite that an IG should satisfy is the capability of handling the uncertainty of the low-level entities that it aggregates. Klir (1995) states that “The nature of uncertainty depends on the mathematical theory within which problem situations are formalized”. This suggests that the mathematical description of the data uncertainty pertaining to a specific situation/process is not absolute, although a reasonable and consistent mapping among the various theories should be possible. As a consequence, the specific setting in which IGs are defined affects, in turn, the way data uncertainty is handled and therefore used in practice by an intelligent system operating through data granulation. Nonetheless, as postulated in (Livi and Sadeghian 2015; Livi and Rizzi 2015), the level of uncertainty conveyed by an IG defined according to some mathematical setting should be monotonically related to the uncertainty expressed by another IG defined in a different setting. This suggests that, given some experimental evidence, the level of uncertainty is what should be preserved during data granulation, regardless of the formal setting used for defining IGs.

3.1 Information granules as data patterns

From a mathematical viewpoint, IGs are considered as formal constructs endowed with proper operations, such as intersection and union, to allow for symbolic manipulations. From a more operative perspective, instead, IGs typically play the role of computational components in some data-driven inference mechanism, as discussed in the previous sections. Nonetheless, researchers have more recently realized (Ha et al. 2013; Guevara et al. 2014; Livi et al. 2013, 2014; Rizzi et al. 2013) that IGs could also be considered as a particular type of (non-geometric) pattern. This perspective opens the way to a multitude of future research directions. For instance, it could be interesting to face typical pattern recognition problems, such as clustering, classification, and function approximation, directly in the space of IGs. Technical issues involved in this process could be faced by exploiting the methods already developed for the non-geometric domains introduced in Sect. 2. Here, similarity measures for IGs, e.g., for higher-order fuzzy sets, intuitionistic sets, and rough sets (Zhao et al. 2014; Chen and Chang 2015; Tahayori et al. 2015), could offer an important technical bridge between those fields. The process of data granulation implements an abstraction of the original data. This suggests that facing a data-driven problem in the space of IGs would require a different interpretation, thus also offering qualitatively different insights into the problem at hand.
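
As a hedged illustration of how IGs could be handled as patterns, the sketch below compares two type-1 fuzzy sets, discretized over a common universe, with a classical Jaccard-type similarity; the triangular membership family and its parameters are illustrative assumptions, not one of the measures proposed in the cited works.

```python
# A small sketch of treating IGs themselves as patterns: two type-1 fuzzy sets are
# compared with the Jaccard-type similarity sum(min)/sum(max); such a similarity could
# then feed a clustering or classification scheme operating directly in the space of IGs.
import numpy as np

universe = np.linspace(0, 10, 101)

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and core at b."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

A = triangular(universe, 2, 4, 6)
B = triangular(universe, 3, 5, 7)

jaccard = np.minimum(A, B).sum() / np.maximum(A, B).sum()
print(f"similarity between IGs A and B: {jaccard:.3f}")
```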

3.2 General criteria for a justifiable data granulation

The quest for a general, sound, and justifiable criterion by which to synthesize IGs from empirical evidence plays a pivotal role in GrC. In the data-driven setting, IGs are obtained by means of an algorithmic procedure operating on some (typically, but not necessarily, non-granulated) input dataset. As previously stated, there are many mathematical models suitable for modeling IGs, such as hyperboxes, (higher-order) fuzzy sets, shadowed sets, rough sets, and hybrid models (Pedrycz et al. 2008). All those models are framed in specific theories, having well-defined mathematical operations and descriptive measures. However, when facing (data-driven) problems involving the synthesis of IGs from a given dataset (experimental evidence), a sound and general criterion must be adopted. Such a criterion should be general, in the sense that it should work regardless of the specific IG model that is adopted. In fact, a general theory of GrC should not be conceived by focusing on specific mathematical formalisms for IGs. In addition, the criterion should be mathematically sound, meaning that it should admit a well-defined mathematical formulation, thus allowing for rigorous implementations, extensions, and validations. According to the perspectives offered in Sect. 2, we would be tempted to suggest that such a criterion should also be applicable regardless of the nature of the input data domain (e.g., numeric or not). In our opinion, such a criterion would provide the basic component for moving toward a formal and unified theory of GrC, bridging both model-based and data-driven perspectives.

Despite the considerable effort recently devoted to the design of granulation procedures (algorithms for generating IGs) (Yao et al. 2013; Salehi et al. 2015) and of formal GrC frameworks (Qian et al. 2014, 2015), to our knowledge only two instances of such a criterion can be cited: the Principle of Justifiable Granularity (PJG) (Pedrycz and Homenda 2013) and the Principle of Uncertainty Level Preservation (PULP) (Livi and Sadeghian 2015).

3.2.1 The principle of justifiable granularity

The PJG (Pedrycz and Homenda 2013; Pedrycz 2011) has been developed as a guideline for forming IGs from the available (experimental) input data. IGs generated following this principle have to comply with two conflicting requirements: (i) justifiability and (ii) specificity. The first requirement ensures that each IG covers a sufficient portion of the experimental evidence; that is, a well-formed IG should not be too specialized. The second requirement, on the other hand, provides a way to generate IGs that are not too dispersive, in the sense that an IG should also come with well-defined semantics. These two requirements, taken together, allow for a data-driven, user-centered, and problem-dependent synthesis of IGs from specific input datasets. The PJG itself is general—it has been successfully used to generate different IG types, including fuzzy sets and shadowed sets—and mathematically sound—it usually takes the form of an optimization problem. However, being designed from a user-oriented perspective, it does not directly offer a built-in mechanism for objectively evaluating the “quality” of the granulation itself. To this end, it is necessary to rely on external performance measures to quantify and judge the quality of an IG.
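
A minimal numeric sketch of one common instantiation of the PJG for interval IGs built around the median of one-dimensional data is given below; the data, the exponential specificity function, and the value of the parameter alpha are assumptions of this example rather than prescriptions of the original formulation.

```python
# One common instantiation of the PJG for interval IGs: the upper bound b maximizes the
# product of coverage (how much evidence falls in [med, b]) and specificity (a decreasing
# function of the interval length, here exp(-alpha*(b - med))).
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=1.0, size=200)
med, alpha = np.median(data), 1.0

def justifiability(b):
    coverage = np.sum((data >= med) & (data <= b))   # experimental evidence covered
    specificity = np.exp(-alpha * (b - med))         # penalty for overly wide IGs
    return coverage * specificity

candidates = np.linspace(med, data.max(), 500)
b_opt = candidates[np.argmax([justifiability(b) for b in candidates])]
print(f"median = {med:.2f}, optimized upper bound = {b_opt:.2f}")
# The lower bound a is obtained symmetrically by maximizing coverage(a)*exp(-alpha*(med - a)).
```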

3.2.2 The principle of uncertainty level preservation

The principle of uncertainty level preservation (Livi and Rizzi 2015; Livi and Sadeghian 2015) elaborates on a different perspective, by considering data granulation as a mapping between some input and output domain. PULP takes inspiration from the principles of uncertainty (Klir 1995) formulated by Klir a few decades ago. In PULP, a quantification of uncertainty is considered as an invariant property to be preserved during the process of data granulation, i.e., when assigning an IG to a given input dataset. Uncertainty in PULP assumes the form of entropy expressions, exploiting the fact that the concept of entropy is well developed in many mathematical settings, such as those of probability and fuzzy set theory. Therefore, the synthesis of IGs is effectively cast in an information-theoretic framework, where the entropy measured on the input evidence is used as a guideline to form output IGs. Notably, the difference between the input and output entropies is considered as the granulation error, which needs to be minimized in order to reach a satisfactory result. PULP allows for a nonlinear relationship between the input and output entropies by considering a suitable monotonically increasing function, which bridges the two frameworks defining the input and output uncertainty quantifications, respectively.
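
The following sketch conveys the PULP viewpoint on synthetic data: the normalized entropy of the input evidence is the quantity to preserve, and a candidate fuzzy IG is selected so that its normalized De Luca–Termini entropy matches g(H_in), with g taken as the identity; the entropy estimators, the Gaussian membership family, and the grid search are assumptions of this illustration, not the procedure of the cited references.

```python
# A hedged sketch of the PULP viewpoint: minimize the discrepancy between the input
# entropy and the fuzzy entropy of the output IG (the granulation error).
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, 1000)

# Input uncertainty: normalized Shannon entropy of a histogram estimate.
p, _ = np.histogram(data, bins=20, density=False)
p = p[p > 0] / p.sum()
H_in = -(p * np.log(p)).sum() / np.log(20)

def fuzzy_entropy(mu):
    """Normalized De Luca-Termini entropy of a discretized membership function."""
    mu = np.clip(mu, 1e-12, 1 - 1e-12)
    return -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)).mean() / np.log(2)

# Candidate IGs: Gaussian fuzzy sets centred on the sample mean, parametrized by sigma.
x = np.linspace(data.min(), data.max(), 400)
def granulation_error(sigma):
    mu = np.exp(-0.5 * ((x - data.mean()) / sigma) ** 2)
    return abs(fuzzy_entropy(mu) - H_in)   # the bridge function g is the identity here

sigmas = np.linspace(0.1, 5.0, 200)
best = sigmas[np.argmin([granulation_error(s) for s in sigmas])]
print(f"H_in = {H_in:.3f}, selected sigma = {best:.3f}")
```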

Moving to mesoscopic descriptors, such as those provided by entropic characterizations, allows one to conceive data granulation from a more abstract perspective. This has a number of benefits: (i) it automatically provides a way to quantify the quality of the granulation itself by objectively relying on the input–output uncertainty difference; (ii) it is applicable to any form of input data and IG formalism (at least to those for which entropic functionals can be developed); and (iii) it allows one to compare the performance of different data granulation procedures operating under the same experimental conditions.

4 Concluding remarks

We would like to conclude this paper by posing a question: is granular computing an intrinsically experimental discipline? In other terms, is it possible to conceive an axiomatic theory of granular computing on which to consistently develop both theoretical results and algorithmic solutions for performing data granulation and related operations? In the affirmative case, such a theory should of course be general, without focusing on specific models of information granules. A very important issue, as discussed in this paper, would be a criterion bridging the model-based and data-driven perspectives of information granulation. That is to say, how should we perform data granulation? By following what criterion? We suggest that such a criterion would play an important role in the quest for a unified and sound theory of granular computing.