
1 Meeting Professor Pawlak First Time: First Problems

It was in the year 1992, and the person who contacted us was Professor Helena Rasiowa, the eminent, world–renowned logician. Zdzisław asked me to create a topological theory of rough set spaces: He was eager to introduce into rough sets the classical structures; some logic and algebra were already there. The finite case was well understood, so, following the advice of Stan Ulam: 'if you want to discuss a finite case, go first to the infinite one', I considered information systems with countably infinitely many attributes. Let me sum up the essential results, which were warmly welcomed by Zdzisław.

1.1 Rough Set Topology: A Context and Basic Notions

Assume given a set (a universe) U of objects along with a sequence \(A=\{a_n: n=1, 2, \ldots \}\) of attributes; without loss of generality, we may assume that \(Ind_{n+1}\subseteq Ind_{n}\) for each n, where \(Ind_n=\{(u,v): u, v \in U, a_n(u)=a_n(v)\}\). Letting \(Ind=\bigcap _n Ind_n\), we may assume that the family \(\{Ind_n: n=1, 2, \ldots \}\) separates objects, i.e., for each pair \(u\ne v\), there is a class \(P\in U/Ind_n\) for some n such that \(u\in P, v\notin P\); otherwise we would pass to the quotient universe U/Ind. We endow U with some topologies.
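As a minimal illustration (a sketch, not the paper's code: we assume a finite sample of the universe and represent attributes as Python functions), the classes of \(Ind_n\) can be computed by grouping objects on the first n attributes:

```python
# A minimal sketch (hypothetical names): Ind_n as indiscernibility with
# respect to the first n attributes, so that Ind_{n+1} refines Ind_n.
from collections import defaultdict

def ind_classes(universe, attributes, n):
    """Partition `universe` into classes of Ind_n: objects agreeing
    on the first n attributes fall into the same class."""
    classes = defaultdict(list)
    for u in universe:
        key = tuple(a(u) for a in attributes[:n])
        classes[key].append(u)
    return list(classes.values())

# Toy usage: two binary attributes on a four-element universe.
U = [0, 1, 2, 3]
A = [lambda u: u % 2, lambda u: u // 2]
print(ind_classes(U, A, 1))  # [[0, 2], [1, 3]]
print(ind_classes(U, A, 2))  # [[0], [1], [2], [3]] -- a refinement
```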

1.2 Topologies \(\varPi _n\), the Topology \(\varPi _0\) and Exact and Rough Sets

For each n, the topology \(\varPi _n\) is defined as the partition topology obtained by taking as open sets unions of families of classes of the relation \(Ind_n\). The topology \(\varPi _0\) is generated by the union of the topologies \(\varPi _n\), \(n=1, 2, \ldots \). We apply the topology \(\varPi _0\) to the task of discerning among subsets of the universe U:

$$\begin{aligned} \text {A set}\ Z\subseteq U\ \text {is}\ \varPi _0\text {-exact if}\ Cl_{\varPi _0}Z= Int_{\varPi _0}Z,\ \text {else}\ Z\ \text {is}\ \varPi _0\text {-rough}. \end{aligned}$$
(1)

1.3 The Space of \(\varPi _0\)-rough Sets is Metrizable

Each \(\varPi _0\)-rough set can be represented as a pair \((Q,T)\), where \(Q=Cl_{\varPi _0}X\), \(T=U\setminus Int_{\varPi _0}X\) for some \(X\subseteq U\). The pair \((Q,T)\) has to satisfy the conditions: 1. \(U=Q\cup T\). 2. \(Q\cap T\ne \emptyset \). 3. If \(\{x\}\) is a \(\varPi _0\)-open singleton then \(x\notin Q\cap T\). We define a metric \(d_n\) as

$$\begin{aligned} d_n(u,v)=1\ \text { in case}\ [u]_n\ne [v]_n\ \text {else}\ d_n(u,v)=0. \end{aligned}$$
(2)

and the metric d:

$$\begin{aligned} d(u,v)=\sum _n 10^{-n}\cdot d_n(u,v). \end{aligned}$$
(3)
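With finitely many attributes, the metric d of (3) truncates to a finite sum; a minimal sketch (names are ours), continuing the toy universe above:

```python
# A minimal sketch: the metric d of Eq. (3) as a truncated series;
# [u]_n != [v]_n is tested on the first n attributes.
def d_n(u, v, attributes, n):
    """d_n(u,v) = 1 if u and v differ on some of the first n attributes."""
    return 1 if any(a(u) != a(v) for a in attributes[:n]) else 0

def d(u, v, attributes):
    """d(u,v) = sum_n 10^{-n} * d_n(u,v), truncated at len(attributes)."""
    return sum(10 ** -n * d_n(u, v, attributes, n)
               for n in range(1, len(attributes) + 1))

A = [lambda u: u % 2, lambda u: u // 2]
print(d(0, 2, A))  # 0.01: objects 0 and 2 first differ at n = 2
```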

Theorem 1

The metric topology of d is \(\varPi _0\).

We employ the notion of the Hausdorff metric and apply it to pairs \((Q,T)\) satisfying 1–3 above, i.e., representing \(\varPi _0\)-rough sets. For pairs \((Q_1,T_1), (Q_2,T_2)\), we let

$$\begin{aligned} D((Q_1,T_1), (Q_2,T_2))=max\{d_H(Q_1,Q_2), d_H(T_1,T_2)\} \end{aligned}$$
(4)

and

$$\begin{aligned} D^*((Q_1,T_1), (Q_2,T_2))=max\{d_H(Q_1,Q_2), d_H(T_1,T_2), d_H(Q_1\cap Q_2, T_1\cap T_2)\}, \end{aligned}$$
(5)

where \(d_H(A,B)=max\{max_{x\in A}dist(x,B), max_{y\in B} dist(y,A)\}\) is the Hausdorff metric on closed sets. The main result is:

Theorem 2

If each descending sequence \(\{[u_n]_n:n=1, 2, \ldots \}\) of classes of relations \(Ind_n\) has a non–empty intersection, then each \(D^*\)–fundamental sequence of \(\varPi _0\)–rough sets converges in the metric D to a \(\varPi _0\)–rough set. If, in addition, each relation \(Ind_n\) has a finite number of classes, then the space of \(\varPi _0\)–rough sets is compact in the metric D.
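For illustration, here is a minimal sketch of the Hausdorff metric \(d_H\) and the metric D of (4) over finite sets, assuming a base metric dist; all names are hypothetical:

```python
def hausdorff(A, B, dist):
    """d_H(A,B) = max(max_{x in A} dist(x,B), max_{y in B} dist(y,A))."""
    def to_set(x, S):
        return min(dist(x, s) for s in S)
    return max(max(to_set(x, B) for x in A),
               max(to_set(y, A) for y in B))

def D(rep1, rep2, dist):
    """D((Q1,T1),(Q2,T2)) = max(d_H(Q1,Q2), d_H(T1,T2)), Eq. (4)."""
    (Q1, T1), (Q2, T2) = rep1, rep2
    return max(hausdorff(Q1, Q2, dist), hausdorff(T1, T2, dist))

# Toy usage on subsets of the integers with dist(x,y) = |x - y|.
print(D(({0, 1}, {1, 2}), ({0, 2}, {2}), lambda x, y: abs(x - y)))  # 1
```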

1.4 The Space of Almost \(\varPi _0\)-rough Sets is Metrically Complete

In the notation of the preceding sections, it may happen that a set X is \(\varPi _n\)-rough for each n and yet \(\varPi _0\)-exact. We call such sets almost rough and denote them \(\varPi _{\omega }\)-rough. Each such set X is represented as a sequence of pairs \((Q_n,T_n): n=1,2,\ldots \) such that for each n: 1. \(Q_n=Cl_{\varPi _n}X\), \(T_n=U\setminus Int_{\varPi _n}X\). 2. \(Q_n\cap T_n\ne \emptyset \). 3. \(Q_n\cup T_n=U\). 4. \(Q_n\cap T_n\) contains no singleton \(\{x\}\) with \(\{x\}\) \(\varPi _n\)-open. To introduce a metric into the space of \(\varPi _{\omega }\)-rough sets, we again apply the Hausdorff metric, but in a modified way: for each n, we let \(d_{H,n}\) be the Hausdorff metric on \(\varPi _n\)-closed sets, and for representations \((Q_n, T_n)_n\) and \((Q_n^*,T_n^*)_n\) of \(\varPi _{\omega }\)-rough sets X, Y, respectively, we define the metric \(D^{\prime }\) as:

$$\begin{aligned} D^{\prime }(X,Y)=\sum _n 10^{-n}\cdot max\{d_{H,n}(Q_n, Q_n^*), d_{H,n}(T_n,T_n^*)\}. \end{aligned}$$
(6)

It turns out that

Theorem 3

The space of \(\varPi _{\omega }\)-rough sets endowed with the metric \(D^{\prime }\) is complete, i.e., each \(D^{\prime }\)-fundamental sequence of \(\varPi _{\omega }\)-rough sets converges to a \(\varPi _{\omega }\)-rough set.

Apart from the theoretical value of these results, they also carried an applicational tint.

1.5 Approximate Collage Theorem

Consider a Euclidean space \(E^n\) along with an information system \((E^n,A=\{a_k:k=1, 2, \ldots \})\), each attribute \(a_k\) inducing the partition \(P_k\) of \(E^n\) into cubes of the form \(\prod _{i=1}^n [m_i+\frac{j_i}{2^k}, m_i+\frac{j_i+1}{2^k})\), where \(m_i\) runs over integers and \(j_i\in [0, 2^k-1]\) is an integer. Hence, \(P_{k+1}\subseteq P_k\) for each k. We consider fractal objects, i.e., systems of the form \([(C_1, C_2, \ldots , C_p), f, c]\), where each \(C_i\) is a compact set and f is an affine contracting mapping on \(E^n\) with a contraction coefficient \(c\in (0,1)\). The resulting fractal is the limit of the sequence \((F_n)_n\) of compacta, where 1. \(F_0=\bigcup _{i=1}^p C_i\). 2. \(F_{n+1}=f(F_n)\). In this context, fractals are classical examples of \(\varPi _0\)-rough sets. Assume we perceive fractals through their approximations by consecutive grids \(P_k\), so each \(F_n\) is viewed as its upper approximation \(a_k^+F_n\) for each k. As \(diam(P_k)\rightarrow _{k\rightarrow \infty } 0\), the symmetric difference \(F\triangle F_n\) becomes arbitrarily close to the symmetric difference \(a_k^+F\triangle a_k^+F_n\). Hence, in order to approximate F with \(F_n\), it suffices to approximate \(a_k^+F\) with \(a_k^+F_n\). The question poses itself: what is the least k which guarantees, for a given \(\varepsilon \), that \(a_k^+F_n=a_k^+F\) implies \(d_H(F,F_n)\le \varepsilon \)? We consider the metric D on fractals and their approximations. We proposed a counterpart to the Collage Theorem by replacing the fractals \(F_n\) with their grid approximations.

Theorem 4

(Approximate Collage Theorem). Assume a fractal F generated by the system \((F_0=\bigcup _{i=1}^p C_i, f,c)\) in the space of \(\varPi _0\)-rough sets with the metric D. In order to satisfy the requirement \(d_H(F,F_n)\le \varepsilon \), it is sufficient to satisfy the requirement \(a_{k_0}^+F_n=a_{k_0}^+F\) with \(k_0=\lceil \frac{1}{2}-log_2\varepsilon \rceil \) and \(n\ge \lceil \frac{log_2 [2^{-k_0+\frac{1}{2}}\cdot K^{-1}\cdot (1-c)]}{log_2c}\rceil \), where \(K=d_H(F_0, F_1)\).
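The bounds of Theorem 4 are directly computable; a small numeric sketch (function and variable names are ours) evaluating \(k_0\) and n for given \(\varepsilon \), c and \(K=d_H(F_0, F_1)\):

```python
# A minimal sketch of the bounds in Theorem 4.
import math

def collage_bounds(eps, c, K):
    k0 = math.ceil(0.5 - math.log2(eps))
    n = math.ceil(math.log2(2 ** (-k0 + 0.5) / K * (1 - c)) / math.log2(c))
    return k0, n

# E.g., a tolerance of 0.01 with c = 0.5 and K = 1.0:
print(collage_bounds(0.01, 0.5, 1.0))  # (8, 9)
```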

2 Mereology and Rough Mereology

It was a characteristic feature of Professor Pawlak that He had a great interest in theoretical questions. He remembered how He browsed through volumes in the Library of the Mathematical Institute of the Polish Academy of Sciences. No doubt the emergence of rough set theory owes much to those excursions into the philosophical writings of Frege, Russell and others. At one time, Zdzisław mentioned some fascicles of the works of Stanisław Leśniewski, the creator of the first formal theory of Mereology. Zdzisław was greatly interested in various formalizations of the idea of a concept and, in particular, in possible relations between Mereology and Rough Sets. From our analysis of the two theories, Rough Mereology emerged.

2.1 Basic Mereology

The primitive notion here is that of a part. The relation of being a part of, denoted prt(u,v), is defined on a universe U by the requirements: 1. prt(u,u) holds for no u. 2. prt(u,v) and prt(v,w) imply prt(u,w). Thus prt(u,v) means that u is a proper part of v. To account for improper parts, i.e., wholes, the notion of an ingredient (element), ingr for short, was proposed: it is \(prt\,\cup \) ‘=’, i.e., ingr(u,v) if and only if prt(u,v) or \(u=v\). Ingredients are essential in mereological reasoning by the Leśniewski Inference Rule (LIR for short):

LIR: For \(u,v \in U\), if for each w such that ingr(w,u) there exist t, q such that ingr(t,w), ingr(t,q), ingr(q,v), then ingr(u,v).

Ingredients are instrumental in forming individuals, classes of individuals: for each non-void property \({\mathcal {C}}\) of individuals in U, there exists a unique individual, the class of \({\mathcal {C}}\), \(Cls{{\mathcal {C}}}\) in symbols, defined by the requirements: 1. If u satisfies \({\mathcal {C}}\) then \(ingr(u, Cls{{\mathcal {C}}})\). 2. For each u with \(ingr(u, Cls{{\mathcal {C}}})\), there exist t, q such that ingr(t,u), ingr(t,q) and q satisfies \({\mathcal {C}}\). Classes are instrumental in our definition of granules. The favorite example of Leśniewski was the chessboard as the class of white and black squares.

2.2 Rough Mereology

The basic notion of a part to a degree is rendered as the relation \(\mu (u,v,r)\subseteq U^2\times [0,1]\), read ‘u is a part of v to a degree of at least r’, which is defined by the requirements: 1. \(\mu (u,v,1)\) if and only if ingr(u,v). 2. If \(\mu (u,v,1)\) and \(\mu (w,u, r)\) then \(\mu (w,v,r)\). 3. If \(\mu (u,v,r)\) and \(s<r\) then \(\mu (u,v,s)\). We termed the relation \(\mu \) a rough inclusion. The relation of rough mereology to rough set theory becomes clear when we realize that the latter is about concepts and their approximations, and that the containment relation is a particular case of the part relation; hence the lower and upper approximations are classes of those indiscernibility classes which are, respectively, ingredients of a concept, or parts of it to a positive degree. Rough inclusions in information systems are usually defined in the attribute–value format; examples are, for instance, given by t–norms. It is well known that Archimedean t–norms, the Łukasiewicz t–norm \(L(x,y)=max\{0, x+y-1\}\) and the Menger (product) t–norm \(P(x,y)=x\cdot y\), admit a representation of the form \(T(x,y)=g(f(x)+f(y))\), where \(f:[0,1]\rightarrow [0,1]\) is a decreasing continuous function with \(f(1)=0\) and g is the pseudo–inverse of f. For an information system \(IS=(U,A)\), the discernibility set Dis(u,v) equals \(A\setminus Ind(u,v)\).

Theorem 5

For an Archimedean t–norm \(T(x,y)=g(f(x)+f(y))\), the relation \(\mu _T(u,v,r)\) if and only if \(g(\frac{card(Dis(u,v))}{card(A)})\ge r\) is a rough inclusion on the universe U.

As an example, we define the Łukasiewicz rough inclusion \(\mu _L\) as \(\mu _L(u,v,r)\) if and only if \(g(\frac{card(Dis(u,v))}{card(A)})\ge r\). Since in the case of the Łukasiewicz t–norm \(g(x)=1-x\), we have \(\mu _L(u,v,r)\) if and only if \(\frac{card(Ind(u,v))}{card(A)}\ge r\): a fuzzified indiscernibility. We recall that each t–norm T defines the residual implication \(\rightarrow _T\) via the equivalence \(x\rightarrow _T y \ge r\) if and only if \(T(x,r)\le y\).

Theorem 6

Let \(\rightarrow _T\) be a residual implication and \(f:U\rightarrow [0,1]\) an embedding of U into the unit interval. Then \(\mu (u,v,r)\) if and only if \(f(u)\rightarrow _T f(v)\ge r\) is a rough inclusion.

We therefore have a collection of rough inclusions to select from.
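To make the selection concrete, here is a minimal sketch of the Łukasiewicz rough inclusion \(\mu _L\) defined above, under a hypothetical data layout with objects stored as dictionaries of attribute values:

```python
def mu_L(u, v, r, attributes):
    """mu_L(u,v,r) iff card(Ind(u,v)) / card(A) >= r, where Ind(u,v)
    is the set of attributes on which u and v agree."""
    agree = sum(1 for a in attributes if u[a] == v[a])
    return agree / len(attributes) >= r

u = {'a1': 0, 'a2': 1, 'a3': 1}
v = {'a1': 0, 'a2': 0, 'a3': 1}
print(mu_L(u, v, 0.5, ['a1', 'a2', 'a3']))  # True: u and v agree on 2/3
```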

3 Rough Mereology in Behavioral Robotics

Autonomous robots are among the best examples of the notion of an intelligent agent. Problems of their navigation in environments with obstacles are basic in behavioral robotics. We recall here an approach based on rough mereology.

3.1 Betweenness Relation in Navigating Teams of Intelligent Agents

The betweenness relation is, apart from equidistance, one of the primitive relations adopted by Alfred Tarski in His axiomatization of plane geometry. It was generalized by Johan van Benthem to the relation B(x,y,z) (read: ‘x is between y and z’) on points x, y, z of a finite-dimensional Euclidean space with a metric d:

$$\begin{aligned} B(x,y,z)\ \text {if and only if for each}\ q\ne x:\ d(x,y)<d(q,y)\ \text {or}\ d(x,z)<d(q,z). \end{aligned}$$
(7)

Rough mereology offers a quasi–distance function:

$$\begin{aligned} \kappa (x,y)= min\{sup\{r: \mu (x,y,r)\}, sup\{s: \mu (y,x,s)\}\}. \end{aligned}$$
(8)

We apply in the definition of \(\kappa (x,y)\) the rough inclusion \(\mu (a, b, r)\), where a, b are bounded measurable sets in the plane,

$$\begin{aligned} \mu (a,b,r)\ \text {if and only if}\ \frac{area(a\cap b)}{area(a)}\ge r. \end{aligned}$$
(9)
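For axis-aligned rectangles in the plane, the rough inclusion (9) and the quasi-distance \(\kappa \) of (8) reduce to elementary area computations; a minimal sketch (assuming rectangles given as corner coordinates, names ours):

```python
# Rectangles are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
def area(r):
    x1, y1, x2, y2 = r
    return (x2 - x1) * (y2 - y1)

def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def inclusion_degree(a, b):
    """sup{r : mu(a,b,r)} = area(a intersect b) / area(a), cf. Eq. (9)."""
    return intersection_area(a, b) / area(a)

def kappa(a, b):
    """kappa(a,b): the smaller of the two mutual inclusion degrees, Eq. (8)."""
    return min(inclusion_degree(a, b), inclusion_degree(b, a))

a, b = (0, 0, 2, 2), (1, 0, 3, 2)
print(kappa(a, b))  # 0.5: each square covers half of the other
```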
Fig. 1. Trails of robots moving in the line formation through the passage.

Consider autonomous robots in the plane as embodiments of intelligent agents. We model robots as rectangles (in fact squares) regularly placed, i.e., with edges parallel to the coordinate axes. For such robots, denoted a, b, c, ..., the betweenness relation can be expressed as follows, see [8]:

Theorem 7

Robot a is between robots b and c, i.e., B(a,b,c) holds true, with respect to betweenness defined in (7), distance defined in (8) and the rough inclusion defined in (9), if and only if \(a\subseteq ext(b,c)\), where ext(b,c) is the extent of b and c, i.e., the minimal rectangle containing b and c.
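The criterion of Theorem 7 is easy to check computationally; a minimal sketch under the same rectangle representation as above:

```python
# Robots as axis-aligned rectangles (x1, y1, x2, y2): a is between b
# and c iff a is contained in ext(b,c), the minimal rectangle
# containing b and c (Theorem 7).
def extent(b, c):
    return (min(b[0], c[0]), min(b[1], c[1]),
            max(b[2], c[2]), max(b[3], c[3]))

def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1] and
            inner[2] <= outer[2] and inner[3] <= outer[3])

def between(a, b, c):
    """B(a,b,c) per Theorem 7: a is a subset of ext(b,c)."""
    return contains(extent(b, c), a)

b, c = (0, 0, 1, 1), (4, 0, 5, 1)
print(between((2, 0, 3, 1), b, c))  # True: the middle robot
print(between((2, 2, 3, 3), b, c))  # False: off the b-c extent
```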

For a team of robots \({{\mathcal {T}}}=\{a_1, a_2, \ldots , a_m\}\), a formation on \({\mathcal {T}}\) is a relation B on \({\mathcal {T}}\). Figure 1 shows a team of robots in the Line formation negotiating a bottleneck passage, after which they return to the Cross formation.

4 Granular Computing

The last of the rough mereology applications with which Zdzisław could become acquainted is a theory of granular computing, first presented at GrC 2005 at Tsinghua University in Beijing, China. Given a rough inclusion \(\mu \) on a universe U of an information system (U,A), define a granule \(g_{\mu }(u,r)\) about \(u\in U\) of the radius r as \(g_{\mu }(u,r)=Cls\{v\in U: \mu (v,u,r)\}\). For practical reasons, we compute granules as sets \(\{v\in U: \mu (v,u,r)\}\); the class and the set coincide for many rough inclusions, cf. [5].
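A minimal sketch (hypothetical data layout, objects as dictionaries) of granules computed in the set form \(\{v\in U: \mu (v,u,r)\}\) with \(\mu =\mu _L\):

```python
def mu_L(v, u, r, attributes):
    agree = sum(1 for a in attributes if v[a] == u[a])
    return agree / len(attributes) >= r

def granule(u, r, universe, attributes):
    """The set form {v in U : mu_L(v, u, r)} of the granule g_mu(u, r)."""
    return [v for v in universe if mu_L(v, u, r, attributes)]

U = [{'a1': 0, 'a2': 0}, {'a1': 0, 'a2': 1}, {'a1': 1, 'a2': 1}]
print(granule(U[0], 0.5, U, ['a1', 'a2']))
# [{'a1': 0, 'a2': 0}, {'a1': 0, 'a2': 1}]: objects agreeing on >= half
```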

4.1 Granular Classifiers: Synthesis via Rough Inclusions

We assume that we are given a decision system \(DS=(U, A, d)\) from which a classifier is to be constructed; on the universe U, a rough inclusion \(\mu \) is given, and a radius \(r\in [0,1]\) is chosen. We can find granules \(g_{\mu }(u,r)\) for all \(u\in U\) and make them into the set \(G(\mu , r)\). From this set, a covering \(Cov(\mu , r)\) of the universe U can be selected by means of a chosen strategy \({\mathcal {G}}\), i.e.,

$$\begin{aligned} Cov(\mu , r) = {{\mathcal {G}}}(G(\mu , r)). \end{aligned}$$
(10)

We intend that \(Cov(\mu , r)\) become the new universe of a decision system which we will call the granular reflection of the original decision system. It remains to define new attributes for this decision system. Each granule g in \(Cov(\mu , r)\) is a collection of objects; attributes in the set \(A\cup \{d\}\) can be factored through the granule g by means of a chosen strategy \({\mathcal {S}}\), usually the majority vote, i.e., for each attribute \(a\in A\cup \{d\}\), the new factored attribute \(\overline{a}\) is defined by means of the formula

$$\begin{aligned} \overline{a}(g)={{\mathcal {S}}}(\{a(v): ingr(v, g_{\mu }(u,r))\}) \end{aligned}$$
(11)

In effect, a new decision system \((Cov(\mu , r), \{\overline{a}:a\in A\},\overline{d})\) is defined. The thing \(\overline{v_g}\) with the information set \(Inf(\overline{v_g})\) defined as

$$\begin{aligned} Inf(\overline{v_g})=\{(\overline{a}, \overline{a}(g)):a\in A\cup \{d\}\} \end{aligned}$$
(12)

is called the granular reflection of g. We consider a standard data set, the Australian Credit Data Set from the Repository at UC Irvine, and we collect the best results for this data set by various rough set based methods in the table of Fig. 2. In Fig. 3, we give for this data set the results of the exhaustive classifier on granular structures; the symbols read: r = granule radius, tst = test set size, trn = training set size, rulex = number of rules, aex = accuracy, cex = coverage.
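The construction (10)-(12) can be sketched as follows; this is only a sketch in which the covering strategy \({\mathcal {G}}\) is taken to be a greedy sweep and \({\mathcal {S}}\) the majority vote, and all names are ours:

```python
# A minimal sketch of the granular reflection of a decision system.
from collections import Counter

def mu_L(v, u, r, attrs):
    return sum(v[a] == u[a] for a in attrs) / len(attrs) >= r

def granular_reflection(universe, attrs, r):
    # Covering strategy G: sweep objects, keep a granule only if it
    # covers something not yet covered.
    covered, cover = set(), []
    for u in universe:
        g = [j for j, v in enumerate(universe) if mu_L(v, u, r, attrs)]
        if set(g) - covered:
            cover.append(g)
            covered |= set(g)
    # Factoring strategy S: majority vote of each attribute (including
    # the decision) over the objects of a granule, cf. Eq. (11).
    return [{a: Counter(universe[j][a] for j in g).most_common(1)[0][0]
             for a in attrs}
            for g in cover]
```

Here `attrs` is expected to list the conditional attributes together with the decision d.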

Fig. 2. Best results for Australian credit by some rough set based algorithms.

We can compare the results: for template based methods, the best MI is 0.891; for the exhaustive classifier (r = nil), MI is equal to 0.907; and for granular reflections, the best MI value is 0.915, with a few other values exceeding 0.900. What seems worthy of a moment’s reflection is the number of rules in the classifier. Whereas for the exhaustive classifier (r = nil) in the non–granular case the number of rules is equal to 5597, in the granular case the number of rules can be surprisingly small with a good MI value, e.g., at \(r=0.5\), the number of rules is 293, i.e., 5 percent of the exhaustive classifier size, with the best overall MI of 0.915. This compression of the classifier seems to be the most impressive feature of granular classifiers.

Fig. 3. Australian credit granulated.

5 Betweenness Revisited in Data Sets

In a given information system \(IS=(U, A)\), we can use the Łukasiewicz rough inclusion \(\mu _L\) in order to obtain the mereological distance \(\kappa \) of (8) and the generalized betweenness relation GB (read: ‘u is between \(v_1, v_2, \ldots , v_k\)’):

$$\begin{aligned} GB(u, v_1, v_2, ..., v_k)\ \text {if for each }v\ne u,\text { there is }v_i\text { such that}\ \kappa (u,v_i)\ge \kappa (u,v). \end{aligned}$$
(13)

One proves, cf. [7], that the betweenness GB can be expressed as a convex combination:

Theorem 8

\(GB(u, v_1, v_2, ..., v_k)\) if and only if \(Inf(u)=\bigcup _{i=1}^k C_i\), where \(C_i\subseteq Inf(v_i)\) for \(i=1,2, ..., k\) and \(C_i\cap C_j=\emptyset \) for each pair \(i\ne j\).

In order to remove ambiguity in representing u, we introduce the notion of a neighborhood N(u) over a set of neighbors \(\{v_1, v_2, \ldots , v_k\}\) as the structure of the form:

$$\begin{aligned} <(v_1, C_1\subseteq Ind(u,v_1), q(v_1)), \ldots , (v_k, C_k\subseteq Ind(u,v_k), q(v_k))> \end{aligned}$$
(14)

with the neighbors \(v_1, v_2, \ldots , v_k\) ordered in descending order of the factors \(q(v_i)=q_i=\frac{card(C_i)}{card(A)}\). Clearly, \(\sum _{i=1}^k q_i=1\) and \(q_i>0\) for each \(i\le k\).
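A minimal sketch (hypothetical layout) of the test behind Theorem 8 together with the construction of N(u): each attribute of u is assigned to the first agreeing neighbor, which keeps the pieces \(C_i\) disjoint; if some attribute of u matches no neighbor, GB fails:

```python
def neighborhood(u, neighbors, attrs):
    pieces = [[] for _ in neighbors]
    for a in attrs:
        for i, v in enumerate(neighbors):
            if v[a] == u[a]:
                pieces[i].append(a)   # a lands in C_i; disjointness kept
                break
        else:
            return None               # GB(u, v_1, ..., v_k) fails
    q = [len(c) / len(attrs) for c in pieces]
    return sorted(zip(neighbors, pieces, q), key=lambda t: -t[2])

u = {'a1': 0, 'a2': 1, 'a3': 0}
vs = [{'a1': 0, 'a2': 0, 'a3': 0}, {'a1': 1, 'a2': 1, 'a3': 1}]
print(neighborhood(u, vs, ['a1', 'a2', 'a3']))
# v_1 receives C_1 = {a1, a3} with q = 2/3; v_2 receives C_2 = {a2}
```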

5.1 Dual Indiscernibility Matrix, Kernel and Residuum

The dual indiscernibility matrix, DIM for short, is defined as the matrix \(M(U,A)=[m_{a,v}]\), where \(a\in A\), v is a value of a, and \(m_{a,v}=\{u\in U: a(u)=v\}\) for each pair a, v. The residuum of the information system (U,A), Res in symbols, is the set \(\{u\in U\text {:}\,\text {there exists a pair} (a,v)\text { with }\ m_{a,v}=\{u\}\}\). The difference \(U\setminus Res\) is the kernel, Ker in symbols. Clearly, \(U=Ker \cup Res\) and \(Ker \cap Res=\emptyset \). The rationale behind these notions is that Ker consists of mutually exchangeable objects, so averaged decisions on neighbors should transfer to test objects, while Res consists of objects carrying outlier values, which may serve as antennae catching test objects. It is interesting to see how those subsets fare in tasks of classification into decision classes. Figure 4 shows the results of applying C4.5 and k-NN to the whole data set, to Ker and to Res for a few data sets from the UC Irvine Repository. The results are very satisfying in terms of accuracy and size of data sets. Observe that, for the data considered, the sets Ker and Res as a rule yield better results for C4.5 and k-NN than the whole set.
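A minimal sketch (hypothetical layout) of the DIM entries \(m_{a,v}\) and of the split of U into Ker and Res:

```python
from collections import defaultdict

def kernel_residuum(universe, attrs):
    m = defaultdict(list)             # (a, v) -> objects u with a(u) = v
    for i, u in enumerate(universe):
        for a in attrs:
            m[(a, u[a])].append(i)
    # Res: unique witnesses of some pair (a, v); Ker: the remainder.
    res = {us[0] for us in m.values() if len(us) == 1}
    ker = set(range(len(universe))) - res
    return ker, res

U = [{'a1': 0, 'a2': 0}, {'a1': 0, 'a2': 0}, {'a1': 1, 'a2': 0}]
print(kernel_residuum(U, ['a1', 'a2']))  # ({0, 1}, {2}): object 2 alone has a1 = 1
```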

5.2 A Novel Approach: Partial Approximation of Data Objects

The Pair classifier approaches a test object with inductively selected pairs of neighboring training objects which cover it partially.

Fig. 4. Classification results.

Induction is driven by the degree of covering, from the maximal degree down, through a threshold number of steps. Successive pairs are indexed with a level L. Objects in pairs up to a given level are pooled, and they vote for the decision value by majority voting. Figure 5 shows the results in comparison with k-NN and Bayes classifiers; a sketch of this scheme follows Fig. 5. The symbol Lx denotes the level of covering, Pair-0 is the simple pair classifier with approximation by the best pair, and Pair–best denotes the best result over the levels studied.

Fig. 5. Pair classifier.
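Since the text only outlines the Pair classifier, the following is a heavily hedged sketch of the scheme as we read it, with all names ours and not the paper's implementation: pairs of training objects are ranked by how much of the test object's information set they cover, objects from pairs down to level L are pooled, and the pooled objects vote by majority:

```python
from collections import Counter
from itertools import combinations

def covering_degree(u, v1, v2, attrs):
    """Fraction of u's attribute-value pairs matched by v1 or v2."""
    return sum(u[a] == v1[a] or u[a] == v2[a] for a in attrs) / len(attrs)

def pair_classify(u, train, attrs, decision, level):
    ranked = sorted(combinations(train, 2),
                    key=lambda p: -covering_degree(u, p[0], p[1], attrs))
    pooled = [w for pair in ranked[:level + 1] for w in pair]
    return Counter(w[decision] for w in pooled).most_common(1)[0][0]
```

With level = 0 this reduces to classification by the best covering pair, in the spirit of Pair-0.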

6 Conclusions

The paper presents results along two threads: one thread highlights results obtained by following Zdzisław Pawlak's 'research requests'; the other illustrates results obtained in classical settings by considering the new contexts of knowledge engineering created by Zdzisław Pawlak's vision. Further work will focus on a rational search for small decision-representative subsets of data, with Big Data in mind, and on rough set based Approximate Ontology in biological and medical data.