
1 Introduction

We address here the problem of discovering patterns and associated knowledge in an attributed graph. Previous work either focuses on the topological structure of the patterns, thus ignoring the vertex properties, or considers only local or semi-local patterns [4]. In [1], co-variation patterns between vertex attributes are investigated, with topological attributes added to the original vertex attributes, and in [7] the authors investigate the correlation between the support set of an itemset and the occurrence of dense subgraphs. What we propose in this article is to consider a graph \(G=(O,E)\) whose vertices are labelled by itemsets, and to submit the occurrences of patterns in the vertex set O, i.e. their support sets, to connectivity constraints. We consider attribute patterns in the standard closed itemset mining approach developed in Formal Concept Analysis (FCA) [3], Galois Analysis [2], and Data Mining (see for instance [6]).

In pattern mining, a support-closed pattern is a pattern which is maximal in size, i.e. in terms of specificity, within the equivalence class of all patterns q sharing the same support set \(e=\mathrm {ext}(q)\). The corresponding equivalence relation is simply denoted \(\equiv \). In standard itemset mining, each equivalence class contains a unique support-closed pattern, i.e. a maximum, and this support-closed pattern is easily computed using a closure operator f. More precisely, the equivalence class of some pattern q is made of all patterns whose support set is \(\mathrm {ext}(q)\), and its unique support-closed pattern is obtained as \(f(q)= \mathrm {int}\circ \mathrm {ext}(q)\), where \(\mathrm {int}\) simply intersects the object descriptions of the support set. Support-closed patterns are then simply called closed patterns. The set of (support set, closed pattern) pairs is organized within a concept lattice, and inclusion of support sets leads to implication rules that hold on the dataset under investigation. The set of frequent closed patterns, i.e. closed patterns whose support is greater than or equal to some threshold \(\mathrm {minsupp}\), then represents all the equivalence classes corresponding to frequent supports. Each such class also has minimal elements, called generators. When the patterns belong to \(2^X\), the min-max basis of implication rules [6], which represents all the implications \(t \rightarrow t'\) that hold on O, i.e. such that \(\mathrm {ext}(t) \subseteq \mathrm {ext}(t')\), is defined as follows:

\(\{ g\rightarrow f \ \mid f \text{ is } \text{ a } \text{ closed } \text{ pattern }, g \text{ is } \text{ a } \text{ generator }, f \not = g, \mathrm {ext}(g)=\mathrm {ext}(f)\} \)
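To make the closure \(f=\mathrm {int}\circ \mathrm {ext}\), the generators and the min-max basis concrete, here is a minimal Python sketch that enumerates all itemsets over \(\{a,b,c\}\), using as data the vertex descriptions of Example 1 below (the graph edges play no role here). The helper names (`ext`, `intent`, `closure`, `min_max_basis`) are hypothetical, and brute-force enumeration is only meant for a handful of items.

```python
from itertools import combinations

# Vertex descriptions from Example 1; the graph structure is not used here.
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}
ITEMS = "abc"


def ext(q):
    """Support set of pattern q: objects whose description contains every item of q."""
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def intent(e):
    """Intersection of the descriptions of the objects in e (all items if e is empty)."""
    common = set(ITEMS)
    for o in e:
        common &= set(D[o])
    return "".join(sorted(common))


def closure(q):
    """f(q) = int(ext(q)): the unique closed pattern of the equivalence class of q."""
    return intent(ext(q))


def subsets(s, proper=False):
    """All sub-itemsets of s (as sorted strings), optionally excluding s itself."""
    top = len(s) if proper else len(s) + 1
    return ["".join(c) for k in range(top) for c in combinations(sorted(s), k)]


def min_max_basis():
    """Rules g -> f(g) where g is a generator (minimal in its class) and g != f(g)."""
    rules = []
    for g in subsets(ITEMS):
        f = closure(g)
        # g is minimal in its class iff no proper subset has the same support set
        is_generator = all(ext(s) != ext(g) for s in subsets(g, proper=True))
        if is_generator and f != g:
            rules.append((g, f))
    return rules


print(closure("a"))     # a
print(min_max_basis())  # []
```

With these descriptions every itemset is its own closure, so the basis is empty: in particular \(a \rightarrow b\) does not hold on O, as remarked in Example 2.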

2 Abstract Knowledge

In a previous work [9], the attributed graph \(G=(O,E)\) was investigated in the following way: each pattern support set \(e\subseteq O\), as a set of vertices, induces a subgraph G(e) of G, and this subgraph is then simplified by removing vertices in various ways. The remaining vertices of such an abstract subgraph all satisfy some topological constraint, for instance belonging to a k-clique, and form the abstract support set of the pattern. The extensional space is thereby reduced to a part A of \(2^O\), called a graph abstraction, which can be generated as the union closure of subsets of O that we call abstract groups. For instance, the k-clique abstraction is made of unions of k-cliques, and therefore the abstract support set of a pattern is the (maximum) subset of its support set made of k-cliques of G(e) (Fig. 1).

Fig. 1. An attributed graph

Example 1

Consider the graph \(G=(O,E)\) where \(O=\{1,2,3,4,5,6,7,8\}\) and \(E=\{12,13,23, 34,45,56, 67,57,68,78\}\). Each vertex o is described by \(d(o) \in 2^{abc}\), i.e. \(d(1)=d(2)=d(3)=ab,d(4)=d(5)=ac,d(6)=d(8)=bc,d(7)=abc\). Consider then the 3-clique abstraction A. The support set of a is \(\mathrm {ext}(a)=\{1,2,3,4,5,7\}\), which induces the subgraph \(G(\mathrm {ext}(a))\) whose edges are \(\{12,23,13,34,45,57\}\). Its abstract support set is \(\mathrm {ext}_A(a)=\{1,2,3\}\), as none of the vertices 4, 5, 7 belongs to a triangle in \(G(\mathrm {ext}(a))\).
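The computations of Example 1 can be reproduced with the following sketch. The 3-clique abstraction is implemented by a hypothetical helper `p_triangle` that keeps the vertices of a subset e lying on a triangle of G(e); enumerating all vertex triples is an illustrative choice that only suits small graphs.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}


def ext(q):
    """Support set of pattern q in O."""
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def induced_edges(e):
    """Edges of the subgraph G(e) induced by the vertex subset e."""
    return {edge for edge in EDGES if edge <= e}


def p_triangle(e):
    """3-clique abstraction: keep the vertices of e lying on a triangle of G(e)."""
    kept = set()
    for tri in combinations(sorted(e), 3):
        if all(frozenset(pair) in EDGES for pair in combinations(tri, 2)):
            kept.update(tri)
    return frozenset(kept)


def ext_A(q):
    """Abstract support set ext_A = p o ext."""
    return p_triangle(ext(q))


print(sorted(ext("a")))    # [1, 2, 3, 4, 5, 7]
print(sorted(ext_A("a")))  # [1, 2, 3]
```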

Abstract support sets are obtained by applying an interior operator p such that \(p[2^O]=A\), i.e. \(\mathrm {ext}_A = p\circ \mathrm {ext}\). As an interior operator on \(2^O\), p has the following properties: for any \(e,e' \in 2^O\), i) \(p(e) \le e\), ii) \(p(p(e))=p(e)\) and iii) \(e \le e' \Rightarrow p(e) \le p(e')\). Abstract implications are then defined by considering inclusion of abstract support sets, i.e. \( \Box _A q \rightarrow \Box _A w\) is valid if and only if \(\mathrm {ext}_A(q) \subseteq \mathrm {ext}_A(w)\). Such an abstract rule has the following meaning: “whenever the members of some abstract group share pattern q, they also share pattern w”. Because of the monotonicity (condition iii)) of the interior operator p, abstraction preserves implication validity:

Lemma 1

Let A be an abstraction, q and w two patterns, then \(q \rightarrow w \Rightarrow \Box _A q \rightarrow \Box _A w\)

In the case of the k-clique abstraction mentioned above, this means that by restricting the support sets of patterns to be made of k-cliques, we preserve all previously valid implications and possibly obtain new valid abstract implications representing abstract knowledge.
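The interior operator properties i) to iii), and the preservation of implications stated in Lemma 1, can be checked exhaustively on the small graph of Example 1. The sketch below does so by brute force over all \(2^8\) vertex subsets; it illustrates the properties on one instance and is not a proof. Helper names are hypothetical.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}
O = sorted(D)


def ext(q):
    """Support set of pattern q in O."""
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def p_triangle(e):
    """3-clique abstraction: vertices of e lying on a triangle of G(e)."""
    kept = set()
    for tri in combinations(sorted(e), 3):
        if all(frozenset(pair) in EDGES for pair in combinations(tri, 2)):
            kept.update(tri)
    return frozenset(kept)


# Precompute p on every subset of O (2^8 = 256 subsets).
ALL_SUBSETS = [frozenset(c) for k in range(len(O) + 1) for c in combinations(O, k)]
P = {e: p_triangle(e) for e in ALL_SUBSETS}


def check_interior_operator():
    """Check i) p(e) <= e, ii) p(p(e)) = p(e), iii) e <= e2 implies p(e) <= p(e2)."""
    contractive = all(P[e] <= e for e in ALL_SUBSETS)
    idempotent = all(P[P[e]] == P[e] for e in ALL_SUBSETS)
    monotone = all(P[e] <= P[e2] for e in ALL_SUBSETS for e2 in ALL_SUBSETS if e <= e2)
    return contractive, idempotent, monotone


def check_lemma_1(items="abc"):
    """If ext(q) <= ext(w) then ext_A(q) <= ext_A(w), for all itemsets q, w."""
    patterns = ["".join(c) for k in range(len(items) + 1) for c in combinations(items, k)]
    return all(P[ext(q)] <= P[ext(w)]
               for q in patterns for w in patterns if ext(q) <= ext(w))


print(check_interior_operator())  # (True, True, True)
print(check_lemma_1())            # True
```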

Consider then the equivalence relation \(\equiv _A\) defined by \(q \equiv _A w\) iff \(\mathrm {ext}_A(q)=\mathrm {ext}_A(w)\). Each equivalence class of \(\equiv _A\) has a maximum, called an abstract closed pattern, obtained by applying the closure operator \(\mathrm {int} \circ p \circ \mathrm {ext}\), while its minimal elements are called A-generators. We then obtain the abstract min-max basis of abstract implication rules, where \(\mathrm {ext}_A\) replaces \(\mathrm {ext}\). The abstract min-max basis is made of abstract implications relating the A-generators of some equivalence class of \(\equiv _A\) to the abstract closed pattern of the same class:

\(\{ \Box _A g\rightarrow \Box _A c \mid c \text{ is } \text{ an } \text{ A-closed } \text{ pattern }, g \text{ is } \text{ an } \text{ A-generator }, c \not = g, \mathrm {ext}_A(g)=\mathrm {ext}_A(c)\} \)

Example 2

Consider the data and 3-clique abstraction of Example 1. Intersecting the vertex descriptions of \(\mathrm {ext}_A(a)=\{1,2,3\}\), we obtain the abstract closed pattern ab. The equivalence class of patterns having abstract support set \(\{1,2,3\}\) is \(\{a,ab\}\), and a is therefore an A-generator. This means that \(\Box _A a \rightarrow \Box _A ab\) belongs to the abstract min-max basis extracted from G, and means “whenever the vertices of a triangle in G share pattern a, they also share pattern ab”. Note that \(a \rightarrow b\) was not a valid rule: to infer b from a at some vertex o, we have to require that o belong to a triangle whose two other vertices also have a.
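A sketch of the abstract closure \(\mathrm {int}\circ p\circ \mathrm {ext}\), of A-generators and of the abstract min-max basis on the data of Examples 1 and 2 follows. The `minsupp` parameter is our illustrative stand-in for a frequency threshold on abstract support sets; all function names are hypothetical.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}
ITEMS = "abc"


def ext(q):
    """Support set of pattern q in O."""
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def intent(e):
    """Intersection of the descriptions of the vertices in e (all items if e is empty)."""
    common = set(ITEMS)
    for o in e:
        common &= set(D[o])
    return "".join(sorted(common))


def p_triangle(e):
    """3-clique abstraction: vertices of e lying on a triangle of G(e)."""
    kept = set()
    for tri in combinations(sorted(e), 3):
        if all(frozenset(pair) in EDGES for pair in combinations(tri, 2)):
            kept.update(tri)
    return frozenset(kept)


def ext_A(q):
    return p_triangle(ext(q))


def itemsets(s, proper=False):
    """Sub-itemsets of s as sorted strings, optionally excluding s itself."""
    top = len(s) if proper else len(s) + 1
    return ["".join(c) for k in range(top) for c in combinations(sorted(s), k)]


def abstract_closure(q):
    """int o p o ext: the abstract closed pattern of the class of q."""
    return intent(ext_A(q))


def abstract_min_max_basis(minsupp=1):
    """Rules g -> c with g an A-generator and c != g the abstract closed pattern of its class.

    Classes whose abstract support set has fewer than `minsupp` vertices are skipped.
    """
    rules = []
    for g in itemsets(ITEMS):
        support = ext_A(g)
        if len(support) < minsupp:
            continue
        c = intent(support)
        # g is an A-generator iff no proper subset of g has the same abstract support
        if c != g and all(ext_A(s) != support for s in itemsets(g, proper=True)):
            rules.append((g, c))
    return rules


print(abstract_closure("a"))     # ab
print(abstract_min_max_basis())  # [('a', 'ab')]
```

With the default `minsupp=1` the basis reduces to \(\Box _A a \rightarrow \Box _A ab\); with `minsupp=0` the rule \(\Box _A ac \rightarrow \Box _A abc\) also appears, since both ac and abc have an empty abstract support set.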

3 Measuring Abstract Knowledge

When considering frequent abstract closed patterns, we are interested in ordering or selecting them according to the extent to which they are related to the graph structure. For that purpose, we generalize hereafter the structural correlation measure introduced by A. Silva and co-authors [7], originally designed to compute the ratio of vertices involved in quasi-cliques in the subgraph induced by a pattern, and rename it specificity.

Definition 1

Let q be a pattern and A an abstraction of \(2^O\); the specificity of q with respect to A is defined as:

$$s_A(q)= \frac{\mid \mathrm {ext}_A(q) \mid }{ \mid \mathrm {ext}(q) \mid }$$

Besides measuring, through specificity, what is specific to a pattern in its abstract view, we are also interested in how informative abstract rules are. For that purpose, we consider abstract rules whose left and right patterns are equivalent in the abstract space A, i.e. have the same abstract support set, as in the abstract min-max basis defined above. Whenever these patterns are also equivalent in the original space \(2^O\), the rule is intuitively uninformative. Assume for instance that both \(a \rightarrow abc\) and \(\Box _A a \rightarrow \Box _A abc\) are valid; then the abstract rule does not bring any new information. On the contrary, assume that \(\Box _A a \rightarrow \Box _A abc\) is valid while \(a \rightarrow abc\) only has confidence 0.5, i.e. \(\mid \mathrm {ext}(abc) \mid = 0.5 \times \mid \mathrm {ext}(a) \mid \); then the abstract rule clearly brings some information. We simply measure informativity here as the inverse of confidence.

Definition 2

Let q and w be patterns and A an abstraction of \(2^O\); the informativity of the valid rule \(r: \Box _A q \rightarrow \Box _A w\) is defined as:

$$I_A(r)= \frac{\mid \mathrm {ext}(q) \mid }{ \mid \mathrm {ext}(q w) \mid }$$

An alternative informativity measure, ranging between 0 and 1, would be the (estimated) probability of not having w whenever we have q, i.e. \(1 - \frac{\mid \mathrm {ext}(qw) \mid }{ \mid \mathrm {ext}(q ) \mid }\). This quantity has value 0 whenever \(q \rightarrow w\) holds and tends to 1 as \(\mid \mathrm {ext}(qw) \mid \) approaches 0, i.e. when restricting the support sets of patterns to elements of A concentrates the support set of q on the very few objects that also have w. In the remainder of the article, we keep Definition 2 to define informativity.

Considering an implication rule from the abstract min-max basis \(\Box _A g\rightarrow \Box _A c \), we are then interested in the specificity \(s_A(c)\) of the abstract closed pattern and in the informativity \(I_A(r)= \frac{\mid \mathrm {ext}(g) \mid }{ \mid \mathrm {ext}(c) \mid }\) of the rule.

Example 3

Considering the attributed graph and triangle abstraction of Examples 1 and 2, ab has specificity \(3 \div 4=0.75\), as \(\mathrm {ext}(ab)=\{1,2,3,7\}\) and \(\mathrm {ext}_A(ab)=\{1,2,3\}\), while \(\Box _A a \rightarrow \Box _A ab\) has informativity \(6\div 4=1.5\).
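Both measures can be computed directly from Definitions 1 and 2. The sketch below reproduces the figures of Example 3 and also evaluates the alternative informativity \(1-\mid \mathrm {ext}(qw)\mid /\mid \mathrm {ext}(q)\mid \). Itemsets are encoded as strings, so the conjunction qw is obtained by string concatenation (duplicate items are harmless because `ext` converts to sets). Function names are hypothetical.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}


def ext(q):
    """Support set of pattern q in O."""
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def ext_A(q):
    """Triangle abstraction of ext(q): vertices lying on a triangle of G(ext(q))."""
    e, kept = ext(q), set()
    for tri in combinations(sorted(e), 3):
        if all(frozenset(pair) in EDGES for pair in combinations(tri, 2)):
            kept.update(tri)
    return frozenset(kept)


def is_valid_abstract_rule(q, w):
    """box_A q -> box_A w is valid iff ext_A(q) is included in ext_A(w)."""
    return ext_A(q) <= ext_A(w)


def specificity(q):
    """s_A(q) = |ext_A(q)| / |ext(q)| (Definition 1); assumes ext(q) is non-empty."""
    return len(ext_A(q)) / len(ext(q))


def informativity(q, w):
    """I_A(r) = |ext(q)| / |ext(qw)| (Definition 2); q + w is the itemset union."""
    return len(ext(q)) / len(ext(q + w))


def alt_informativity(q, w):
    """Alternative measure 1 - |ext(qw)| / |ext(q)|, ranging in [0, 1]."""
    return 1 - len(ext(q + w)) / len(ext(q))


print(is_valid_abstract_rule("a", "ab"))  # True
print(specificity("ab"))                  # 0.75
print(informativity("a", "ab"))           # 1.5
```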

4 Local Knowledge

Given some attribute pattern, we are now interested in extracting local support-closed patterns, i.e. maximal attribute patterns each associated with one dense subgraph, which allows us to extract local implication rules specific to dense groups of objects. The closed pattern mining methodology has recently been extended to local closed patterns, which are obtained by applying a set of local closure operators [8]. In the graph case, this means that from the support set of some (closed) pattern c, various dense support sets, called local support sets, are extracted, each associated with a local closed pattern, i.e. the most specific pattern l common to the elements of the local support set. Again we obtain a set of local implication rules corresponding to inclusion of local support sets, but such an implication is now only valid in the vicinity of some dense group of vertices.

4.1 Direct Local Knowledge

The simplest case appears when the extensional space is reduced to the set F of connected subgraphs induced by vertex subsets belonging to some graph abstraction A. A pattern q is associated with one of its connected components e as a local support set, and with \(\mathrm {int}(e)\) as the corresponding local closed pattern. We may then consider, for instance, the 3-clique abstraction as A and obtain as local support sets connected subgraphs made of 3-cliques. In this simple case, F is a confluence of A [8], i.e. a partially ordered set made of several lattices, which in general has a set min(F) of minimal elements. More precisely, in our case of connected subgraphs made of 3-cliques, these minimal elements are the 3-cliques of G. We call such a confluence, whose elements are connected components, a cc-confluence. Let q be a pattern and \( m\in \mathrm {min}(F)\) with \(m \subseteq \mathrm {ext}_A(q)\); we obtain the connected component containing the 3-clique m as \(\mathrm {ext}^A_m(q)= p_m \circ \mathrm {ext}_A(q)\), where \(p_m\) is again an interior operator, and is therefore monotonic. Note that in a cc-confluence each vertex appears in only one such connected component, so we may as well replace m by any one of its vertices s in our definitions.

Whenever we have \(p_m \circ \mathrm {ext}_A(q) \subseteq p_m \circ \mathrm {ext}_A(w) \), we write this as the local implication \(\Box ^A_m q \rightarrow \Box ^A_m w\), stating that if q has a local support set containing m, then w has a local support set that includes it. Because of the monotonicity of \(p_m\), validity of implications is again preserved:

Lemma 2

Let F be a confluence of an abstraction A, q and w two patterns, then \(\Box _A q \rightarrow \Box _A w \Rightarrow \Box ^A_m q \rightarrow \Box ^A_m w\)

When a given abstract closed pattern c has a local support set e in F that contains m, and whose corresponding local closed pattern is l, the implication rule \(\Box ^A_m c \rightarrow \Box ^A_m l\) holds. The set \(\{ \Box ^A_m c\rightarrow \Box ^A_m l \mid l \text{ a } \text{ local } \text{ closed } \text{ pattern }, c \text { an abstract closed pattern}, c\not = l, \mathrm {ext}^A_m(c)=\mathrm {ext}^A_m(l)\} \) represents (a basis for) the local knowledge derived from the reduction of the extensional space from A to the confluence F.

Example 4

Still considering the attributed graph G and the triangle abstraction of Examples 1 and 2, we consider the cc-confluence F of vertex subsets inducing connected subgraphs of G made of triangles. We have \(\mathrm {ext}_A(b)=\{1,2,3,6,7,8\}\), which induces a subgraph made of two connected components \(\{1,2,3\}\) and \(\{6,7,8\}\). The corresponding local closed patterns are \(\mathrm {int}(\{1,2,3\})=ab\) and \(\mathrm {int}(\{6,7,8\})=bc\). As \(b=\mathrm {int}(\{1,2,3,6,7,8\})\), b is an abstract closed pattern and we have the following local implications: \( \Box ^A_{\{1,2,3\}} b \rightarrow \Box ^A_{\{1,2,3\}} ab\) and \( \Box ^A_{\{6,7,8\}} b \rightarrow \Box ^A_{\{6,7,8\}} bc\), which, since F is a cc-confluence, we may rewrite, for instance, as \( \Box ^A_{1} b \rightarrow \Box ^A_{1} ab\) and \( \Box ^A_{6} b \rightarrow \Box ^A_{6} bc\).
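The direct local closed patterns of Example 4 can be sketched as follows: the local support sets of q are the connected components of the subgraph induced by \(\mathrm {ext}_A(q)\), and, as allowed in a cc-confluence, the operator \(p_m\) is indexed by a single vertex s. Names such as `components` and `p_m` are hypothetical.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}
ITEMS = "abc"


def ext(q):
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def intent(e):
    """Intersection of the descriptions of the vertices in e."""
    common = set(ITEMS)
    for o in e:
        common &= set(D[o])
    return "".join(sorted(common))


def ext_A(q):
    """Triangle abstraction of ext(q)."""
    e, kept = ext(q), set()
    for tri in combinations(sorted(e), 3):
        if all(frozenset(pair) in EDGES for pair in combinations(tri, 2)):
            kept.update(tri)
    return frozenset(kept)


def components(vertices):
    """Connected components of the subgraph of G induced by `vertices` (iterative DFS)."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for edge in EDGES:
                if v in edge:
                    (w,) = edge - {v}
                    if w in vertices and w not in seen:
                        seen.add(w)
                        stack.append(w)
        comps.append(frozenset(comp))
    return comps


def p_m(s, e):
    """Local interior operator indexed by vertex s: the component of G(e) containing s."""
    for comp in components(e):
        if s in comp:
            return comp
    return frozenset()


def local_closed_patterns(q):
    """Map each local support set (component of G(ext_A(q))) to its local closed pattern."""
    return {comp: intent(comp) for comp in components(ext_A(q))}


print(intent(ext_A("b")))  # b (so b is an abstract closed pattern)
# prints "[1, 2, 3] ab" and then "[6, 7, 8] bc"
for comp, l in local_closed_patterns("b").items():
    print(sorted(comp), l)
```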

4.2 Measuring Direct Local Knowledge

To measure how specific a local closed pattern is to the associated connected component, and in the same way as in the abstract case, where we considered the ratio between the abstract and standard support sets, we are here interested in the ratio between the local and the global (standard or abstract) support sets:

Definition 3

Let q be a pattern, F an extensional confluence of some abstraction A of \(2^O\), and \(m\in F\) such that \(m \subseteq \mathrm {ext}_A(q)\); the specificity of q in the vicinity of m is defined as:

$$s_F(q,m)= \frac{\mid \mathrm {ext}^A_m(q) \mid }{ \mid \mathrm {ext}_A(q) \mid }$$

In the same way as in the abstract implication case, we measure the informativity of a local rule with respect to the corresponding global rule. The idea here is that in a valid local implication, the left pattern q and the pattern qw have the same local support set while their global support sets differ. Again, informativity is defined as the inverse of the (abstract) confidence.

Definition 4

Let q and w be patterns, F an extensional confluence of some abstraction A of \(2^O\), and \(m\in F\) such that \(m \subseteq \mathrm {ext}_A(q)\); the informativity of the valid local rule \(r: \Box ^A_m q \rightarrow \Box ^A_m w\) is defined as:

$$I_F(r)= \frac{\mid \mathrm {ext}_A(q) \mid }{ \mid \mathrm {ext}_A(q w) \mid }$$

Intuitively, informativity measures what we learn when discovering that q and qw have the same local support set with respect to m while they have different abstract support sets. Considering a local implication rule \(r: \Box ^A_m c \rightarrow \Box ^A_m l\), we are interested in the specificity \(s_F(l,m)\) of the local closed pattern l and in the informativity \(I_F(r)= \frac{\mid \mathrm {ext}_A(c) \mid }{ \mid \mathrm {ext}_A(l) \mid }\) of the rule.

Example 5

Continuing Examples 1, 2, 3 and 4, we obtain the local specificity of bc with respect to the triangle \(\{6,7,8\}\): \(s_F(bc, \{6,7,8\}) = 3\div 3=1 \), i.e. pattern bc is specific to the local support set \(\{6,7,8\}\). Furthermore, the implication \( \Box ^A_{\{6,7,8\}} b \rightarrow \Box ^A_{\{6,7,8\}} bc\) has informativity \(6\div 3=2\), i.e. in the abstract extensional space A, \( \Box _A b \rightarrow \Box _A bc\) has confidence 0.5 while the implication holds at the local level.
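The measures of Definitions 3 and 4 and the figures of Example 5 can be checked with the following sketch, which again represents m by one of its vertices (here vertex 6 of the triangle \(\{6,7,8\}\)). All helper names are hypothetical.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}


def ext(q):
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def ext_A(q):
    """Triangle abstraction of ext(q)."""
    e, kept = ext(q), set()
    for tri in combinations(sorted(e), 3):
        if all(frozenset(pair) in EDGES for pair in combinations(tri, 2)):
            kept.update(tri)
    return frozenset(kept)


def p_m(s, e):
    """Component of the subgraph G(e) containing vertex s (empty if s is not in e)."""
    if s not in e:
        return frozenset()
    comp, stack = {s}, [s]
    while stack:
        v = stack.pop()
        for w in e:
            if w not in comp and frozenset((v, w)) in EDGES:
                comp.add(w)
                stack.append(w)
    return frozenset(comp)


def local_specificity(q, s):
    """s_F(q, m) = |ext^A_m(q)| / |ext_A(q)|, with m represented by vertex s."""
    return len(p_m(s, ext_A(q))) / len(ext_A(q))


def is_valid_local_rule(q, w, s):
    """box^A_m q -> box^A_m w holds iff p_m(ext_A(q)) is included in p_m(ext_A(w))."""
    return p_m(s, ext_A(q)) <= p_m(s, ext_A(w))


def local_informativity(q, w):
    """I_F(r) = |ext_A(q)| / |ext_A(qw)| (Definition 4); q + w is the itemset union."""
    return len(ext_A(q)) / len(ext_A(q + w))


print(local_specificity("bc", 6))         # 1.0
print(is_valid_local_rule("b", "bc", 6))  # True
print(local_informativity("b", "bc"))     # 2.0
```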

4.3 Indirect Local Knowledge and Associated Measures

Local knowledge was related above to a notion of locality in a graph expressed through a confluence structure of the vertex space. This was mainly illustrated by the idea that the subgraph induced by the (abstract) support set of some pattern is made of several connected components, and that there may be specific patterns associated with each connected component. However, we are also interested in locality notions closer to the notion of community in Social Network Analysis. A well-known example of community definition is the k-clique community [5], which is defined as a maximal vertex subset made of adjacent k-cliques (i.e. k-cliques sharing \(k-1\) vertices). Such a k-clique community may alternatively be defined as a connected component of a graph whose vertices are k-cliques and whose edges relate adjacent k-cliques. What we discuss, more generally, in this section is a way to define local knowledge associated with subgraphs which are connected components of a derived graph made of particular vertex subsets, such as k-cliques in the k-clique community case. This local knowledge, which we call indirect, is obtained by applying the methodology described in Sect. 4.1 to the derived graph.
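To illustrate the alternative definition of a k-clique community as a connected component of the graph of adjacent k-cliques, the following sketch computes the 3-clique communities of the graph of Example 1, two triangles being adjacent when they share \(k-1=2\) vertices. Triangles are enumerated by brute force, and function names are hypothetical.

```python
from itertools import combinations

# Edges of the graph of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
VERTICES = sorted(set().union(*EDGES))


def triangles():
    """All 3-cliques of G (brute force over vertex triples)."""
    return [frozenset(t) for t in combinations(VERTICES, 3)
            if all(frozenset(pair) in EDGES for pair in combinations(t, 2))]


def clique_communities():
    """3-clique communities: connected components of the triangle graph, where two
    triangles are adjacent when they share k - 1 = 2 vertices; each component is
    flattened into the union of its triangles."""
    tris = triangles()
    seen, communities = set(), []
    for i in range(len(tris)):
        if i in seen:
            continue
        seen.add(i)
        stack, members = [i], set()
        while stack:
            j = stack.pop()
            members |= tris[j]
            for k in range(len(tris)):
                if k not in seen and len(tris[j] & tris[k]) == 2:
                    seen.add(k)
                    stack.append(k)
        communities.append(frozenset(members))
    return communities


print([sorted(c) for c in clique_communities()])  # [[1, 2, 3], [5, 6, 7, 8]]
```

Triangles \(\{5,6,7\}\) and \(\{6,7,8\}\) share the edge 67 and therefore merge into the community \(\{5,6,7,8\}\), while \(\{1,2,3\}\) forms a community on its own.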

We start from a family T of elements of \(2^O\), and consider T as the vertex set of a new graph \(G_T=(T,E_T)\). We then consider a confluence F of \(2^T\) as the extensional space and search for the corresponding local closed patterns. The resulting local support sets are afterwards transformed into support sets in \(2^O\): a (local support set, local closed pattern) pair \((e_T,l)\) is transformed into the pair (e, l), where e is the union of the elements of \(e_T\). Let \(T \subseteq 2^O\), and let \(u: 2^T \rightarrow 2^O \) be such that \(u(e_T)= \cup _{t \in e_T}t\); \(u(e_T)\) is called the flattening of \(e_T\). We then consider two maps \(\mathrm {ext}_T\) and \(\mathrm {int_T}\) relating the set L of patterns to \(2^T\):

  • \(\mathrm {ext}_T: L \rightarrow 2^T\) with \(\mathrm {ext}_T(q) = \{ t \in T \mid t \subseteq \mathrm {ext}(q)\}\)

  • \(\mathrm {int}_T: 2^T \rightarrow L\) with \(\mathrm {int}_T(e_T) = \mathrm {int}\circ u(e_T)\)

\(\mathrm {ext}_T(q) \) represents the support set of q in \(2^T\), considering that q occurs in t whenever q occurs in all elements of t. Conversely, \(\mathrm {int}_T(e_T)\) represents the greatest pattern in L whose support set in T includes \(e_T\), i.e. whose support set in O contains, as subsets, the elements of \(e_T\). We then have the following result when flattening the (local) support sets so found in F:

Proposition 1

Let F be a confluence of \(2^T\), u be the flattening operator on O and \((e_T,l)\) be a (local support set, local closed pattern) pair with \(e_T \ge m \in \mathrm {min}[F]\), then \( u(e_T)\) is the greatest element of \(u[F^m]\) among elements e such that \(\mathrm {int}(e)=l\).

This means that the support-closed patterns with respect to the confluence F are the same as the support-closed patterns with respect to the extensional space \(U=u[F]\). Note that, as flattened support sets are obtained by joining elements of T, they belong to the abstraction \(A=\mathrm {UnionClosure}(T)\).
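The maps \(\mathrm {ext}_T\), \(\mathrm {int}_T\) and the flattening u can be sketched on the graph of Example 1, taking T as its set of triangles. Function names are hypothetical and T is computed by brute force; the sketch only illustrates the definitions on one small instance.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}
ITEMS = "abc"


def ext(q):
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def intent(e):
    """Intersection of the descriptions of the vertices in e (all items if e is empty)."""
    common = set(ITEMS)
    for o in e:
        common &= set(D[o])
    return "".join(sorted(common))


# T: the family of 3-cliques (triangles) of G, used as the vertices of G_T.
T = [frozenset(t) for t in combinations(sorted(D), 3)
     if all(frozenset(pair) in EDGES for pair in combinations(t, 2))]


def ext_T(q):
    """ext_T(q) = {t in T | t is included in ext(q)}."""
    support = ext(q)
    return frozenset(t for t in T if t <= support)


def u(e_T):
    """Flattening: union of the elements of e_T."""
    return frozenset().union(*e_T)


def int_T(e_T):
    """int_T = int o u."""
    return intent(u(e_T))


print(sorted(u(ext_T("b"))))  # [1, 2, 3, 6, 7, 8]
print(int_T(ext_T("b")))      # b
```

Here \(u(\mathrm {ext}_T(a))=\{1,2,3\}\) coincides with the abstract support set \(\mathrm {ext}_A(a)\) of Example 1, as expected since flattened support sets belong to \(A=\mathrm {UnionClosure}(T)\).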

This will be illustrated by considering T as the set of 3-cliques of G (hereafter called triangles), with \((t_1,t_2)\) an edge of \(G_T\) whenever \(t_1\) and \(t_2\) share an edge in G. In this case, a flattened local support set of pattern q represents a triangle community in the subgraph \(G(\mathrm {ext}(q))\) induced by pattern q. An example of both graphs G and \(G_T\) is displayed in Fig. 2.

Fig. 2. On the left we have a graph of objects each described as an itemset included in \(\{a,b,c\}\). This graph represents the triangle abstraction of some input graph. On the right, the graph \(G_T\) whose vertices are the triangles of G. The itemset describing a vertex in \(G_T\) is the intersection of the itemsets describing the elements of the corresponding triangle in G.

It is then natural to extend the definition of specificity to make it relative to the flattened support sets:

Definition 5

Let q be a pattern, F an extensional confluence of \(2^T\) where \(T \subseteq 2^O\), A the abstraction generated from T, and \(m\in F\) such that \(m \subseteq \mathrm {ext}_T(q)\); the flattened specificity of q in the vicinity of m is defined as:

$$s_F^f(q,m)= \frac{\mid u \circ p_m \circ \mathrm {ext}_T(q) \mid }{ \mid u\circ \mathrm {ext}_T(q) \mid } = \frac{\mid u \circ p_m \circ \mathrm {ext}_T(q) \mid }{ \mid \mathrm {ext}_A(q) \mid }$$

Coming back to the example of triangle communities, \(s_F^f(q,m)\) states to what extent a pattern q is specific to the community containing a particular triangle m, with respect to its abstract support set in O when considering only triangles.

From Sect. 4.1, we know that we may rewrite \(p_m \circ \mathrm {ext}_T(q) \subseteq p_m \circ \mathrm {ext}_T(w) \) as a local implication \(\Box _m q \rightarrow \Box _m w\). As the flattening operator is monotonic, whenever the rule \(\Box _m q \rightarrow \Box _m w\) is valid on the set T we also have \( u \circ p_m \circ \mathrm {ext}_T(q) \subseteq u\circ p_m \circ \mathrm {ext}_T(w)\). We may then define the flattened informativity of \(r=\Box _m q \rightarrow \Box _m w\) as

$$I_F^f(r)= \frac{\mid u\circ \mathrm {ext}_T(q) \mid }{ \mid u \circ \mathrm {ext}_T(q w) \mid } = \frac{\mid \mathrm {ext}_A(q) \mid }{ \mid \mathrm {ext}_A(q w) \mid }$$
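The flattened specificity of Definition 5 and the flattened informativity above can be sketched on the graph of Example 1, with T its set of triangles and \(p_m\) computed as the component of \(G_T\) containing the triangle m, two triangles being adjacent in \(G_T\) when they share an edge of G. Function names are hypothetical.

```python
from itertools import combinations

# Graph and vertex descriptions of Example 1.
EDGES = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
                                (5, 6), (6, 7), (5, 7), (6, 8), (7, 8)]}
D = {1: "ab", 2: "ab", 3: "ab", 4: "ac", 5: "ac", 6: "bc", 7: "abc", 8: "bc"}
ITEMS = "abc"


def ext(q):
    return frozenset(o for o, d in D.items() if set(q) <= set(d))


def intent(e):
    common = set(ITEMS)
    for o in e:
        common &= set(D[o])
    return "".join(sorted(common))


# T: triangles of G, the vertices of the derived graph G_T.
T = [frozenset(t) for t in combinations(sorted(D), 3)
     if all(frozenset(pair) in EDGES for pair in combinations(t, 2))]


def ext_T(q):
    support = ext(q)
    return frozenset(t for t in T if t <= support)


def u(e_T):
    """Flattening: union of the triangles in e_T."""
    return frozenset().union(*e_T)


def int_T(e_T):
    return intent(u(e_T))


def p_m(m, e_T):
    """Component containing triangle m of the subgraph of G_T induced by e_T;
    two triangles are adjacent when they share an edge (two vertices) of G."""
    if m not in e_T:
        return frozenset()
    comp, stack = {m}, [m]
    while stack:
        t = stack.pop()
        for t2 in e_T:
            if t2 not in comp and len(t & t2) == 2:
                comp.add(t2)
                stack.append(t2)
    return frozenset(comp)


def flattened_specificity(q, m):
    """s^f_F(q, m) = |u(p_m(ext_T(q)))| / |u(ext_T(q))|."""
    return len(u(p_m(m, ext_T(q)))) / len(u(ext_T(q)))


def flattened_informativity(q, w):
    """I^f_F(r) = |u(ext_T(q))| / |u(ext_T(qw))|; q + w is the itemset union."""
    return len(u(ext_T(q))) / len(u(ext_T(q + w)))


M = frozenset({6, 7, 8})
print(int_T(p_m(M, ext_T("b"))))           # bc
print(flattened_specificity("b", M))       # 0.5
print(flattened_informativity("b", "bc"))  # 2.0
```

For b and the triangle \(m=\{6,7,8\}\), the local closed pattern is bc, the flattened specificity of b is \(3\div 6=0.5\), and \(\Box _m b \rightarrow \Box _m bc\) has flattened informativity \(6\div 3=2\).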

Let us consider a (flattened local support set, local closed pattern) pair (e, l), where e is a community containing a given triangle m, l the corresponding local closed pattern, and c an abstract closed pattern whose support set in G induces a subgraph in which e forms a triangle community. This means that \(\Box _m c \rightarrow \Box _m l\) is a valid local implication rule, stating that in the subgraph induced by the support set of c, all the members of the community containing the triangle m also have pattern l (see Fig. 3). The set of such local implications \(\Box _m c \rightarrow \Box _m l\), with \(c \not = l\), represents (a basis for) the local knowledge derived from the reduction of the extensional space to triangle communities.

Example 6

Let \(G=(O,E)\) be the graph displayed in the left part of Fig. 2. Each vertex of G belongs to some triangle in G; therefore G is the same as its triangle abstraction. Each vertex is labelled by an itemset included in \(\{a,b,c\}\). The set of triangles is \(T=\{t_0, t_1, t_2, t_3, t_4, t_5, t_6,t_7\}\) and forms a triangle graph \(G_T\) displayed in the right part of Fig. 2. An edge relates any pair of triangles sharing two vertices in G, for instance \((t_0,t_1)\). Each triangle in \(G_T\) has as its itemset the intersection of the itemsets of its three vertices in G. For instance, the description of \(t_1\) in \(G_T\) is \(ac=abc \cap ac \cap ac\). The vertex subsets inducing connected subgraphs of \(G_T\) form the cc-confluence \(F^T= \{\{t_0\}, \{t_1\},\{ t_0,t_1\},\{t_2\}, \{t_3\},\{ t_2,t_3\}, \{t_4\}, \{t_5\},\{ t_4,t_5\},\{t_6\}, \{t_7\},\{ t_6,t_7\}\}\).

The support set of the pattern a in T is \(\mathrm {ext}_T(a)=\{t_0,t_1,t_2,t_3,t_6,t_7\}\). The local support set with respect to \(t_0\) is \(p_{t_0}(\{t_0,t_1,t_2,t_3,t_6,t_7\})=\{t_0,t_1\}\), i.e. the connected component containing \(t_0\) of the subgraph of \(G_T\) induced by \(\mathrm {ext}_T(a)\). The local closed patterns, where \(f_i(q)\) denotes the closed pattern of q which is local w.r.t. triangle \(t_i\), are as follows:

  • \(f_0(a)=f_1(a)= ac\), \(f_2(a)=f_3(a)=ab\), \(f_6(a)=f_7(a)=ab\)

In the same way, the pattern b, whose support set is \(\mathrm {ext}_T(b)=\{t_2,t_3,t_4,t_5,t_6,t_7\}\), leads to the following local closed patterns:

  • \(f_2(b)=f_3(b)= ab\), \(f_4(b)=f_5(b)= bc\), \(f_6(b)=f_7(b)= ab\)

Note that ab appears both as a local closed pattern resulting from a with respect to \(t_2, t_3\) and to \(t_6, t_7\), and as a local closed pattern resulting from b with respect to \(t_2,t_3\) and again to \(t_6, t_7\). This leads, among others, to the following two sets of local implications:

  • \(\Box _{t_2} a \rightarrow \Box _{t_2} ab\), \(\Box _{t_3} a \rightarrow \Box _{t_3} ab\), \(\Box _{t_6} a \rightarrow \Box _{t_6} ab\), \(\Box _{t_7} a \rightarrow \Box _{t_7} ab\),

  • \(\Box _{t_2} b \rightarrow \Box _{t_2} ab\), \(\Box _{t_3} b \rightarrow \Box _{t_3} ab\), \(\Box _{t_6} b \rightarrow \Box _{t_6} ab\), \(\Box _{t_7} b \rightarrow \Box _{t_7} ab\),

As a whole, a local closed pattern is part of a pair \((e_T,l)\), where l is the local closed pattern and \(e_T\) is a local support set corresponding to one of the connected components induced by the support set. Two examples of such pairs are \((\{t_2,t_3\},ab)\) and \((\{t_6,t_7\},ab)\). When interested in implication rules, we have to consider triples \((c,t_i,l)\), where c is a pattern whose support set is split into different local support sets, one of which, namely \(e_T\), contains \(t_i\). \(\quad \square \)
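The local closed patterns and triples \((c,t_i,l)\) of Example 6 can be sketched directly on \(G_T\). The edges of \(G_T\) are inferred from the cc-confluence \(F^T\), and the descriptions of \(t_0, t_1, t_4, t_5\) follow from the text; the descriptions of \(t_2, t_3, t_6, t_7\) are an assumption (each is set to ab), since the exact labels require Fig. 2, but any choice consistent with the listed local closed patterns gives the same results for a and b. The sketch relies on \(\mathrm {int}_T(e_T)\) being the intersection of the descriptions of the triangles in \(e_T\). Function names are hypothetical.

```python
# Edges of G_T, inferred from the cc-confluence F^T listed in Example 6.
T_EDGES = {frozenset(p) for p in [("t0", "t1"), ("t2", "t3"), ("t4", "t5"), ("t6", "t7")]}
# Triangle descriptions: t0, t1, t4, t5 follow from the text; t2, t3, t6, t7 are
# ASSUMED to be "ab" (the exact labels are only given in Fig. 2).
T_DESC = {"t0": "ac", "t1": "ac", "t2": "ab", "t3": "ab",
          "t4": "bc", "t5": "bc", "t6": "ab", "t7": "ab"}
ITEMS = "abc"


def ext_T(q):
    """Triangles whose description contains q, i.e. whose three vertices all have q."""
    return frozenset(t for t, d in T_DESC.items() if set(q) <= set(d))


def int_T(e_T):
    """Intersection of the triangle descriptions, which equals int(u(e_T))."""
    common = set(ITEMS)
    for t in e_T:
        common &= set(T_DESC[t])
    return "".join(sorted(common))


def components(nodes):
    """Connected components of the subgraph of G_T induced by `nodes`."""
    nodes, seen, comps = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            t = stack.pop()
            comp.add(t)
            for t2 in nodes:
                if t2 not in seen and frozenset((t, t2)) in T_EDGES:
                    seen.add(t2)
                    stack.append(t2)
        comps.append(frozenset(comp))
    return comps


def local_triples(c):
    """Triples (c, t_i, l): l is the local closed pattern of c on the component
    of G_T[ext_T(c)] containing t_i, kept only when l != c."""
    triples = []
    for comp in components(ext_T(c)):
        l = int_T(comp)
        if l != c:
            triples.extend((c, t, l) for t in sorted(comp))
    return triples


print(local_triples("a"))
print(local_triples("b"))
```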

Example 7

The dataset, denoted s50-1, is a standard attributed graph dataset. It represents 148 friendship relations between 50 pupils of a school in the West of Scotland, and the vertex labels concern substance use (tobacco, cannabis and alcohol) and sporting activity (see [9]). We want to answer the question: “what knowledge can be extracted when considering groups of pupils connected by friendship relationships?”. For that purpose, we computed the local abstract closures associated with the cc-confluence representing 3-clique communities in subgraphs of the triangle graph \(G_T\) derived from the original graph, with the constraint “support \(\ge 4\)” on O. In Fig. 3, we represent the flattened local support set e of the local closed pattern l shared in a community (black lines and dots) of the subgraph induced by the abstract support set of l (black and grey lines and dots), w.r.t. the 3-clique abstraction. We also represent (dashed, black and grey lines and dots) the abstract support set of the abstract closed pattern c, which also induces a subgraph in which e is a connected component. Overall, \(\Box _m c \rightarrow \Box _m l\) is a valid local implication rule whose informativity is \(I_F^f(r)= \frac{\mid \mathrm {ext}_A(c) \mid }{ \mid \mathrm {ext}_A(l) \mid } = (5+9+4)\div (5+9) = 1.286\). The specificity \(s_F^f(l,m)\) of the local closed pattern l is \(5 \div (5+9)=0.357\). Here l means “Has never tried cannabis, drinks moderately, does not smoke”, while c means “Has tried cannabis at most once, drinks moderately, does not smoke”. What is specific to this 3-clique community, with respect to the whole set of pupils who have tried cannabis at most once, drink moderately and do not smoke, is that it is composed only of pupils who have never tried cannabis.

Fig. 3. Representation of a local rule \(\Box _m c \rightarrow \Box _m l\) extracted from a West of Scotland school friendship network. c is the abstract closed pattern shared by the friendship triangles displayed in black + grey + dashed lines and dots. l is a local closed pattern specific to the 3-community to which the triangle m belongs, represented in black lines and dots. This local closed pattern is shared by all the friendship triangles displayed in black + grey lines and dots.

5 Conclusion

We have discussed here a framework extending closed itemset mining to abstract and local information in an attributed network. Our focus in this article was on the abstract and local knowledge to be extracted as abstract and local rules, together with measures of how specific and how informative abstraction and locality are.