
1 Introduction

Knowledge is the set of ideas, emotions, beliefs and experiences, such as facts (descriptive knowledge), skills (procedural knowledge), or objects (acquaintance knowledge) owned by an individual or shared across collaborating individuals [7, 11]. It can be roughly seen as a set of concepts linked by some relationship (e.g. derivations linked to prerequisites or axioms to form theorems). The set of knowledge items that are connected by a path of links can be denoted as a cluster of knowledge.

A good representation of this description is a network [2] in which the individual knowledge items are the nodes and the links represent connections among items. A connected cluster is a corpus of knowledge. When knowledge is added, three things can happen: the new knowledge item may remain isolated and form a new cluster, it may join an existing cluster, or it may act as a connection between two clusters, fusing them together.

Since percolation describes the patterns of linked elements under stochastic or semi-stochastic connection mechanisms [8, 10], the process of filling the knowledge corpus is analogous to a percolation process, and we can refer to it as the knowledge percolation problem [4, 14].

The reference scenario is that of reconstructing the process that has led to the accumulation of a given corpus of knowledge, and, more importantly, the underlying cluster dynamics. There are many models that interpret the formation of a collaboration network by the random joining of individuals or contributions, i.e., the formation of a giant component by the establishment of random links. Our model is, first of all, bipartite: contributions feed the knowledge corpus, and the overlap between contributions provides the links among them. Moreover, we require a minimum overlap for establishing a link.

The whole corpus of knowledge can be spanned by several clusters separated by unknown elements of the corpus, or organized in a single cluster where all pieces of knowledge are connected by established relations. In this respect, the process of acquiring knowledge has many similarities with the formation of a giant component in a random graph [3].

However, in a real case, redundant links are needed for considering concepts as belonging to the same cluster or discipline. So, we assume that a new knowledge item must have a minimum overlap with at least one of the members already belonging to a cluster in order to be inserted into it.

Alternatively, this model can also be seen as a collaboration model, in which every agent knows a certain number of concepts but is able to collaborate with others (i.e., to belong to the same group) only if they have a minimum overlap (like speaking the same language and having a shared background [5]). The model therefore evaluates the possibility for agents to collaborate or communicate with others, which can be seen as the cooperation of individuals, research groups or societies to solve a given problem represented by the knowledge vector.

Our model can serve as an interpretation tool for examining, ex-post, how a given corpus was assembled and the associated cluster dynamics. For instance, one could study how authors cluster by measuring the overlaps between citations of their papers [9], or compare the evolution of customers of a supermarket by studying the overlap among their buying habits [13]. Our model is however still too rough to be compared with real data.

We study how the number and size of knowledge clusters evolve when adding new items. This can be considered as a k-core growth percolation problem, although normally k-core models are studied by pruning an existing network [6].

Fig. 1. Schematic representation of the system in the case of a knowledge space with \(L=26\) items, \(N=3\) and \(\varOmega =2\), divided into two clusters \(C_1=\{k_1, k_2\}\) of size \(c_1=2\) and \(C_2=\{k_3\}\) of size \(c_2=1\).

2 The Model

We represent the corpus of knowledge K as a numbered set of L items, where \(K(n)=1\) if the knowledge item n is present in the corpus and \(K(n)=0\) if it is absent. Each contribution \(k_i\) is given by a set of N random items \(k_i^{(n)}\), \(n=1,\dots , N\), among the available ones (\(k_i^{(n)}\in \{1, \dots , L\}\)). When \(k_i\) is added to the corpus, we set \(K(k_i^{(n)})=1\) for \(n=1,\dots , N\).

The new contribution is added to a group if it has an overlap of at least \(\varOmega \) with one of the elements of the group. A new contribution can also cause the fusion of two separated groups. This process is illustrated in Fig. 1.

Once the values of L, N and \(\varOmega \) are fixed, the algorithm proceeds as follows (a code sketch is given after the list):

  • Randomly generate a contribution \(k_i\) with N random items \(k_i^{(n)}\) among the L available (\(1\le k_i^{(n)}\le L\)) without repetitions;

  • Add this contribution to the knowledge corpus \(K(k_i^{(n)})=1\) for \(n=1,\dots , N\);

  • Check the overlap with all previous contributions \(k_j\) (\(\forall j<i\)). Denoting this overlap by \(\omega _{ij}=\sum _{n,l} \delta _{k_i^{(n)},k_j^{(l)}}\) (where \(\delta \) is the Kronecker delta), three cases can occur:

    1. \(\omega _{ij}\ge \varOmega \) and \(k_i\) not belonging to any group: \(k_i\) is added to the cluster \(C_m\) of \(k_j\);

    2. \(\omega _{ij}\ge \varOmega \) and \(k_i\) already belonging to a group: merge the cluster \(C_m\) of \(k_i\) and the cluster \(C_q\) of \(k_j\);

    3. \(\omega _{ij}<\varOmega \) for all previous contributions: create a new group \(C_q\) and assign \(k_i\) to it.
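
To make the procedure concrete, the following is a minimal Python sketch of this algorithm (our own illustrative implementation, not the authors' code; the function name simulate and the data structures are our choices). Parameter names follow the text: L knowledge items, N items per contribution, Omega minimum overlap.

import random

def simulate(L, N, Omega, steps, seed=None):
    rng = random.Random(seed)
    K = [0] * L              # knowledge corpus: K[n] = 1 if item n is present
    contributions = []       # one set of items per contribution
    cluster_of = []          # cluster label of each contribution
    clusters = {}            # label -> set of contribution indices
    next_label = 0
    history = []             # (t, corpus size S, number of clusters A)

    for t in range(1, steps + 1):
        items = set(rng.sample(range(L), N))       # new contribution k_i
        for n in items:                            # add its items to the corpus
            K[n] = 1
        # clusters sharing an overlap of at least Omega items with k_i
        touching = {cluster_of[j] for j, kj in enumerate(contributions)
                    if len(items & kj) >= Omega}
        i = len(contributions)
        contributions.append(items)
        if not touching:                           # case 3: create a new cluster
            label, next_label = next_label, next_label + 1
            clusters[label] = {i}
        else:                                      # cases 1 and 2: join, possibly merge
            label = min(touching)
            clusters[label].add(i)
            for other in touching - {label}:       # merge every other touched cluster
                members = clusters.pop(other)
                for j in members:
                    cluster_of[j] = label
                clusters[label].update(members)
        cluster_of.append(label)
        history.append((t, sum(K), len(clusters)))
    return history

# example mirroring Fig. 1: L = 26, N = 3, Omega = 2, three contributions
print(simulate(26, 3, 2, steps=3, seed=1)[-1])

The overlap check is done by brute force against all previous contributions; this is quadratic in the number of contributions but adequate for the system sizes considered here.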

There are two different dynamics occurring in our model: how the knowledge corpus size grows, and how clusters are formed or merged together. The first one is a representation of how knowledge grows in a single individual or in a group/society, while the second one could be seen as the representation of how different branches of knowledge are intertwined [12].

Fig. 2. Knowledge size S vs. time t for \(L=2000\), \(N=8\), \(\varOmega =3\) (crosses) compared with Eq. 4 (continuous line) with the same values of N.

Fig. 3. Knowledge S vs. time t with \(L=500\) and \(N=31\) for several values of \(\varOmega \).

Fig. 4. Knowledge S vs. time t with \(L=6000\) and \(\varOmega =1\) for several values of N.

2.1 Knowledge Corpus Dynamics

Let us denote the corpus size by \(S=\sum _n K(n)\). In the case \(L\gg N\), at the beginning contributions rarely overlap, because the probability of drawing overlapping items is very low. Therefore, the corpus size S initially grows linearly with time.

The population dynamics of the corpus is an independent stochastic process: the probability of adding an original (not yet present) item decreases in time t as the number of items already present in the corpus increases.

Let us examine the case with \(N=1\) for simplicity. The probability \(P(S,t)\) of having S items already present at time t is

$$\begin{aligned} P(S,t+1)= \frac{L-(S-1)}{L} P(S-1,t) +\frac{S}{L}P(S,t). \end{aligned}$$
(1)

In the limit of continuous time and space, we have

$$\begin{aligned} \frac{\partial P}{\partial t}= - \frac{L-S}{L} \frac{\partial P}{\partial S} \end{aligned}$$
(2)
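
To make this step explicit, Eq. (2) states that P is constant along its characteristic curves, which satisfy

$$ \frac{dS}{dt}=\frac{L-S}{L} \qquad \Longrightarrow \qquad (L-S)\,e^{t/L}=\mathrm {const}. $$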

Hence, by the method of characteristics [1], \(P= f\left( (L-S) \exp (\frac{t}{L})\right) \), and therefore the average knowledge \(\overline{S}\) grows as \(\overline{S} = L \left(1-C\exp (-\frac{t}{L})\right)\), where C is an integration constant fixed by the initial condition \(\overline{S}(0)=0\), so that \(C=1\):

$$\begin{aligned} \overline{S} = L (1-e^{-\frac{t}{L}}) \end{aligned}$$
(3)

Since each of the N items in a contribution is added independently, we can generalize Eq. (3) to arbitrary N:

$$\begin{aligned} \overline{S} =L(1-e^{-\frac{Nt}{L}}). \end{aligned}$$
(4)

Comparing Eq. (4) with the results of simulations, we find an almost complete agreement, as shown in Fig. 2, even though in the simulations \(\varOmega > 1\).
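
As an illustration of this comparison, the following minimal sketch (our own; the parameter values are those of Fig. 2) estimates the mean corpus size by direct sampling and confronts it with Eq. (4). Only the corpus size is tracked, since it does not depend on the clustering rule.

import math
import random

def mean_corpus_size(L, N, steps, runs=20, seed=0):
    # average corpus size S(t) over several independent runs
    rng = random.Random(seed)
    mean_S = [0.0] * steps
    for _ in range(runs):
        present = set()                        # items already in the corpus
        for t in range(steps):
            present.update(rng.sample(range(L), N))
            mean_S[t] += len(present) / runs
    return mean_S

L, N, steps = 2000, 8, 1500                    # same values as in Fig. 2
sim = mean_corpus_size(L, N, steps)
for t in (100, 500, 1000, 1500):
    theory = L * (1 - math.exp(-N * t / L))    # Eq. (4)
    print(t, round(sim[t - 1], 1), round(theory, 1))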

Indeed, it seems that the knowledge size S does not depend much on the value of \(\varOmega \), as shown in Fig. 3.

The knowledge size S grows faster for larger values of N, as shown in Fig. 4 and consistently with Eq. (4).

Fig. 5. Number of clusters A vs. time t for \(L=600\), \(N=2\) and \(\varOmega =1\).

Fig. 6. Number of new clusters \(a_n(t)\) (red circles), number of additions \(a_a(t)\) (green times signs) and number of mergings \(a_m(t)\) (blue pluses) vs. time for \(L=600\), \(N=2\) and \(\varOmega = 1\). (Color figure online)

Fig. 7. Number of new clusters \(a_n\) vs. time t in log-log scale for \(L=600\), \(N=2\) and \(\varOmega =1\). The exponential decay factor is \((3N-4)/L\).

Fig. 8. Number of clusters A vs. time t for \(L=6000\), \(\varOmega =1\) and varying N.

Fig. 9. Number of clusters A vs. time t for \(L=500\), \(N=31\) and varying \(\varOmega \).

2.2 Cluster Dynamics

The number of clusters A at first grows with time, reaches a maximum and then decreases, ending with a single cluster, as reported in Fig. 5.

We can distinguish three phases:

  • An initial almost linear growth of A, where new contributions mostly form a new cluster;

  • An intermediate phase in which new contributions are mostly added to an existing cluster;

  • A decreasing behavior dominated by cluster merging.

We measured the formation of a new cluster (number of new clusters \(a_n(t)\)), the addition to an existing cluster (number of additions \(a_a(t)\)) and the merging of two clusters (number of mergings \(a_m(t)\)), as shown in Fig. 6.

The actions of forming a new cluster and adding a contribution to an existing one (whether or not this causes a merging) are mutually exclusive; therefore \(a_n+a_a=1\) at every step, and the total number of clusters \(A(t)\) is given by

$$\begin{aligned} A(t) = \sum _{\tau =1}^{t} \left[ a_n (\tau ) - a_m (\tau ) \right] \end{aligned}$$
(5)
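
For instance, assuming the three per-step indicators have been recorded during a run (as in the simulation sketch above; the values below are made up purely for illustration), \(A(t)\) follows from a cumulative sum:

import numpy as np

# hypothetical per-step event indicators for a short run
a_n = np.array([1, 1, 0, 1, 0, 0, 0])   # a new cluster was created
a_a = 1 - a_n                           # added to an existing cluster (a_n + a_a = 1)
a_m = np.array([0, 0, 0, 0, 1, 0, 1])   # a merging of two clusters occurred

A = np.cumsum(a_n - a_m)                # Eq. (5): number of clusters vs. time
print(A)                                # -> [1 2 2 3 2 2 1]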

We can develop a simple approximation for \(a_n\) in the case \(N=1\), \(\varOmega =1\), for which we have either the formation of a new cluster (of size one) or the addition to an existing cluster, and no cluster merging.

By denoting with P(At) the probability of having A clusters at time t, we have

$$ P(A, t+1) = \frac{L-A+1}{L} P(A-1,t) +\frac{A}{L}P(A,t) $$

which can be approximated by

$$ \frac{\partial P}{\partial t} = \frac{A-L}{L}\frac{\partial P}{\partial A} $$

and therefore

$$ P=f\left( (A-L)\exp \left( \frac{t}{L}\right) \right) $$

so that, for large A and small t, we have

$$ a_n(t) = \frac{d\overline{A}}{dt}\propto \exp \left( -\frac{t}{L}\right) $$

which, for \(N>1\), corresponds to

$$ a_n(t) \propto \exp \left( -\frac{g(N)t}{L}\right) $$

and numerically, as shown in Fig. 7, we have roughly

$$ g(N)=3N-4. $$
Fig. 10. Scaled A/Q vs. t/Q with \(Q=(L/F)^\varOmega \).

Fig. 11. Scaling factor F vs. \(X=\frac{N}{1+\varOmega /4}\).

The time distribution of the number of clusters A(t) depends on N and \(\varOmega \). In particular, with fewer items in each contribution (smaller value of N), it is less probable that two contributions share an overlap of \(\varOmega \) items, and therefore a larger number of clusters A forms before merging, as shown in Fig. 8.

On the contrary, having a smaller number of elements that need to match for merging (smaller values of \(\varOmega \)) means that the dynamics reaches the final state much faster, and the maximum number of clusters reached is much smaller, as shown in Fig. 9.

It is therefore expected that A(t) scales with N and \(\varOmega \). Numerically, we found that all simulated curves collapse onto a single one, as shown in Fig. 10, when plotting A/Q vs. t/Q, with

$$ Q= \left( \frac{L}{F}\right) ^\varOmega $$

and

$$ F = aX^2 +bX +c $$

with

$$ X=\frac{4N}{4+\varOmega } $$

as shown in Fig. 11. We have not found a valid approximation for this behavior.

3 Conclusions

We investigated a problem of knowledge percolation, clustering and the formation of a giant component, somewhat related to k-core percolation problems.

We studied a simple model in which each contribution consists of a certain number of items and joins a cluster, or even fuses two of them, when the overlap reaches a given threshold. We showed that the growth of the global knowledge and the cluster dynamics have nontrivial time behaviors, and we provided some analytical approximations.