
1 Introduction

Since the spread of network analysis in the mid-1990s [1], the social sciences have mostly focused on understanding the structure of durable relationships (e.g. friendship) and its evolution. However, in some contexts the concept of connection must represent something closer to a series of flickering, dynamic interactions than to stable relationships.Footnote 1 When dynamic interactions are observed, the presence and evolution of meso-structuresFootnote 2 can hardly be investigated with methods designed for a process of stepwise creation/dissolution of connections [2]. The continuous activation/inactivation of links between agents demands methodologies that, instead of considering the statistical significance of the formation/modification of relational architectures, focus on the physical order contained in the observed phenomena [3]. One methodology that allows this is the Relevance Index algorithm, henceforth RI [4,5,6]. In order to investigate emergent temporal patterns in dynamic complex systems, the RI uses a statistical approach to evaluate the significance, in terms of entropy, of the integration of agents' joint behaviors.

Although in complex network analyses the detection of groups of agents is typically performed by focusing on agents' similarity or by analyzing the network structure [7, 8], the RI algorithm provides a new approach to community detection. With the RI algorithm researchers can detect groups of agents characterized by high levels of behavioral integration. Since these behaviors are significantly far from randomness, they are expected to reveal a common function jointly pursued by all the involved members. Because low levels of entropy are determined by the repetition of specific combinations of joint individual statuses over time, the emergence of a non-random temporal pattern unveils the alignment of the actions of these individuals towards a common function. Nevertheless, to apply the RI algorithm to dynamic complex networks that involve at least some thousands of agents and that are observed over a number of instants considerably lower than the number of agents, additional methodological steps have to be developed. In particular, the present work develops the three-step 'PoSH-CADDy' methodology as a possible solution for reducing the redundancy in the results provided by the RI algorithm when it is applied to temporal networks with the aforementioned characteristics.

In Sect. 2 an overview of the proposed methodology is presented. In Sect. 3 the principles of the RI algorithm are introduced. Then, in Sect. 4, the first step of the methodology is described: the RI algorithm is run several times over progressively reduced sets of agents. In Sect. 5 the second step is described: a hierarchical agglomerative cluster analysis is applied to the masks, i.e. the subsets of agents of the analyzed system detected at the previous step. Section 6 follows with the third and last step of the methodology, a final treatment of the redundancy among the masks detected in all rounds. This last step yields a final set of masks, i.e. a partition of the system. Finally, in Sect. 7 the implementation of the methodology in a case study is presented. After selecting combinations of the introduced parameters, explorative considerations are made on partitions selected according to (i) a principle of maximization of the overall percentage of agents involved in the partitionFootnote 3 and (ii) a principle of minimization of the percentage of agents that belong to more than one mask.

2 Overview of the Methodology

Acknowledging other ongoing research with similar objectives [9, 10], the present work addresses the redundancy that arises when the RI methodology is applied to systems with a small ratio between the number of instants in time over which agents can be observed and the number of agents involved. More specifically, the methodology aims to identify a limited number of masks of agents, i.e. subsets of agents detected by the RI algorithm, so as to allow a simple final representation of the functional meso-structures present in the considered complex network. The proposed methodology is based on the following three parameters:

  1. R, i.e. the number of RI rounds of analysis that are performed,

  2. \({v_{OV}}\), i.e. a threshold used as reference to limit the presence of overlapping agents among the subset of masks finally considered from each round of RI analysis,

  3. \(v_{SM}\), i.e. a second threshold used as reference to reduce redundancy among the masks remaining after all the previous steps.

Each parameter is tied to one step of the methodology: the parameter R defines the length of a process of Progressive Skimming, based on reiterating the RI algorithm in rounds of analysis in which the best mask obtained in the previous round is dropped; the parameter \({v_{OV}}\) governs a Hierarchical Cluster Analysis of the masks detected in each round; the parameter \(v_{SM}\) governs the final refinement, in which the masks remaining after all rounds are analyzed in terms of their Degree of Dissimilarity. The methodology is named 'PoSH-CADDy' and is summarized in Pseudo-Code 1. The refinement of the output of the RI analysis attempts (i) a wide exploration of the meso-structures of the system under analysis through progressive skimming, moving towards the detection of masks that (ii) are the most significant (in terms of integration of the behaviors of the agents belonging to them) and that (iii) produce a limited degree of overlap among them, so as to favor simplicity in the analysis of the complex network's dynamics. The 'PoSH-CADDy' procedure (independently from the RI algorithm) is implemented in the R language on an Intel Core i5 2.6 GHz CPU with 8 GB RAM. The computational time (with R = 24 and with \({v_{OV}}\) and \({v_{SM}}\) each tested over 21 different values) is approximately 5 hours. This time is essentially required for the computation of the distance matrices needed to cluster each group of 15,000 masks detected by the RI algorithm in each round. The other steps require a computational time of a few minutes. The work does not take into consideration the computational performance of the RI algorithm itself, as what is developed here is a procedure for refining its results.

[Pseudo-Code 1: summary of the PoSH-CADDy procedure]

3 Principles of the RI Algorithm

The Relevance Index algorithm originates from the neurological studies of Giulio Tononi in the 1990s. Tononi introduced the notion of functional cluster, defining it as a set of elements that are much more strongly interactive among themselves than with the rest of the system, whether or not the underlying anatomical connectivity is continuous [11]. The hypothesis was confirmed, as neurons with similar functions were found to show high levels of coordination in their behaviors over time, independently of being (or not) situated in the same brain region [12, 13]. The Cluster Index (henceforth, CI), i.e. the statistic developed and tested by Tononi in his work [12], is based on two information-theoretic concepts derived from the Shannon entropy: Integration (I) and Mutual Information (MI). Formally, given the set \(A=\{a_{1},a_{2},\dots ,a_{n},\dots ,a_{N}\}\) made of N agents and a mask of agents \(B^{m}\) such that \(B^{m} \subset A\), the CI of \(B^{m}\) is written as follows

$$\begin{aligned} CI(B^{m})=I(B^{m})/MI(B^{m},A\setminus B^{m}) \end{aligned}$$
(1)

where \(2 \le |B^{m}| < |A|\) and \(0<m\le \xi \), with \(m \in \mathbb {N^{+}}\) and \(\xi \approx 2^{|A|}\).

Since integration and mutual information values depend on the size of the subsystem under analysis, a homogeneous system is used, made of variables having the same probabilities as the variables of the original system but no correlationFootnote 4 [4, 5, 12]. Finally, the level of significance of the normalized CI, namely \(t_{CI}\), is the value according to which the final ranking of the subsets is produced:

$$\begin{aligned} CI'(B^{m})={\frac{I(B^{m})}{\langle {I_{h}\rangle }}}\Big /{\frac{MI(B^{m},A\setminus B^{m})}{\langle {MI_{h}}\rangle }} \end{aligned}$$
(2)
$$\begin{aligned} t_{CI}={\frac{CI'(B^{m})-\langle {CI'_{h}}\rangle }{\sigma (CI'_{h})}} \end{aligned}$$
(3)

where \(\langle I_{h}\rangle \) and \(\langle MI_{h}\rangle \) indicate respectively the average integration of subsets of dimension \(|B^m|\) belonging to the homogeneous system, and the average mutual information between these subsets and the remaining part of the homogeneous system. \(\langle CI'_{h}\rangle \) and \(\sigma (CI'_{h})\), respectively the mean and the standard deviation of the normalized cluster indices of the subsets of the homogeneous system that have the same size as \(B^{m}\), are used to compute the statistical index \(t_{CI}\).

The concepts of CI and \(t_{CI}\) were introduced in the research areas of artificial network models, catalytic reaction networks and biological gene regulatory systems, contributing to the identification of emergent meso-level structures [4]. Since an exhaustive computation of the \(t_{CI}\) statistic is possible only in small artificially designed networks, such as those initially used to test the efficacy of the method [4,5,6], a genetic algorithm aimed at investigating the relevant subsets was implemented [6] in the RI algorithm. When applied to large systems that can be observed in a relatively small number of instants in time, the RI algorithm produces a large number of possible \(B^m\), which may differ from one another just by the presence/absence of a single agent. As many similar masks are detected, redundancy emerges.
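As an illustration, the core quantities of Eq. (1) can be sketched in Python on toy binary time series (the original implementation is in R; all names and data here are illustrative, and the homogeneous-system normalization of Eqs. (2)–(3) is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(X):
    """Shannon entropy (bits) of the joint states appearing in the rows of X."""
    _, counts = np.unique(X, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def integration(X, mask):
    """I(B): sum of single-agent entropies minus the joint entropy of the mask."""
    cols = X[:, mask]
    return sum(entropy(cols[:, [j]]) for j in range(cols.shape[1])) - entropy(cols)

def mutual_information(X, mask):
    """MI(B, A \\ B): shared information between the mask and the rest of A."""
    rest = [i for i in range(X.shape[1]) if i not in mask]
    return entropy(X[:, mask]) + entropy(X[:, rest]) - entropy(X)

def cluster_index(X, mask):
    """CI of Eq. (1); t_CI would additionally require the homogeneous baseline."""
    mi = mutual_information(X, mask)
    return integration(X, mask) / mi if mi > 0 else float("inf")

# Toy system: 200 instants, 4 binary agents; agents 0 and 1 always coincide,
# while agents 2 and 3 behave independently of everything else.
a0 = rng.integers(0, 2, 200)
X = np.column_stack([a0, a0, rng.integers(0, 2, 200), rng.integers(0, 2, 200)])
```

With these data, the integration of the mask {0, 1} is close to 1 bit (the two agents repeat the same joint statuses), while that of {2, 3} is close to 0, so the CI of Eq. (1) ranks {0, 1} far higher.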

4 Step 1: Progressive Skimming of the Best Mask Detected

In order to address the large number of similar masks detected in the considered system \(A=\{a_{1},a_{2},\dots ,a_{n},\dots ,a_{N}\}\) made of N agents, the first proposed step is to run several rounds of the RI algorithm. Each round \(r \in \mathbb {N^{+}}\), with \(r \le R\), where \(R\in \mathbb {N^{+}}\) indicates the number of rounds finally performed, is set to detect the same number of masksFootnote 5. At the same time, in each round r a different set of agents is considered, namely \(A_r=\{a_{1},a_{2},\dots ,a_{n_r},\dots ,a_{N_r}\}\), where \(A_r \subset A\) and \(|A_r|=N_r\). In order to formally describe the output of any round r of the analysis, the set of masks detected by the RI algorithm, namely \(\mathbb {O}(A_r)\), is defined according to the corresponding round of analysis. Formally,

$$\begin{aligned} \mathbb {O}(A_r)= \{B^{1}_r, B^{2}_r, \dots , B^{m}_r, \dots , B^{M}_r \} \end{aligned}$$
(4)

where

  i. \(B^{m}_r=\{ a_{n_r} \in A_r : b_{m,n_r}=1\}\) is the m-th mask detected,

  ii. \(b_{m,n_r}={\left\{ \begin{array}{ll} 1, &{} \text {if the agent}\, a_{n_r}\text { is detected in the }m\text {-th mask}\\ 0, &{} \text {otherwise}. \end{array}\right. }\)

  iii. \(t_{CI}(B^m_r) \ \ge \ t_{CI}(B_r^{m+1})\)

For the definition of each set of agents \(A_{r}\), a cascade process is used. Before each round, the agents belonging to the best mask detected in the previous round, i.e. \(B^{1}_{r-1}\), are dropped from the analysis, so that the cardinality of the set of agents considered, i.e. \(A_r\), decreases after each round. Formally, each \(A_{r}\) can be described as

$$\begin{aligned} A_{r}=A\setminus \bigcup \limits _{q=0}^{r-1}B_{q}^{1} \end{aligned}$$
(5)

where \(q \in \mathbb {N}\) indicates one of the rounds preceding the r-th roundFootnote 6, with \(0\le q \le (R-1)\). Therefore, \(A_{r} \subset A_{r-1}\ \forall \ r\). Since in each round the set analyzed with the RI algorithm does not include any of the best masks detected in the previous rounds, this procedure is called 'progressive skimming'.
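Equation (5) is a plain set difference and can be sketched as follows (a minimal Python illustration with hypothetical agent ids and best masks):

```python
# Toy sketch of Eq. (5): the agent set analyzed in round r is the initial
# set A minus the union of the best masks B^1_q of all previous rounds.
A = set(range(10))                      # 10 toy agents
best_masks = [{0, 1}, {2, 3, 4}]        # stand-ins for B^1_1 and B^1_2

def skimmed_set(A, best_masks):
    """A_r = A \\ (union of B^1_q for q < r), as in Eq. (5)."""
    return A - set().union(*best_masks) if best_masks else set(A)

A3 = skimmed_set(A, best_masks)         # set analyzed in round r = 3
```

With these toy values, `A3` contains only the five agents never included in a previously skimmed best mask.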

The RI algorithm produces an ordered list of binary masks that may differ from one another just by the presence/absence of a single agent. Therefore, besides the best detected mask of agents, i.e. \(B^{1}_{r}\), many similar masks are also detected in \(\mathbb {O}(A_r)\), as they are likely to perform well from the point of view of entropy as well. Because of this redundancyFootnote 7 the progressive skimming of masks is implemented, so as to perform an extended exploration of the system. This procedure, even though it entails a loss of information and a reduction (and thus also a change) of the considered system as the analysis continues round after round, allows the researcher to analyze how the rest of the system works independently of the groups of agents that, in the previous rounds, were detected as having the most integrated behaviors. Interactions between the best mask detected in round r and the masks detected in the following rounds are limited, since the agents belonging to \(B^1_{r}\) are removed from the sets of agents analyzed in the rounds following the r-th. However, because of the implementation of a hierarchical agglomerative cluster analysis (Sect. 5), in each round r all the masks other than \(B^1_{r}\) are also taken into account. Therefore, the progressive skimming does not imply that the best mask \(B^{1}_{r}\) stands in a condition of isolation. If the mask \(B^{1}_{r}\) has significant intersections/interactions with other masks \(B^{m}_{r}\) detected in the same round, evidence should appear in the cluster analysis of the whole \(\mathbb {O}(A_r)\). On the contrary, if masks substantially different from \(B^{1}_{r}\) do not emerge from the cluster analysis, this is a clue of a functional detachment between the agents in \(B^{1}_{r}\) and the agents belonging to the rest of the system.

5 Step 2: Clusters of Masks Within Each r-th Round

5.1 The Cluster Analysis of Masks in \(\mathbb {O}(A_r)\)

Using the Simple Matching Coefficient (SMC) to measure the distance between pairs of masks, and the Complete Linkage (CL) criterion for the progressive merging of clusters, a hierarchical agglomerative cluster analysis is implemented. This analysis is here represented by the function \({\uptheta _{\upkappa }}\), which assigns each mask \(B_r^m\) to one (and only one) cluster. Formally,

$$\begin{aligned} \mathop {\uptheta }\nolimits _{\upkappa }^{SMC,CL}(B_{r}^{m})=k \end{aligned}$$
(6)

where \({k\le \upkappa }\), with \(k \in \mathbb {N^{+}}\) indicating the specific cluster to which each mask \(B_{r}^{m} \in \mathbb {O}(A_r)\) is assigned through the hierarchical cluster analysis (with SMC and CL), in which the masks of \(\mathbb {O}(A_r)\) are allocated to a number of clusters equal to \({\upkappa \in \mathbb {N^{+}}}\). Since the number of clusters is not established a priori, at this stage the definition of each cluster, namely \({\mathbb {C}_{k,\upkappa }(A_r)}\), has to take into account the fact that \({\upkappa }\) can vary. Therefore, each cluster \({\mathbb {C}_{k,\upkappa }(A_r)}\) is formally defined as

$$\begin{aligned} {\mathbb {C}_{k,\upkappa }(A_r)=\{B_{r}^{m} \in \mathbb {O}(A_r) : \uptheta _{\upkappa }(B_{r}^{m}) =k \}} \end{aligned}$$
(7)

where \({\mathbb {C}_{k,\upkappa }}\) is the k-th cluster obtained by dividing the masks contained in \(\mathbb {O}(A_r)\) into \({\upkappa }\) clusters.
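A minimal Python sketch of this clustering step, assuming SciPy is available (on binary vectors the Hamming distance, i.e. the fraction of mismatching positions, equals \(1 - SMC\), so it induces the same ordering as the SMC; the toy masks are illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Each row is a binary mask b_m over the agents of A_r (1 = agent in mask).
masks = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],   # differs from row 0 by a single agent
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],   # differs from row 2 by a single agent
])

# Hamming distance on binary vectors = 1 - SMC (same induced ordering).
D = pdist(masks, metric="hamming")
Z = linkage(D, method="complete")                  # Complete Linkage criterion
labels = fcluster(Z, t=2, criterion="maxclust")    # split into kappa = 2 clusters
```

Here the two near-duplicate pairs of masks end up in the same cluster, mirroring how \({\uptheta _{\upkappa }}\) groups masks that differ by one agent.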

5.2 The Selection of a Representative Mask for Each Cluster

For any cluster obtained, only the mask with the highest \(t_{CI}\) is considered, as representative of the cluster itself. Formally, this mask, henceforth indicated as \({\tilde{B}_{r,k,\upkappa }}\), has the following properties:

$$\begin{aligned} \tilde{B}_{r,k,\upkappa } \in \mathbb {C}_{k,\upkappa }(A_r) \quad \text {and} \quad t_{CI}(\tilde{B}_{r,k,\upkappa })=max \ t_{CI}(\mathbb {C}_{k,\upkappa }(A_r)). \end{aligned}$$
(8)

Therefore, each cluster is represented by the mask that, among those belonging to it, is the one whose agents present a joint behavior that is significantly farthest from randomness. By adopting this criterion, the principles underpinning the RI algorithm are respected. Even if several different combinations may be present, the similarity analysis reveals groups of masks that are to be intended just as possible modifications of the reference mask, i.e. the most relevant one.

5.3 Overlaps and the \(s_{OV}\) Statistic

The cluster analysis of \(\mathbb {O}(A_r)\) and the selection of the mask with the highest \(t_{CI}\) for each cluster can produce the affiliation of agents to more than one maskFootnote 8. In order to set the value of \({\upkappa }\), i.e. to determine the number of clusters, a criterion is adopted that limits the progressive emergence of overlaps in the observed structure of masks. In order to understand which degree of overlap is associated with each value of \({\upkappa }\), starting from 1 and continuing in increasing order, the statistic \({s_{OV}(r,\upkappa )}\), where the subscript 'OV' stands for OVerlaps, is computed as

$$\begin{aligned} s_{OV}(r,\upkappa )=\frac{|\bigcup \limits ^{\upkappa }_{k_\alpha ,k_\beta =1}(\tilde{B}_{r,k_\alpha ,\upkappa } \cap \tilde{B}_{r,k_\beta ,\upkappa })|}{| \bigcup \limits _{k=1}^{\upkappa } \tilde{B}_{r,k,\upkappa }|} \quad \quad \forall \ k_\alpha \ne k_\beta \end{aligned}$$
(9)

where \({k_\alpha ,k_\beta \in \{1,\dots , k, \dots , \upkappa \}}\) are the indices of two distinct clusters \({\mathbb {C}_{k,\upkappa }(A_r)}\), obtained by applying the function \({\uptheta _{\upkappa }}\) to the set of masks \(\mathbb {O}(A_r)\). The statistic \({s_{OV}(r,\upkappa )}\) calculates, for each possible value of r and of \({\upkappa }\), the ratio between the number of agents that belong to at least two masks (numerator) and the number of agents that belong to at least one mask (denominator). The introduced statistic aims to evaluate the degree of simplicity associated with each possible value of \({\upkappa }\), i.e. the number of clusters into which to group the masks included in \(\mathbb {O}(A_r)\). The simplicity lies in the fact that masks have to be recognizable and distinct from each other. If the structure of the detected masks is characterized by a high degree of overlap, the masks are so intertwined that they cannot be taken as unitary entities, and the representation of the whole system that they are supposed to provide is ultimately unreadable.
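A minimal set-based sketch of Eq. (9) in Python (the masks here are illustrative sets of agent ids standing in for the representative masks \(\tilde{B}_{r,k,\upkappa }\)):

```python
from collections import Counter

def s_OV(masks):
    """Eq. (9): agents belonging to >= 2 masks over agents belonging to >= 1 mask.

    `masks` is a list of sets of agent ids (one representative mask per cluster)."""
    counts = Counter(a for m in masks for a in m)
    covered = len(counts)                                # agents in >= 1 mask
    overlapping = sum(1 for c in counts.values() if c >= 2)
    return overlapping / covered if covered else 0.0
```

For instance, with masks {1, 2, 3}, {3, 4} and {5, 6}, six agents are covered and only agent 3 belongs to two masks, so \(s_{OV} = 1/6\).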

5.4 Selection of the Number of Clusters by Means of \(v_{OV}\) Parameter

In order to define the value of \({\upkappa }\), i.e. the number of clusters into which to split each set of masks \(\mathbb {O}(A_r)\), the adopted criterion lies in the comparison between the statistic \({s_{OV}(r,\upkappa )}\), defined by Eq. 9, and a percentage threshold used as reference, namely \(v_{OV} \in \mathbb {R}_{\ge 0}\), with \(0\le v_{OV}\le 1\). Given a specific value of \(v_{OV}\), the value of \({\upkappa }\) is chosen as the highest number of clusters among those for which \({s_{OV}(r,\upkappa )}\) is lower than, or equal to, the threshold \(v_{OV}\). For each r-th round, a set of possible values of \({\upkappa }\) is thus selected. These sets, namely \(\mathcal {K}_{r,v_{OV}}\), are formally described as follows.

$$\begin{aligned} \mathcal {K}_{r,v_{OV}}=\{\upkappa \in \mathbb {N^{+}} : \ s_{OV}(r,\upkappa )\le v_{OV}\} \end{aligned}$$
(10)

For each round r, depending on the threshold \(v_{OV}\), all the values of \({\upkappa }\) that produce a partition in which the percentage of agents belonging to more than one group (out of the number of agents overall included) is less than or equal to the considered threshold \(v_{OV}\) are considered admissible. Then, among all the elements contained in \(\mathcal {K}_{r,v_{OV}}\), the value \({\tilde{\upkappa }_{r,v_{OV}}}\), i.e. the number of clusters into which the resulting masks contained in \(\mathbb {O}(A_r)\) are finally split given the specific threshold \(v_{OV}\), is defined as

$$\begin{aligned} \tilde{\upkappa }_{r,v_{OV}} = \max \ \mathcal {K}_{r,v_{OV}} \end{aligned}$$
(11)

By identifying \({\tilde{\upkappa }_{r,v_{OV}}}\), the highest number of clusters, given the threshold \(v_{OV}\), is selected. Therefore, the soft partitionFootnote 9 obtained in any of the r rounds, namely \(\mathbb {P}_{r,v_{OV}}\), can be formally defined as

$$\begin{aligned} \mathbb {P}_{r,v_{OV}}=\{\tilde{B}_{r,k,\upkappa } \in \mathbb {O}(A_r)\ : \ \upkappa =\tilde{\upkappa }_{r,v_{OV}} \} \end{aligned}$$
(12)
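The selection rule of Eqs. (10)–(11) can be sketched in Python as follows (a self-contained illustration: `partitions_by_kappa` is a hypothetical map from each tested \({\upkappa }\) to the representative masks obtained for it, expressed as sets of agent ids):

```python
from collections import Counter

def overlap_ratio(masks):
    """s_OV of Eq. (9): agents in >= 2 masks over agents in >= 1 mask."""
    counts = Counter(a for m in masks for a in m)
    return sum(c >= 2 for c in counts.values()) / len(counts) if counts else 0.0

def select_kappa(partitions_by_kappa, v_OV):
    """Eqs. (10)-(11): the largest kappa whose representative masks keep
    s_OV <= v_OV; returns None if no tested kappa is admissible."""
    admissible = [k for k, masks in partitions_by_kappa.items()
                  if overlap_ratio(masks) <= v_OV]
    return max(admissible) if admissible else None

# Illustrative outcomes of the clustering step for kappa = 1, 2, 3.
partitions_by_kappa = {
    1: [{1, 2, 3, 4}],
    2: [{1, 2, 3}, {3, 4}],
    3: [{1, 2}, {2, 3}, {3, 4}],
}
```

Here \(s_{OV}\) equals 0, 0.25 and 0.5 for \({\upkappa } = 1, 2, 3\) respectively, so a threshold \(v_{OV} = 0.3\) selects \({\tilde{\upkappa }} = 2\): the finest admissible split.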

6 Step 3: Final Treatment of Redundancies

6.1 The Set of Masks Resulting from All the Rounds: \(\mathscr {P}_{R,v_{OV}}\)

At the end of an entire process of analysis, carried out alwaysFootnote 10 with the same value of the parameter \(v_{OV}\), R sets of masks are obtained, each of them identified by the corresponding \(\mathbb {P}_{r,v_{OV}}\). Therefore, since the analysis is developed with a specific value of R and a specific value of \(v_{OV}\), it is possible to assemble all the masks in a unique set, namely \(\mathscr {P}_{R,v_{OV}}\), that can be formally defined as

$$\begin{aligned} \mathscr {P}_{R,v_{OV}}=\{ \tilde{B}_{r,k,\upkappa } \in \bigcup _{r=1}^{R}\mathbb {O}(A_r)\ :\ \upkappa =\tilde{\upkappa }_{r,v_{OV}} \} \end{aligned}$$
(13)

where \({\tilde{\upkappa }_{r,v_{OV}}}\) is the number of clusters into which the specific \(\mathbb {O}(A_r)\) is divided, as a result of the process described in Eqs. (8)–(11), and where, as explained in Eq. (8), the tilde over the mask \({B_{r,k,\upkappa }}\) indicates that, in the cluster of masks to which it belongs, i.e. \({\mathbb {C}_{k,\upkappa }(A_r)}\), the mask \({\tilde{B}_{r,k,\upkappa }}\) presents the highest \(t_{CI}\). Once the set \(\mathscr {P}_{R,v_{OV}}\) is defined, the last issue to address is a consequence of having implemented a reiterated procedure of analysis, i.e. multiple rounds of the RI algorithm. As at the beginning of each round r only the agents belonging to \(B^1_{r-1}\) are dropped, the presence of similar masks (among all those detected in an entire process of analysis) is not preventedFootnote 11. The following, and last, steps aim to manage this redundancy.

6.2 Sorting the Masks of \(\mathscr {P}_{R,v_{OV}}\) in Decreasing Order of \(t_{CI}\)

All the masks belonging to \(\mathscr {P}_{R,v_{OV}}\) are sorted in decreasing order, according to the value of their \(t_{CI}\). In this way, from the set of masks \(\mathscr {P}_{R,v_{OV}}\), the sorted set of masks \(\mathscr {P}_{R,v_{OV}}^{+}\) is generated. Formally,

$$\begin{aligned} \mathscr {P}_{R,v_{OV}}^{+}=\{ \tilde{B}_{R,v_{OV}}^{(1)},\tilde{B}_{R,v_{OV}}^{(2)}, \dots , \tilde{B}_{R,v_{OV}}^{(j)}, \dots , \tilde{B}_{R,v_{OV}}^{(J)} \} \end{aligned}$$
(14)

where \(|\mathscr {P}_{R,v_{OV}}|=J\), and \(\tilde{B}_{R,v_{OV}}^{(j)}\) is one of the masks belonging to \(\mathscr {P}_{R,v_{OV}}\), previously indicated as \({\tilde{B}_{r,k,\tilde{\upkappa }_{r,v_{OV}}}}\). Moreover, the index in the superscript, i.e. \(j \in \mathbb {N^{+}}\) with \(j\le J\), refers to the ordinality of the masks of \(\mathscr {P}_{R,v_{OV}}^{+}\), so that the condition \(t_{CI} ( \tilde{B}_{R,v_{OV}}^{(j)}) \ > \ t_{CI}(\tilde{B}_{R,v_{OV}}^{(j+1)} )\) holds.

6.3 Final Drop of Similar Masks According to the Parameter \(v_{SM}\)

Once the masks are ordered according to their \(t_{CI}\), a final analysis of their similarity is performed. Starting from the best mask \(\tilde{B}_{R,v_{OV}}^{(1)}\), all the masks that are too similar to it are dropped. Then, the same procedure is repeated in a cascade process: the second best mask among those remaining is compared with those having a lower \(t_{CI}\), and so forth with the third best remaining mask, the fourth, etc. This procedure continues until there are no more masks that can be used as a reference. In this way, only masks that have a minimum degree of dissimilarity from one another are kept.

The similarities between masks are calculated in terms of the JaCcard IndexFootnote 12 (henceforth, JC), i.e. the ratio of the number of agents in the intersection of the two considered masks to the number of agents in their union. Then, the set of masks \(\mathscr {P}^{+}_{R,v_{OV}}\) is filtered using a threshold regarding SiMilarity, namely \(v_{SM} \in \mathbb {R}_{\ge 0}\), with \(0\le v_{SM}\le 1\). The resulting (and final) set of masks, namely \(\mathscr {F}_{R,v_{OV},v_{SM}}\), can be formally defined as

$$\begin{aligned} \begin{aligned} \mathscr {F}_{R,v_{OV},v_{SM}}= \{ \tilde{B}_{R,v_{OV}}^{(j)} \in \mathscr {P}_{R,v_{OV}}^{+} \ :&\ JC(\tilde{B}_{R,v_{OV}}^{(i)}, \tilde{B}_{R,v_{OV}}^{(j)}) \ < \ v_{SM} \} \end{aligned} \end{aligned}$$
(15)

where

  i. \(\tilde{B}_{R,v_{OV}}^{(i)} \in \ \mathscr {P}_{R,v_{OV}}^{+}\),

  ii. i and j, where \(i \in \mathbb {N}\) and \(j \in \mathbb {N}^+\) and \(0\le i < j\), are used to indicate the masks of \(\mathscr {P}_{R,v_{OV}}^{+}\) by reference to their ordinality, as described in Eq. 14,

  iii. \(\tilde{B}_{R,v_{OV}}^{(0)} = \varnothing \ \), so that \(JC(\tilde{B}_{R,v_{OV}}^{(0)}, \tilde{B}_{R,v_{OV}}^{(1)})=0\).
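The cascade of Sect. 6.3 can be sketched in Python as a greedy filter over the \(t_{CI}\)-sorted masks (sets of agent ids here are illustrative stand-ins for the masks of \(\mathscr {P}_{R,v_{OV}}^{+}\)):

```python
def jaccard(a, b):
    """JC: |intersection| / |union| of two masks (sets of agent ids)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def drop_similar(sorted_masks, v_SM):
    """Greedy cascade of Sect. 6.3: scan masks in decreasing order of t_CI and
    keep a mask only if its Jaccard index with every already-kept
    (better-ranked) mask is below the threshold v_SM."""
    kept = []
    for mask in sorted_masks:
        if all(jaccard(mask, ref) < v_SM for ref in kept):
            kept.append(mask)
    return kept

# Toy masks, already sorted by decreasing t_CI.
sorted_masks = [{1, 2, 3, 4}, {1, 2, 3}, {5, 6}, {5, 6, 7}]
final_masks = drop_similar(sorted_masks, v_SM=0.5)
```

With \(v_{SM}=0.5\), the second mask (JC = 0.75 with the first) and the fourth (JC ≈ 0.67 with the third) are dropped, leaving two clearly distinct masks.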

7 Case Study - Region Tuscany Innovation Policies

In this Section, an example of an implementation of the RI+PoSH-CADDy methodology in an empirical analysis is presented. The considered case study addresses a regional programme implemented by the Tuscany Region (Italy) in the period 2000–2006, aiming to support innovation projects. The considered network policy programme sustained the development of innovation processes by fostering interactions between local agents (enterprises, universities, public research centers, local government institutions, service centers, etc.) [14,15,16]. Starting in 2002 (and ending in 2008), the programme consisted of nine waves not uniformly distributed over time: they had different durations and they overlapped, producing periods in which no wave was active and periods in which three waves were simultaneously active. The degree of formation and dissolution of connections was so high that it resulted in a situation of intense discontinuity over time. Therefore, a new appropriate tool, one that does not investigate the flourishing of communities by looking at the stepwise creation of network frameworks, was deemed necessary [2]. Moreover, by using the RI algorithm the analysis could take into account the presence of functional meso-structures. Finally, because of the objective of the policies taken into consideration, i.e. the fostering of innovative processes, the focus on interactive dynamics, more than on the network's relational architectures, is even more meaningfulFootnote 13 [17].

7.1 Available Data and Pre-processing

The most important aspect of the implementation of the RI analysis in the present case study is the definition of the informational basis describing agents' statuses of activity. Since the available data contain information on the starting and ending dates of agents' participations in the projects, it is possible to define a set of 59 instants in timeFootnote 14 at which to observe the system. With these dates, a complete behavioral profile for each of the agents involved in the policy programme is structured. At each instant, the number of projects in which each agent was active is considered. A series of 58 variables is generated by taking into account how the levels of activity vary from one instant to the following oneFootnote 15. Regarding the size of the system, the agents participating in the described policies of the Tuscany Region are 1121, and the majority of them participated in just one project. The scarcely active agents are removed from the analysis, so as to focus on those with a minimum degree of activity. Therefore, only agents that participated in at least 2 projects are considered. Finally, 352 agents remain. These agents constitute the initial set of the analysis, namely A.
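The construction of the behavioral variables described above can be sketched in Python (the activity matrix below is a toy stand-in, as the real data are not reproduced here; the filtering criterion on peak activity is an illustrative proxy for the "at least 2 projects" rule):

```python
import numpy as np

# Toy activity profiles: activity[n, t] = number of projects in which agent n
# is active at instant t.  The real analysis uses 1121 agents and 59 instants;
# here 3 agents and 5 instants, purely for illustration.
activity = np.array([
    [0, 1, 2, 2, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 2, 3, 2],
])

# One behavioral variable per pair of consecutive instants: how the level of
# activity changes from one instant to the next (59 instants -> 58 variables).
behavior = np.diff(activity, axis=1)

# Keep only agents with a minimum degree of activity (here, a peak of at
# least 2 simultaneous projects as an illustrative proxy for having
# participated in at least 2 projects overall).
keep = activity.max(axis=1) >= 2
filtered = behavior[keep]
```

In the toy data the middle agent never reaches the activity threshold and is dropped, mirroring the removal of the scarcely active agents before building the set A.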

7.2 Setting the Parameters

The RI analysis with the PoSH-CADDy methodology is implemented over the set A, consisting of 352 agents observed in 58 instants over time. The number of rounds to be performed, i.e. the parameter R, is set equal to 24. With the progressive skimming described in Sect. 4, \(A_{24}\) consists of 204 agents. Therefore, A is extensively explored, as the procedure is stopped after having removed \(45.17\%\) of the agents initially involved. Regarding the threshold \(v_{OV}\), since no specific theoretical reasons suggest the a-priori identification of a specific value, a discrete set \(V_{OV}\) of percentage thresholds is used, and each \(v_{OV} \in V_{OV}\) is considered to implement a process of analysis. The set \(V_{OV}\) is defined as follows:

$$\begin{aligned} V_{OV}=\{v_{OV} \in \mathbb {R}_{\ge 0} : \ v_{OV} =\frac{1}{40}\ x \} \quad \forall \ 0\le x \le 20, \ x \in \mathbb {N} \end{aligned}$$
(16)

Regarding the setting of the threshold \(v_{SM}\), the same conditions applied to \(v_{OV}\) are used. A discrete set \(V_{SM}\) is created in the same way as \(V_{OV}\), and each \(v_{SM} \in V_{SM}\) is considered to implement the analysis. For both thresholds, values larger than 0.5 are not taken into consideration as, in principle, they go against the general objective of the present work, that is, to reduce redundancyFootnote 16.
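The two threshold grids of Eq. (16) can be enumerated directly (a trivial Python check of the grid sizes; variable names are illustrative):

```python
# Eq. (16): 21 evenly spaced thresholds x/40 for x = 0..20, i.e. from 0 to 0.5
# in steps of 0.025; the same grid is used for both v_OV and v_SM.
V_OV = [x / 40 for x in range(21)]
V_SM = list(V_OV)

n_partitions = len(V_OV) * len(V_SM)   # 21 * 21 = 441 parameter combinations
```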

As one value for R, 21 values for \(v_{OV}\) and 21 values for \(v_{SM}\) are considered, 441 different \(\mathscr {F}_{R,v_{OV},v_{SM}}\) are finally computed. Each of these final sets of RI masks constitutes a soft partitionFootnote 17 of the system A. In Fig. 1a, the 441 obtained \(\mathscr {F}_{R,v_{OV},v_{SM}}\) are illustrated in a three-dimensional space describing the number of agents included (out of the total number of agents in A), the percentage of agents belonging to more than one mask (out of the number of agents overall included in the partition), and the number of masks included in the corresponding partition.

Fig. 1.
figure 1

(a): Colored dots represent the final partitions obtained, i.e. all the \(\mathscr {F}_{R,v_{OV},v_{SM}}\) resulting from the possible combinations of the three parameters \(R=24\), \(v_{OV} \in V_{OV}\) and \(v_{SM} \in V_{SM}\), as described in Eq. (16). The y-axis describes the percentage of agents that, in the corresponding \(\mathscr {F}_{R,v_{OV},v_{SM}}\), belong to more than one mask (out of the number of agents that belong to at least one mask). The x-axis describes the percentage of agents that belong to at least one mask (out of the total number of agents included in the initial considered set A). The z-axis describes the number of masks present in the corresponding partition. The color of the 441 dots reflects the percentage of overlapping agents. Small grey asterisks indicate the 47 partitions that include at least 60% of the agents of A and that have less than 70% of overlapping agents. Big colored dots are projected on the lateral and bottom faces of the cube delimiting the three-dimensional space. (b): Bipartite graph representing affiliations of agents (of set A) in the specific final set of RI masks \(\mathscr {F}_{24,0.325,0.225}\) (indicated in the 3D representation on the left with a darker grey asterisk). Grey circular nodes represent agents, and blue squared nodes represent RI masks. The width of edges is proportional to the \(t_{CI}\) of the mask. (Color figure online)

7.3 Exploration of the Results

As represented in Fig. 1a, the considered combinations of the three parameters lead to different \(\mathscr {F}_{R,v_{OV},v_{SM}}\). Even though a systematic evaluation of the parameters' space is not currently performed, the values to be considered are chosen with an a-posteriori unbiased procedure. In the present work, only those partitions (i) including at least 60% of the agents of the initial set A, and (ii) having less than 70% of agents belonging to more than one community, are considered. The parameters' space is narrowed in order to address two objectives, both concerning the readability of the final representations of the system: (i) to consider partitions in which a large part of the initial system is analyzed, and (ii) to avoid the selection of partitions in which extreme overlapping of the detected subsets prevents a simple interpretation of the system. Statistics regarding the features of the single masks are not taken into account, and no a-priori biased considerations on the values of \(v_{OV}\) and \(v_{SM}\) are made. Although the parameters' space is not yet explored with a standardized method, the parameters are not selected based on the properties of the single masks, so as to avoid bias.

Based on the aforementioned conditions, 47 partitions (out of 441) are identified. These partitions are indicated with grey asterisks in Fig. 1a. A preliminary exploration suggests the presence of similar features across all 47 partitions. Currently, only one is heuristically selected, namely the partition with \(v_{OV}=0.325\) and \(v_{SM}=0.225\), indicated with a darker grey asterisk in Fig. 1a. The corresponding set of masks, i.e. the masks included in \(\mathscr {F}_{24,0.325,0.225}\), is intended as a weighted bipartite graph, as represented in Fig. 1b. The agents involved, represented by grey circular nodes, are connected to the RI masks in which they are included, represented by blue squared nodes, and the weight of each connection is based on the value of the \(t_{CI}\) of the maskFootnote 18. This partition is composed of 34 masks that overall include 298 agents of the initial set A. The network consists of 6 components, and 54 agents are not included in any mask. The 5 masks with the highest \(t_{CI}\) (the ones with the widest edges in Fig. 1b) include agents which participated in few projects, with behavioral profiles characterized by few changes over time. These masks are identified as highly integrated because the activity of the agents involved is almost constant. Although they generate low levels of entropy, given that the activity of the involved agents is close to minimum, they cannot be considered as the most relevant subsets. As these 5 uninformative masks form independent components, the ongoing analyses focus on the remaining 29 masks, which determine the largest component of 222 agents.
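The construction of the weighted bipartite graph and the restriction to its largest component could be reconstructed along these lines with networkx. Mask and agent labels, and the \(t_{CI}\) values, are invented placeholders, not the actual data of the case study.

```python
# Minimal sketch: masks on one side, agents on the other,
# edge weight = t_CI of the mask (all values here are invented).
import networkx as nx

B = nx.Graph()
masks = {"m1": (0.9, ["a1", "a2", "a3"]),
         "m2": (0.5, ["a3", "a4"]),
         "m3": (0.2, ["a5", "a6"])}   # an isolated, low-relevance mask
for mask, (t_ci, agents) in masks.items():
    B.add_node(mask, bipartite=0)
    for agent in agents:
        B.add_node(agent, bipartite=1)
        B.add_edge(mask, agent, weight=t_ci)

# Keep only the largest connected component, mirroring the restriction
# to the 29 masks that determine the component of 222 agents.
largest = max(nx.connected_components(B), key=len)
core = B.subgraph(largest)
```

In the toy data, mask `m3` and its two agents form a separate component and are discarded, just as the 5 uninformative masks are excluded from the ongoing analyses.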

After the computation of the weighted betweenness centrality, the first results suggest a modification in the ranking of the centrality of the nodes. While in the real-observed network, where agents are connected if they co-participated in projects, the centrality of agents is related to the number of projects in which they participated, this does not apply in the resulting network of RI masks. More specifically, in the largest component of the one-mode projection of the weighted bipartite graph determined by the final set of masks \(\mathscr {F}_{24,0.325,0.225}\), the following elements emerge: (i) nodes with the largest number of participations in projects appear close to each other in one periphery of the network; (ii) nodes with the smallest number of participations appear close to each other in the opposite periphery (with respect to the nodes with a large number of projects); (iii) nodes with an average number of participations appear very central; (iv) nodes with many participations and nodes with few participations present few direct connections between them; (v) the shortest paths between very active nodes and scarcely active nodes (in terms of participations in projects) pass through agents with average activity.
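The one-mode projection and the weighted betweenness computation could be sketched as follows. This is only a structural illustration on invented data: networkx's `weighted_projected_graph` weights each agent–agent edge by the number of shared masks, so reproducing the paper's \(t_{CI}\)-based weighting would require a custom projection.

```python
# Sketch of the one-mode projection onto agents and weighted betweenness.
# Labels and weights are placeholders, not the case-study data.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
edges = [("m1", "a1", 0.9), ("m1", "a2", 0.9),
         ("m2", "a2", 0.5), ("m2", "a3", 0.5)]
for mask, agent, t_ci in edges:
    B.add_edge(mask, agent, weight=t_ci)

agents = {"a1", "a2", "a3"}
# Collapse masks: agents sharing a mask become directly connected;
# the projected edge weight counts the shared masks.
P = bipartite.weighted_projected_graph(B, agents)

# networkx's betweenness treats 'weight' as a distance, so stronger
# ties (larger weights) must be converted to shorter distances first.
for u, v, d in P.edges(data=True):
    d["distance"] = 1.0 / d["weight"]
bc = nx.betweenness_centrality(P, weight="distance")
```

In this toy graph, `a2` bridges the other two agents and obtains the highest betweenness, analogously to the agents with average activity lying on the shortest paths between the very active and the scarcely active nodes.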

The centrality ranking that can be inferred from these initial results reveals a substantial change with respect to the one observed in the original network of participation in projects. As the RI methodology allows the investigation of the joint integration of agents' dynamics, these first insights suggest that the agents with an average number of activities, which are now the most central, harmonize the very intense activity of the nodes with many participations, namely the most central in the network of projects, with the scarce activity of those agents that participated in few projects. While the structure of the observed network of participations indicates that one of the most important and recognized laws of real complex networks is respected, i.e. preferential attachment, the analysis of the functionality reveals insights that suggest new interpretations. These insights will be addressed in future research. Currently, the tests on the 47 considered partitions do not suggest contradictory indications.

8 Conclusions

As physical order is addressed as a key dimension for the comprehension of the operation and the evolution of socio-economic complex systems [3], the main aim of this research is to contribute to the development of the analysis of the entropy of joint behavioral time dynamics characterized by discontinuity, e.g. interactions. The objective of this work is to facilitate the implementation of a methodology that detects functional meso-structures with information theories [11,12,13]. In addition, the present work attempts to facilitate the implementation of entropy-related methods in the field of social sciences, and in particular in the analyses of socio-economic dynamic complex networks. The RI algorithm is extended with the PoSH-CADDy three-step methodology so as to reduce redundancy issues. The proposed approach is implemented in a real-world dynamic network (economic agents participating in Region Tuscany Network Policies from 2000–2006) consisting of \({\approx }350\) agents, where the proportion between the number of agents and the number of instants is \({\approx }6{:}1\). In a complex dynamic network where the number of time instants is considerably lower than the number of involved agents, the proposed procedure successfully detected a final set of 34 RI masks representing 34 groups of agents, whose behaviors are considered as integrated, namely not random. For the scope of this study, the focus is set on those partitions with a minimum percentage of agents included in at least one mask, and without too many overlaps among masks. The revealed ranking of the nodes' centrality appears to be substantially different from the one observed in the network of participations in projects.

Future perspectives of this research include (i) the development of analytic models to statistically describe agents' characteristics in relation to the topology of the network of RI masks, (ii) the analysis of partitions obtained by further combinations of the presented parameters of the methodology, (iii) the implementation of the methodology in other case studies, and (iv) the implementation of the methodology based on the edges' activation over time, instead of agents' statuses, as system variables.