1 Introduction

In general, a social network is a structure formed by nodes (actors) and edges (interactions) used in studies of the relationships between individuals, groups or organizations. Focused essentially on topological structure, social networks studies apply a set of methods and measures to identify, visualize and analyze social networks looking for patterns of interactions and their implications (Newman 2001b, a).

In several networks, it is common to observe that actors tend to have affinities or similarities (attributes) with their peers. According to Crandall et al. (2008), there are two distinct reasons for this. First, actors can modify their behavior to bring it more in line with the behavior of their peers, a process known as social influence (Friedkin 2006). Second, actors tend to form relationships with others who are already like them, an effect termed homophily. In other words, in homophily, individual characteristics drive the formation of links, while in social influence, the links existing in the network shape actors’ characteristics.

Kim and Altmann (2017) mention that the nature of homophily is shown in many empirical and theoretical studies. The study of these authors also concluded that homophily affects network formation. Homophily is the term used for the preference of actors to connect with other actors who share common attributes (McPherson et al. 2001). In studies on homophily, we seek to know if the nodes of a network disproportionately establish links with others that resemble them in some way, that is, we want to verify the occurrence of a higher incidence of relations between actors that have similar attributes.

However, actors can belong to many associative groups simultaneously, with various levels of affiliation, and distinct disjoint groups rarely exist on a large scale in many empirical networks (Leskovec et al. 2008). Saha et al. (2014) also mention that people participate in a wide variety of groups. In addition, Lee and Brusilovsky (2017) point out that society is currently driven by information and knowledge, which generates new homophily dimensions. Information, knowledge and some attributes such as economic blocs in commercial networks; communities on social networks such as Facebook, Twitter, among others; and other attributes linked to behaviors, tastes and attitudes generate non-disjoint groups. Currently, publications that use the EI index as a measure of homophily concentrate on disjoint or mutually exclusive groups. Situations in which network actors are present in more than one group are not commonly explored. One of the barriers found in the analysis of non-disjoint groups is the absence of a measure, since the EI index is defined for disjoint groups (Andrade and Rêgo 2019).

Motivated by this gap, Andrade and Rêgo (2019) suggest a method that generalizes the EI index developed by Krackhardt and Stern (1988). This method quantifies the relational structure within and between groups that encompasses the analysis of both disjoint and non-disjoint groups. Furthermore, we observe that the process of social influence has already been studied in the context of fuzzy groups (Li and Wei 2019; Khalid and Beg 2019).

In this context, the objective of this work is to expand the generalized metric suggested by Andrade and Rêgo (2019), adapting it to also cover fuzzy groups, that is, groups in which the nodes present several levels of affiliation. Among the advantages of this study is the ability to address, for example, networks that describe political behavior, studying relationships between voters with different positions in the political spectrum, and networks of friendships among bilingual speakers, analyzing the relationships between speakers with different levels of language fluency. In our work, we analyzed two networks. The first is a co-authorship network formed by researchers with a Ph.D. in production engineering, where the time since Ph.D. completion defined the fuzzy groups. The other network is formed by trade relations between American countries, in which we use the Human Development Index (HDI) to form fuzzy groups.

This paper is organized as follows. In Sect. 2, we briefly present the EI index proposed by Krackhardt and Stern (1988), which measures homophily in networks with disjoint groups. Then, in Sect. 3, we present our measure, which is a generalization of the current EI index, encompassing fuzzy groups. Two applications of the proposed measure are made in Sect. 4. Finally, we discuss the results of the applications in Sect. 5 and present conclusions.

2 EI index

The EI index, proposed by Krackhardt and Stern (1988), essentially quantifies the relational structure within and between groups (Everett and Borgatti 2012; Krackhardt 1994). The EI index was implemented in the popular social network analysis package UCINET (1999) as a measure for homophily. This measure analyzes the tendency of people to connect with others similar to them, as well as social insertion, i.e., how a node or group of nodes decides to connect to other nodes in a network (Hanneman and Riddle 2005).

Homophily is one of the most widespread and robust trends in human interaction, describing how people tend to seek out and interact with others who are more like them, a tendency often summarized as “birds of a feather” (McPherson et al. 2001). As a mechanism of social relations, it can explain the group composition in terms of social identities ranging from ethnicity to age (Lazarsfeld et al. 1954). Indeed, ethnicity, along with geography and kinship, is among the main motivating factors behind homophilic practices (McPherson et al. 2001). Everett and Borgatti (2012) are among the researchers who treat the EI index as a measure of homophily and heterophily, where smaller values (internal connections) indicate greater homophily and larger values (external connections) indicate lower homophily or greater heterophily. The EI index as a measure of homophily is essentially used to quantify the individuals’ propensity to interact with similar actors (Burt 1991; McPherson et al. 2001). In addition, the EI can be used as a segregation measure (Sweet and Zheng 2017), where segregation is defined as the “unequal” distribution of two or more groups of people in different units or social positions (Bojanowski and Corten 2014).

The EI index is defined as the difference between the intergroup and intragroup ties divided by the total number of ties for normalization. It is a simple and attractive measure of homophily because it does not depend on the density of the network (Everett and Borgatti 2012). Formally, the EI index is given by

$$\begin{aligned} EI{\text { index}} = \frac{{{\text {EL}} - {\text {IL}}}}{{{\text {EL}} + {\text {IL}}}}, \end{aligned}$$
(1)

where EL is the number of external links (links between nodes belonging to different groups); IL is the number of internal links (links between nodes belonging to the same group). The EI index ranges from -1 (all bonds are internal) to +1 (all bonds are external). The index can be calculated for the entire network, for each group or for each individual actor.

Although commonly used in unweighted networks, the EI index has also been applied to weighted networks by authors such as Andrade and Rêgo (2018) and Danchev and Porter (2016). In weighted networks, the EI index is calculated using the weights of the edges: EL is the sum of the weights of the edges that connect different cells of the partition and IL is the sum of the weights of the edges that connect actors of the same cell of the partition. As with the unweighted network, the EI index for weighted networks assumes values between −1 and +1. Generally, the weight of an edge represents the frequency or strength of the relationship. Therefore, when the value of the EI index approaches −1, it means that the internal relations are stronger or more intense. As the index approaches +1, it shows that external relations are stronger or more intense.
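As an illustration of Eq. (1) in both the unweighted and the weighted settings, the following minimal Python sketch computes the EI index from an edge list. The function name `ei_index`, the edge-list representation and the toy network are assumptions made here for illustration; they are not taken from UCINET or from the cited works.

```python
def ei_index(edges, group):
    """Return (EL - IL) / (EL + IL) for an undirected edge list.

    edges: iterable of (u, v) or (u, v, weight) tuples (weight defaults to 1)
    group: dict mapping each node to its single (disjoint) group label
    """
    el = il = 0.0
    for u, v, *rest in edges:
        w = rest[0] if rest else 1.0
        if group[u] == group[v]:
            il += w  # internal tie: both endpoints in the same group
        else:
            el += w  # external tie: endpoints in different groups
    if el + il == 0:
        raise ValueError("EI index is undefined for a network with no ties")
    return (el - il) / (el + il)


# Hypothetical toy network: three internal ties and one external tie.
toy_edges = [("a", "b"), ("b", "c"), ("d", "e"), ("c", "d")]
toy_group = {"a": "G1", "b": "G1", "c": "G1", "d": "G2", "e": "G2"}
toy_ei = ei_index(toy_edges, toy_group)  # (1 - 3) / (1 + 3) = -0.5
```

Passing a third element per edge switches the same function to the weighted case, where EL and IL become sums of edge weights.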

In recent years, the inclusion of numerical attributes has been observed in the analysis of social networks. Attributes are resources of nodes and are used to give weight to them, representing their importance or contribution in the network (Andrade and Rêgo 2018; Liu et al. 2015; Benyahia and Largeron 2015). In this work, we will also consider the nodes’ weights and insert them in the topological structure of the network. For this, we use the method proposed by Andrade and Rêgo (2018). By this method, the edge weight is equal to the frequency or strength of the relationship between two nodes multiplied by the average of the weights of these two nodes. The intuition is that, in cases where information about quantitative features of nodes is available, the weight of a link should not only depend on the strength of the connection (original edge weight), but also on the average importance of the connected nodes. Formally, if \(v_i\) is the weight of node i and \(w_{ij}\) is the original weight of the link between nodes i and j, then, including the nodes’ weights, the new edges’ weights are given by

$$\begin{aligned} z_{\text {ij}}=w_{\text {ij}}\frac{v_i+v_j}{2}. \end{aligned}$$
(2)
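Equation (2) can be sketched in a few lines of Python. The function name and the numerical values below are hypothetical and serve only to show the arithmetic.

```python
def node_weighted_edges(edge_weights, node_weights):
    """Apply z_ij = w_ij * (v_i + v_j) / 2 to every edge.

    edge_weights: dict mapping (i, j) to the original edge weight w_ij
    node_weights: dict mapping node i to its weight v_i
    """
    return {
        (i, j): w * (node_weights[i] + node_weights[j]) / 2.0
        for (i, j), w in edge_weights.items()
    }


# Hypothetical values, chosen only to illustrate the formula.
w = {(0, 1): 3.0, (1, 2): 1.0}
v = {0: 10.0, 1: 4.0, 2: 6.0}
z = node_weighted_edges(w, v)  # {(0, 1): 21.0, (1, 2): 5.0}
```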

The inclusion of the nodes’ weights contributes to a more efficient analysis of the network by combining factors inherent to the network with external factors (Andrade and Rêgo 2018). External factors attribute a certain “status” to individuals in the network and through the EI index it is possible to verify whether this status also influences the formation of relationships. However, this conclusion is only reached by comparing it with the EI index without considering external factors.

3 EI index: fuzzy case

Every day, when describing certain phenomena (characteristics), we use degrees that represent qualities or partial truths. As an example, let us consider the group of elderly people. There are at least two approaches to mathematically formalize this set. The first fixes an age from which an individual is considered elderly, for example, \(A=\{x:x\ge 65\}\), where x is an individual’s age measured in years. In this case, the set is well-defined. The second, less conventional, considers individuals to be elderly to a greater or lesser extent, that is, some elements belong more to the elderly class than others. This means that the younger the individual, the lower his or her degree of belonging to that class. Thus, we can say that individuals belong to the elderly class with greater or lesser intensity. Mathematically, sets whose elements have degrees of membership are called fuzzy sets. As opposed to traditional sets, where elements either belong or not, to define a fuzzy set, B, we need to specify a membership function, \(\mu _B: \Omega \rightarrow [0,1]\), where \(\mu _B(w)\) represents, for an element w of the universe \(\Omega \), to what extent w belongs to B, and higher values of \(\mu _B(w)\) indicate a higher membership degree. The formalization of fuzzy sets was presented by Zadeh (1996) as an extension of the classical notion of sets.

To explore cases of fuzzy groups, we have developed a new metric to obtain the EI index, which is an adaptation of the metric proposed by Andrade and Rêgo (2019) to generalize the original EI index measure for use with overlapping groups.

Let \({{{\mathcal {A}}}}\) be the set of all attributes for nodes in a social network with n nodes. For \(X\in {{{\mathcal {A}}}}\), let \(\mu _{X}(v_i)\) be the membership level of node \(v_i\) in the group with attribute X, \(0\le \mu _{X}(v_i) \le 1\). Moreover, for a generic set of nodes, S, consider the following sets of indices \({{{\mathcal {I}}}}(S)=\{i:v_i\in S\}\) and \({{{\mathcal {J}}}}^i(S)=\{j:(v_j\in S \text{ and } j>i) \text{ or } (v_j\notin S \text{ and } j\ne i)\}\). Thus, the number of external and internal links for a generic set of nodes, S, is given, respectively, by:

$$\begin{aligned} EL(S)=\sum _{i\in {{{\mathcal {I}}}}(S)}\sum _{j\in {{{\mathcal {J}}}}^i(S)}x_{ij}(1-\max _{X\in {{{\mathcal {A}}}}}\{\mu _X(v_i)\mu _X(v_j)\}) \end{aligned}$$
(3)

and

$$\begin{aligned} IL(S)=\sum _{i\in {{{\mathcal {I}}}}(S)}\sum _{j\in {{{\mathcal {J}}}}^i(S)}x_{ij}\max _{X\in {{{\mathcal {A}}}}}\{\mu _X(v_i)\mu _X(v_j)\}, \end{aligned}$$
(4)

where in the unweighted case \(x_{ij}\) is 1 or 0 depending on whether there is or not a link between nodes \(v_i\) and \(v_j\), in the case of only edge weights \(x_{ij}=w_{ij}\) and in the case of edge and node weights \(x_{ij}=z_{ij}\).

Alternatively, for \(X\in {{{\mathcal {A}}}}\), we can define the number of external and internal links for the group of nodes, \(S_X\), which has attribute X, respectively, as follows:

$$\begin{aligned} EL(S_X)=\sum _{i=1}^{n}\sum _{j=1}^{n}x_{ij}\mu _X(v_i)(1-\mu _X(v_j)) \end{aligned}$$
(5)

and

$$\begin{aligned} IL(S_X)=\sum _{i=1}^{n}\sum _{j> i} x_{ij}\mu _X(v_i)\mu _X(v_j), \end{aligned}$$
(6)

where \(x_{ij}\) is defined exactly as before.

Since membership functions by definition assume values between 0 and 1 and the definitions of external and internal links involve products of membership functions, in order to avoid overestimating the external links, we recommend the use of trapezoidal membership functions. In order to obtain the trapezoidal membership functions, we suggest performing the following steps:

(i) Determine the highest value before which the degree of membership is known to be null.

(ii) Determine the lowest value from which it is known for certain that the degree of membership is null.

(iii) Determine the lowest value with degree of membership 1.

(iv) Determine the highest value with degree of membership 1.
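The four values obtained in steps (i) to (iv) can be turned into a membership function as in the following minimal Python sketch. The parameter names `a`, `b`, `c`, `d` (for the values of steps (i), (iii), (iv) and (ii), respectively) and the thresholds in the example are assumptions introduced here for illustration.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Trapezoidal membership degree of x.

    a: highest value before which membership is null   (step i)
    b: lowest value with membership 1                   (step iii)
    c: highest value with membership 1                  (step iv)
    d: lowest value from which membership is null       (step ii)
    """
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")
    if b <= x <= c:
        return 1.0  # plateau; checked first so shoulders (a == b or c == d) work
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge between a and b
    return (d - x) / (d - c)      # falling edge between c and d


# Hypothetical thresholds (e.g., years since Ph.D. completion).
degrees = [trapezoidal_membership(x, 5, 10, 20, 30) for x in (3, 7.5, 15, 25, 30)]
# degrees == [0.0, 0.5, 1.0, 0.5, 0.0]
```

Setting `a == b` or `c == d` yields a one-sided (shoulder) function, which may suit the first and last groups of an ordered attribute.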

Fig. 1 Social network with fuzzy groups of nodes

To better explain the proposed method, we present a simple example of how the new metric works on a specific network. Suppose there is a network with four nodes that belong with different membership levels to two groups, A and B (as shown in Fig. 1). In the network, let us consider calculating the EI index for the set of nodes \(\{1, 2\}\). Note that nodes 1 and 2 have no connection and that node 0 is connected to both of them. Disregarding the edges’ and nodes’ weights, we have \(x_{10} = 1\) or \(x_{01} = 1\) and \(x_{20} = 1\) or \(x_{02} = 1\). Moreover, \(\max _{X\in \{A, B\}}\{\mu _X(0)\mu _X(1)\}=\max \{0.65\times 0.80, 0.75\times 0.60\}=0.52\) and \(\max _{X\in \{A, B\}}\{\mu _X(0)\mu _X(2)\}=\max \{0.65\times 0.65, 0.75\times 0.15\}=0.4225.\)

Thus, \(EL(\{1, 2\})=(1-0.52)+(1-0.4225)=1.06\) and \(IL(\{1, 2\})=0.52+0.4225=0.94\). Therefore, \(EI(\{1, 2\})=\frac{1.06-0.94}{1.06+0.94}=0.06\).
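The computation for the set \(\{1, 2\}\) can be reproduced with a short Python sketch of Eqs. (3) and (4). The function name and data layout are assumptions made here; the membership values are those quoted in the example, and node 3 is left out of the membership table because its membership in group B is not stated in the text (its edge does not touch the set, so it is never looked up).

```python
def fuzzy_ei_for_set(edges, mu, S):
    """EL, IL and EI of Eqs. (3)-(4) for a set of nodes S.

    edges: list of (u, v, x_uv) for an undirected network, each edge listed once
    mu:    dict node -> dict group -> membership degree in [0, 1]
    S:     set of nodes for which the index is computed
    """
    el = il = 0.0
    for u, v, x in edges:
        if u not in S and v not in S:
            continue  # only edges with at least one endpoint in S are counted
        same = max(mu[u][g] * mu[v][g] for g in mu[u])
        il += x * same
        el += x * (1.0 - same)
    if el + il == 0:
        raise ValueError("no edge touches S, so the EI index is undefined")
    return el, il, (el - il) / (el + il)


# Memberships quoted in the worked example.
mu = {
    0: {"A": 0.65, "B": 0.75},
    1: {"A": 0.80, "B": 0.60},
    2: {"A": 0.65, "B": 0.15},
}
edges = [(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0)]
el, il, ei = fuzzy_ei_for_set(edges, mu, {1, 2})
# el is about 1.0575, il about 0.9425 and ei about 0.0575,
# which round to the 1.06, 0.94 and 0.06 reported in the text.
```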

Now let us consider calculating the EI index for the group A. We have to consider the following edges \(x_{01}\) or \(x_{10}\), \(x_{02}\) or \(x_{20}\) and \(x_{03}\) or \(x_{30}\). Thus,

$$\begin{aligned} EL(S_A)= & {} \mu _A(0)(1-\mu _A(1))+\mu _A(1)(1-\mu _A(0))\\&+\mu _A(0)(1-\mu _A(2))\\&+\mu _A(2)(1-\mu _A(0))+\mu _A(0)(1-\mu _A(3))\\&+\mu _A(3)(1-\mu _A(0))\\= & {} 0.65(1-0.80)+0.80(1-0.65)\\&+0.65(1-0.65)\\&+ 0.65(1-0.65)+0.65(1-0.25)\\&+0.25(1-0.65)\\= & {} 0.13+0.28+0.2275+0.2275+0.4875\\&+0.0875=1.44 \end{aligned}$$

and

$$\begin{aligned} IL(S_A)= & {} \mu _A(0)\mu _A(1)+\mu _A(0)\mu _A(2)+\mu _A(0)\mu _A(3)\\= & {} 0.65\times 0.80+0.65\times 0.65+0.65\times 0.25=1.105. \end{aligned}$$

Therefore, \(EI(S_A)=\frac{1.44-1.105}{1.44+1.105}=0.13\).
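The group-level computation of Eqs. (5) and (6) can be checked in the same way. The sketch below is a minimal Python illustration with an assumed function name; it iterates over each undirected edge once and adds both orientations to EL, which matches the double sum over i and j in Eq. (5).

```python
def fuzzy_ei_for_group(edges, mu_x):
    """EL, IL and EI of Eqs. (5)-(6) for the group with attribute X.

    edges: list of (u, v, x_uv) for an undirected network, each edge listed once
    mu_x:  dict node -> membership degree of the node in group X
    """
    el = il = 0.0
    for u, v, x in edges:
        # Eq. (5) sums over ordered pairs, so both orientations contribute.
        el += x * (mu_x[u] * (1.0 - mu_x[v]) + mu_x[v] * (1.0 - mu_x[u]))
        # Eq. (6) sums over j > i, so each edge contributes once.
        il += x * mu_x[u] * mu_x[v]
    if el + il == 0:
        raise ValueError("EI index is undefined when EL + IL = 0")
    return el, il, (el - il) / (el + il)


# Memberships in group A quoted in the worked example.
mu_a = {0: 0.65, 1: 0.80, 2: 0.65, 3: 0.25}
edges = [(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0)]
el_a, il_a, ei_a = fuzzy_ei_for_group(edges, mu_a)
# el_a is about 1.44, il_a about 1.105 and ei_a about 0.13.
```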

Table 1 displays the results for the graph shown in Fig. 1. It is easy to verify that the proposed metric is a generalization of the EI index proposed in Krackhardt and Stern (1988) in the sense that if groups are disjoint and the membership functions are either 0 or 1, then it coincides with (1).
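A sketch of that verification for the whole network, assuming that each node belongs to exactly one group (exactly one membership equals 1 and the others are 0); the indicator \(\delta _{ij}\) is notation introduced here only for illustration:

$$\begin{aligned} \max _{X\in {{{\mathcal {A}}}}}\{\mu _X(v_i)\mu _X(v_j)\}=\delta _{ij}={\left\{ \begin{array}{ll} 1, &{} \text {if } v_i \text { and } v_j \text { are in the same group},\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Taking S as the set of all nodes, \({{{\mathcal {J}}}}^i(S)=\{j:j>i\}\), so that in the unweighted case (3) and (4) become

$$\begin{aligned} EL=\sum _{i}\sum _{j>i}x_{ij}(1-\delta _{ij}), \qquad IL=\sum _{i}\sum _{j>i}x_{ij}\delta _{ij}. \end{aligned}$$

These sums count each link between different groups and each link within a group exactly once, so \((EL-IL)/(EL+IL)\) is the quantity in (1).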

Table 1 EI index fuzzy groups example

4 Homophily in co-authorship and trade networks

In this section, we apply the proposed method in two networks studied in previous publications. These networks present the fundamental element for our approach, which is the presence of fuzzy groups, in addition to information about the nodes’ weights. As a means of comparison, we also analyze the cases of disjoint (Everett and Borgatti 2012) and non-disjoint (Andrade and Rêgo 2019) groups. In this way, the EI index will be obtained for 4 situations: without considering the weight of edges and nodes, unweighted (UW); regarding only the nodes’ weight, Z_unweighted (ZU); considering only the edges’ weight, weighted (W); taking into account both weights, Z_weighted (ZW).

To evaluate whether the EI index for a given group is compatible with what is expected when connections occur randomly, i.e., without preference of members for external or internal relations, for the unweighted and the Z_unweighted cases, we calculate the expected EI index for each one of the analyzed cases considering the average of 5000 randomly generated binomial graphs with the same density and size as that of the original graphs. We also added a probability, p-value, which expresses how unlikely it is to obtain an EI index at least as extreme as that observed in the randomly generated binomial graphs. We considered one-sided p-values calculated by the relative frequency of times that the simulated EI obtained a value greater (resp., smaller) than or equal to the observed EI, when the expected EI is smaller (resp., larger) than the observed one.
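The simulation procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used to produce the reported results: it uses only the standard library, treats the unweighted case with disjoint groups, skips simulated graphs with no ties (for which the EI index is undefined), and all function names and the toy data are hypothetical.

```python
import itertools
import random


def crisp_ei(edges, group):
    """EI index for disjoint groups; None when the graph has no ties."""
    if not edges:
        return None
    el = sum(1 for u, v in edges if group[u] != group[v])
    il = len(edges) - el
    return (el - il) / (el + il)


def expected_ei_and_p_value(n, observed_edges, ei_of, n_sim=5000, seed=0):
    """Expected EI over binomial random graphs and a one-sided p-value."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    density = len(observed_edges) / len(pairs)  # same density as the observed graph
    observed = ei_of(observed_edges)
    sims = []
    for _ in range(n_sim):
        # Binomial graph: each possible tie is present with probability `density`.
        sample = [pair for pair in pairs if rng.random() < density]
        value = ei_of(sample)
        if value is not None:
            sims.append(value)
    expected = sum(sims) / len(sims)
    if expected < observed:
        extreme = sum(1 for s in sims if s >= observed)
    else:
        extreme = sum(1 for s in sims if s <= observed)
    return expected, extreme / len(sims)


# Hypothetical toy data: two groups of three nodes, mostly internal ties.
toy_group = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (2, 3)]
expected, p_value = expected_ei_and_p_value(
    6, toy_edges, lambda e: crisp_ei(e, toy_group), n_sim=500
)
```

The statistic is passed in as a function (`ei_of`), so the same loop could be reused with a fuzzy or node-weighted EI index.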

4.1 Data

To implement the proposed EI index, we use data from two real networks. Next, we give some details about these networks.

(i) Co-authorship PQ: The PQ network is a co-authorship network among researchers in the field of Production Engineering in Brazil who had a Productivity Research scholarship from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) in 2015. It has 124 nodes and 131 edges. The network is undirected and the edge weights represent the number of publications co-authored by a given pair of researchers in the period of 2005 to 2014 (Andrade and Rêgo 2017).

(ii) Trade of American Countries: The trade network among American countries is formed by 30 countries and 356 edges. This network was created from the international trade network developed by Andrade and Rêgo (2018), which includes 178 countries from all continents, forming a single main component with 10,419 edges. The network is undirected and the edge weights represent the average of export and import trade transactions between a pair of countries during 2015.

4.2 PQ network

First, we show how the arbitrary choice of disjoint groups, according to the Ph.D. completion time, affects the EI index of these groups. We delimit three cases of disjoint groups (T1, T2 and T3) varying the limits of the groups, Table 2, in the fuzzy regions, Table 3. Figure 2 shows the EI index for the entire network, for each of the arbitrary limits. As expected, the result is heavily dependent on these limits.

Table 2 Criteria for defining disjoint groups in the PQ network

The definitions of the groups formed according to the Ph.D. completion time for the disjoint, non-disjoint and fuzzy cases followed the criteria in Table 3. For the disjoint case, we consider the intermediate case T2.

Table 3 Criteria for defining groups in the PQ Network

We use the researchers’ h-index as the node weights. The h-index is a measure that combines, in a simple way, the number of publications and the impact of publications and is given by the maximum value h such that a researcher has published h works and each of these works has been cited h or more times (Hirsch 2010).
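The definition of the h-index translates directly into code. The following Python sketch uses a hypothetical function name and made-up citation counts.

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the `rank` most cited works all have >= rank citations
        else:
            break
    return h


# Made-up citation counts for one researcher.
example_h = h_index([10, 8, 5, 4, 3])  # four works with at least 4 citations -> 4
```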

Figure 3 shows how the relationships between researchers occur. In general, most nodes in the non-disjoint case have an EI index of −1 (60%). In the fuzzy and disjoint cases, the proportions of nodes with EI indexes above and below zero are similar; however, in the fuzzy case, the distribution of the EI index is more uniform.

Fig. 2 EI indexes for the whole PQ network

Figure 4 shows the EI index for the entire network. In general, when the nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller, with a predominance of in-group relationships. On the other hand, when the groups are disjoint, the network has higher but still negative EI indexes. This means that, on a global level, there is a high level of cooperation between researchers from the same group. As for the strength of connections, it is observed that in the W network there are the lowest EI indexes and in the ZW network the highest EI indexes. The first result indicates that the relationships are stronger between researchers from the same group and the second indicates that researchers who connect to researchers from other groups tend to link to researchers with higher h indexes. It is worth mentioning that the negative EI indexes of the network, revealing a predominant in-group relationship, do not differ significantly from the result obtained in a simulated random network, since the p-values are all greater than 0.05.

Fig. 3 Distribution of the EI index values in the PQ Network. a disjoint, b non-disjoint and c fuzzy

Fig. 4 EI indexes for the whole PQ network

Fig. 5 EI indexes for experience level attributes in the PQ network

The analysis of the EI index of the experience level groups is shown in Fig. 5. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller. In the case of disjoint and fuzzy groups, the EI indexes are close, with the EI indexes of the disjoint case a little higher. The experienced group’s EI indexes are negative, especially in the non-disjoint case. This shows that the internal connections of this group outnumber the external ones. The youth and senior groups have positive EI indexes, with those of the youth group higher than those of the senior group. This shows that the external relations of these groups surpass the internal ones. Therefore, we can conclude that the experienced researchers cooperate with each other, while young and senior Ph.D. holders are more open to cooperating with other groups. It is worth mentioning that the EI indexes obtained do not reveal a tendency towards homophily or heterophily, as they do not differ significantly from the results obtained by the random simulated network, since the p-values are all greater than 0.05. Note that the edge weighting had a greater effect on the EI index of the disjoint case, making the relationships more heterogeneous. This is most noticeable in the case of the experienced group.

Fig. 6 EI indexes for scholarship level groups considering the experience level as class attributes in the PQ network

We also analyzed the behavior of groups of researchers with the same level of scholarship in relation to the experience level group attributes. The scholarship levels, in order of importance, and their shares of the total number of researchers are: 1A (8%), 1B (5%), 1C (8%), 1D (19%) and 2 (59%). The analyses of the EI index of these groups are shown in Fig. 6 for the cases of disjoint, non-disjoint and fuzzy groups, considering the UW, ZU, W and ZW networks. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller, with in-group relationships predominating. On the other hand, when the groups are disjoint or fuzzy, the network has higher EI indexes.

As for scholarship levels, the EI indexes behave differently depending on the connection type, weighted or not. Level 1A has the highest EI indexes in the unweighted network, with or without the inclusion of the node weights, and in the weighted network considering the node weights. Level 1A, the highest level of the scholarship, concentrates the most productive and influential researchers in the research area, being composed of 10 exclusively senior researchers and 2 exclusively experienced researchers. Although most are seniors, the in-group relationship is predominant in the non-disjoint case and external relationships are more common when the group is fuzzy or disjoint. Level 1A EI indexes are all negative in the weighted network. Level 1C, an intermediate scholarship level, also does not include young researchers. In the weighted network, with and without node weights, as well as in the unweighted network (only in the non-disjoint case), the EI index of level 1C is the smallest and negative. Therefore, for researchers at this level, most connections occur between researchers in the same experience level group. It is noteworthy that the EI indexes obtained do not reveal a tendency towards homophily or heterophily, as they do not differ significantly from the results obtained by random simulated networks, since the p-values are all greater than 0.05.

4.3 Trade of American countries network

We use the Human Development Index (HDI) to form groups and first show how the arbitrary choice of disjoint groups, according to the HDI, affects the EI index of these groups. We delimited three cases of the disjoint groups (T1, T2 and T3) varying the thresholds of the groups, Table 4, in the fuzzy regions, Table 5. Figure 7 shows the EI index for the entire network, for each of the arbitrary thresholds. As expected, the result is heavily dependent on these limits.

Table 4 Criteria for defining groups in the Trade network

The definitions of the groups formed according to the HDI for the disjoint, non-disjoint and fuzzy cases followed the criteria in Table 5, where the intermediate case T2 was used for the disjoint case.

Table 5 Criteria for defining groups in the Trade network

Figure 8 shows the EI index at the individual level of the 30 countries. In general, countries have positive EI indexes, that is, more intergroup than in-group relations. In the non-disjoint case, it is possible to notice that in-group relations predominate for some countries. The in-group relationship is also more visible when the network is unweighted.

Figure 9 shows the EI index for the entire network. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller. On the other hand, when the groups are fuzzy, the network has higher EI indexes. The EI indexes are positive, except the EI index in the case of non-disjoint groups in the weighted network. This indicates that, at a global level, trade takes place between countries of different HDI groups. The predominant intergroup relationships do not differ significantly from the result obtained by random simulated networks since all the p-values are greater than 0.05.

Fig. 7 EI indexes for the whole Trade network

Fig. 8 Distribution of EI index values in the trade network for a disjoint, b non-disjoint and c fuzzy cases

The analysis of the EI index of the HDI groups is shown in Fig. 10. In general, the low and medium groups have the highest EI indexes, close to 1. The countries of these groups have more intergroup than in-group relations, and the EI indexes are statistically significant, that is, these groups are prone to heterophily. The group with high HDI has the lowest EI indexes in the unweighted network, being the one with the highest in-group relationship, but the EI indexes increase significantly in the Z_Unweighted, weighted and Z_Weighted networks. Thus, the relationships are stronger with other groups in these networks. The group of countries with very high HDI has the lowest EI indexes in the weighted network, with and without the node weights, revealing a closer relationship between countries in the group. The EI indexes of the groups with high and very high HDI do not differ statistically from those presented by the random simulated network.

We also analyzed the behavior of groups of countries by region in relation to the HDI group attributes. The regional divisions are north, south and central, with 3, 12 and 15 countries, respectively. The analyses of the EI indexes of these groups are shown in Fig. 11 for the cases of disjoint, non-disjoint and fuzzy groups, and studying the UW, ZU, W and ZW networks. In general, when nodes belong to non-disjoint groups, it is observed that the EI indexes are smaller. On the other hand, when the groups are disjoint or fuzzy, the regions have higher EI indexes.

As for the regions, the EI index behaves differently depending on the connection type, weighted or unweighted. The northern region has the highest EI indexes on the UW and ZU networks. The northern region’s EI indexes decrease in the weighted network, indicating that countries in the northern region have stronger relations with countries in the same HDI group. The southern region in the UW network has the lowest EI indexes, positive in the disjoint and fuzzy cases, and negative in the non-disjoint case. In weighted networks, with and without node weights, the EI indexes are positive and higher in the southern region, indicating that relations are more intense between countries of different HDI groups. The EI indexes of the regions do not reveal a tendency towards homophily or heterophily, as they do not differ significantly from the EI presented in the simulated network with random relationships.

Fig. 9 EI indexes for the whole Trade network

Fig. 10 EI indexes for HDI groups in the Trade network

Fig. 11 EI indexes for regional groups considering the HDI groups as class attributes in the trade network

5 Conclusion

In this work, we have proposed a new network measure, which is a generalization of the EI index to measure homophily in cases of fuzzy groups. Fuzzy groups are particularly important when actors may belong to many associative groups simultaneously and with various levels of affiliation. Therefore, for a better understanding of the structure of networks, the measure developed allows the analysis of multiple associations and different levels of association. We also show that incorporating node weights into the analysis can give us more insights into the homophily of relations.

We explored two networks with the new measure. In a co-authorship network, the Ph.D. completion time was used to form groups. In a commercial network among countries, we use the Human Development Index (HDI) to form groups. We obtain the EI index for the networks considering the cases of disjoint, non-disjoint and fuzzy groups, and analyzing different relational forces, unweighted, weighted, without and with node weights. As we have seen in these networks, the proposed measure allows expanding the analysis of social networks. Through a homophily analysis, it is possible to identify whether a certain group of nodes has a tendency to work together or not.

In general, it is clear that non-disjoint groups generate more homogeneous cooperation or commercial relations. This was already expected because, in that case, the actors present multiple associations with the same degree of association, equal to 1. In the co-authorship network, we noticed that the researchers allocated as experienced are the ones that cooperate the most with each other. These relationships are favored because there are more experienced researchers. The smaller number of young and senior researchers also justifies the predominance of external relations by these researchers. In the trade network, we noticed that relations between countries with different levels of development are more common. In the case of the groups with low and medium HDI, we note that the EI index close to 1 is statistically significant, revealing a tendency towards heterophily in these two groups and their dependency on more developed countries.

In addition to the two networks used to illustrate the measure, other networks also present actors that belong to different attribute groups and for which, due to the imprecision or limitations of the information, it is necessary to resort to fuzzy groups. Thus, we expect that many other studies may benefit from this measure.