1 Introduction

In the social media era, a widely debated issue is whether the algorithmic bias emerging from digital platforms reinforces each individual's need for confirmation (i.e., confirmation bias), fostering the radicalization of opinions and the emergence of echo chambers (henceforth ECs) [7, 16, 18]. Broadly speaking, an echo chamber can be considered a closed system, insulated from rebuttal, in which beliefs are amplified and polarized by repeated communication. There is concern that ECs might lead to alarming phenomena [15] such as hate speech, misinformation, and discrimination against minorities. Indeed, since debates, campaigns, and movements taking place on online platforms also resonate in the physical world, their effects should not be relegated to the virtual realm. Accordingly, detecting and characterizing ECs is of utmost importance, since it is the first step toward deploying actionable strategies to mitigate their effects.

For such reasons, a large body of scientific work [2, 5, 9, 10, 12, 14, 17] has addressed the issue of echo chamber detection, often focusing on online social media discussions around highly divisive topics. However, the ill-posedness of the definition of ECs, along with the absence of standard strategies to support their identification, has often led to conflicting and hard-to-generalize experimental results [7]. The quantitative methods proposed in such studies can be classified into two families: content-based and network-based. The former relies on the assumption that polarized environments are detectable by looking at the leaning of the content shared or consumed by a user and analyzing its sentiment on the controversy, regardless of the user’s interactions with others. For instance, [1] explores the US debate between Liberals and Conservatives on Facebook and Twitter, looking for partisan users (i.e., those sharing articles conforming to their political beliefs), while [2] also considers users’ exposure to cross-cutting content from the news feed or friends. The latter, on the other hand, mainly focuses on finding clustered topologies in users’ interactions rather than on their content homophily. For example, the authors of [11] first define the conversational network of Facebook users discussing the 2014 Thai election and then partition it into topologically well-knit communities. Hybrid methodologies, which take into account both users’ ideology and their interactions with each other, also exist, as in [3], where the authors study online communications to understand when they resemble ECs, collecting several million tweets concerning twelve political and non-political issues. The authors infer users’ ideology from the popular controversial accounts they follow and then define their interaction network via retweets. Similarly, [8] first estimates users’ leaning on a political controversy based on the slant of the media that they share and consume and then defines the debate network through the follow relationship.

One limitation of existing studies is that they primarily focus on open discussions not bounded to a specific time window, in which users’ opinions, when observed, are already formed. Accordingly, ECs are often extracted from years-long online discussions centered around ongoing, evolving, and recurring hot topics (e.g., immigration, war), making it impossible to investigate when and how users find themselves trapped in ECs. Moreover, such studies often discard the temporal dimension, collapsing the observed behaviors into a single, timeless snapshot that describes the studied phenomenon as a whole.

Here we propose a longitudinal analysis of ECs related to a controversial event: namely, the Twitter discussion around “Taking the knee” that emerged in Italy during the EURO 2020 football championship. The specificity of the selected phenomenon (i.e., being temporally bounded) allowed us not only to identify and characterize ECs at different topological scales (macro, meso, and micro level) but also to dynamically track and evaluate their formation over time, from the beginning to the conclusion of a specific framing context, thus relating them to the events that acted as linchpins of the online discussion.

The rest of the paper is organized as follows. In Sect. 2, we overview the societal issues that arose around EURO 2020. Section 3 discusses how we process Twitter data in order to assess echo chambers’ existence and evolution over time. Then, in Sect. 4, we evaluate the presence of echo chambers in the debate, considering both their evolution in time and differences in scale. Finally, Sect. 5 concludes the paper with a discussion of results and directions for future work.

2 EURO 2020: Beyond the Sportive Event

The event that we explore from an echo chamber point of view is the 2020 UEFA European Football Championship (EURO 2020), a sporting event that soon became a public theatre for debating racial issues. It all started when some of the participating teams decided to take the knee to show support for the Black Lives Matter (BLM) movement (Footnote 1); as a consequence, the offline and online debate was monopolized by the question of whether the players should show their support for the movement.

The act of taking the knee has its origins with Martin Luther King, who knelt on one knee, praying, after many arrests occurred during a peaceful protest in Selma, Alabama in 1964. The gesture found its way onto sports fields thanks to Colin Kaepernick, a football player who in 2016 knelt during the performance of the American anthem as a protest against the racial discrimination suffered by black people (Footnote 2). In the context of EURO 2020, the gesture of taking the knee in support of the BLM movement started spreading thanks to the Belgian national football team, which decided to kneel just before the kick-off of its matches. This stance was imitated by some national teams (e.g., Wales, England) and not supported by others (e.g., Hungary, Russia, Holland). In between those two opposite sides stands the position of the Italian Football Federation (FIGC, Footnote 3), which did not give clear support to the movement but left the players free to behave as they thought best. Accordingly, during their third match (i.e., Italy-Wales), five Italian players took the knee, while the others remained standing. This event triggered a heated discussion on Twitter in which people took two sides: the players who remained standing were labeled either as racists or, on the contrary, as a bastion of free thinking against the “dictatorship of the politically correct”. The hashtag #iononmiinginocchio (Footnote 4) immediately went trending. Finally, the Italian players stated their final decision just minutes before their fourth match (round of 16, Italy-Austria): they would take the knee only if the players of the other team did the same, to support the opponents’ choice rather than the BLM movement itself (Footnote 5).

3 The Online Debate: Will You Take the Knee?

Given the high polarization of opinions around EURO 2020 in Italy, we focus our analysis on the Italian scenario, assessing whether and how echo chambers were born, strengthened, and evolved across the seven football matches. We selected Twitter as the data source because the debate about taking the knee during the football event started spreading there from the very beginning. In this section, we discuss how we leverage Twitter data to infer user stances on the debate and to define the interaction network between users. The data and the code used for this study are available on GitHub (Footnote 6).

Dataset. Our Twitter data collection covers roughly one month, starting on June 10, the eve of the EURO 2020 opening match played by Italy and Turkey, and ending on July 13, two days after the final match, Italy-England. The conversations it encompasses gravitate towards a predefined set of hashtags that we used to filter our collection pipeline, all referring to the matches played by Italy, to the competition in general, and to taking the knee. We collected a total of 38,908 tweets posted by 16,235 different users.

Fig. 1. Users Profiling. (a) Bottom: distribution of users who wrote at least 5 tweets (to ensure that the average activity of these users covers most of the events) with respect to the opinion \(C_{u}\). Top: boxplots of the distribution of non-neutral opinions, from \(\pm 0.5\) to the extreme limit (\(\pm 3\)). (b) The line graph (left axis) shows the number of pro, against, and neutral users who posted at least 1 tweet (\(\ge 1\)), at least 2 tweets (\(\ge 2\)), and so on. The bar graph (right axis) shows the median number of tweets posted by users.

Opinions Estimate. To classify the users’ opinions, we chose a hashtag-oriented approach [6, 19]. We manually annotated the 2304 hashtags used in the dataset, associating every hashtag with a numerical value. After several attempts leading to similar results, we chose the following values: \(\pm 3\) if the hashtag expresses a clear position, cons (+), e.g. #iononmiinginocchio, or pro (-), e.g. #iomiinginocchio (Footnote 7), on taking the knee during EURO 2020; \(\pm 1\) if the hashtag is close to the faction cons (+), e.g. #noblm, or pro (-), e.g. #BlackLivesMatter, even though not directly related to EURO 2020; 0 for neutral and/or not relevant hashtags, e.g. #vacciniamoci (Footnote 8). Only 15.2% of the classified hashtags are not neutral: 180 support taking the knee, and 170 are against it. Among them, only 14 and 40 hashtags explicitly refer to kneeling, for the supporters and the opponents respectively. For every tweet t, we set its classification value \(C_t\) to the average of the classification values \(C_h\) of the non-neutral hashtags it contains. For every user u, we obtain the classification \(C_u\) by averaging the classification values of their tweets. Looking at the distribution of users’ opinions \( C_{u} \) in Fig. 1a, we observe the typical distribution of polarized issues, with a clear prevalence of extreme values and a small number of users having a neutral position. Further, the boxplots also highlight the asymmetry of this distribution with respect to \( C_{u} = 0 \): for \( C_{u} \ge 0.5 \) users have on average more extreme opinions, while for \( C_{u} \le -\,0.5 \) there are more moderate values.
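The two averaging steps described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the `HASHTAG_SCORES` excerpt, the function names, and the choice to skip tweets that contain no non-neutral hashtag (the text does not say how such tweets are treated) are all assumptions.

```python
from statistics import mean

# Hypothetical excerpt of the manual annotation: hashtag -> C_h.
# Positive values are against kneeling (cons), negative values are in favour (pro).
HASHTAG_SCORES = {
    "#iononmiinginocchio": 3,
    "#iomiinginocchio": -3,
    "#noblm": 1,
    "#blacklivesmatter": -1,
    "#vacciniamoci": 0,
}


def tweet_opinion(hashtags, scores=HASHTAG_SCORES):
    """C_t: mean C_h over the non-neutral hashtags of one tweet.

    Returns None when the tweet has no non-neutral hashtag
    (assumption: such tweets carry no opinion signal).
    """
    values = [scores.get(h.lower(), 0) for h in hashtags]
    non_neutral = [v for v in values if v != 0]
    return mean(non_neutral) if non_neutral else None


def user_opinion(tweets, scores=HASHTAG_SCORES):
    """C_u: mean C_t over the tweets of one user that have a defined C_t."""
    values = [tweet_opinion(hashtags, scores) for hashtags in tweets]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


if __name__ == "__main__":
    tweets_of_one_user = [
        ["#iononmiinginocchio", "#noblm"],   # C_t = (3 + 1) / 2
        ["#BlackLivesMatter"],               # C_t = -1
        ["#vacciniamoci"],                   # neutral only, skipped
    ]
    print(user_opinion(tweets_of_one_user))
```

Hashtags are lower-cased before lookup so that #BlackLivesMatter and #blacklivesmatter map to the same score; whether the original annotation was case-sensitive is not stated.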

Further, we can obtain additional insights by looking at the users in the dataset. We find 7949 users in favor of kneeling (\(C_{u} \le -\,0.5\)) and 5970 against it (\(C_{u} \ge 0.5\)). Although the proponents outnumber the opponents by about 2000 users, the scenario changes when considering the users who posted more than one tweet during the event: the numerical difference not only decreases but is overturned. The opponents slightly exceed those in favor (Fig. 1b), suggesting that the proponents might be less involved and hold a milder opinion regarding the kneeling act in itself (Fig. 1a). This result confirms what already emerged during the hashtag classification: opponents are more likely to go explicitly against taking the knee, while proponents, rather than supporting the kneeling itself, put emphasis on the ethical and moral reasons behind it. Indeed, while those against kneeling used the hashtag #iononmiinginocchio more than 14,000 times, their counterparts tweeted the hashtag #iomiinginocchio fewer than 2000 times, preferring the hashtag #blacklivesmatter instead. In addition to the main hashtags, we notice a consistent use of the hashtag #razzismo (Footnote 9) on both sides, highlighting the ethical implications of the issue. Further, the hashtags used by supporters are mostly slogans against racism and Nazi-fascist dictatorships, while the opponents’ ones often refer to a wide range of current political and social issues (e.g., immigration, LGBTQ+ rights, Euroscepticism).

Static Network modeling. We built a weighted undirected graph in which the nodes are the users and the edges represent their interactions (i.e., retweet, mention, quote, and reply). We chose to distinguish between “active” (or wilful) interactions (\(w=1\)), i.e., retweets and quotes, and “passive” (not wilful) interactions (\(w=0.5\)). The network is composed of \(N=15,378\) nodes and \(L=36,496\) edges. The degree distribution suggests a power-law description \(p(k)=Ck^{-\gamma }\), and fitting it over the scaling region yields \(\gamma =2.12 \pm 0.04\), meaning that a scale-free regime describes our network well. Measuring different kinds of centrality gives us a more complete picture of its most influential nodes. Table 1 shows the nodes with the highest scores for various centralities. Among them, we can distinguish two groups of people. The first is composed of quite popular accounts, mostly journalists, that are very productive on Twitter (e.g. @PBerizzi, @Giorgiolaporta); the second is composed of prominent figures who did not join the debate directly on Twitter, although they explicitly took sides on other media, such as television or newspapers (e.g. @FedericoRampini, @EnricoLetta).
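As a rough illustration of this construction, the sketch below builds the weighted undirected edge set from interaction records and estimates a power-law exponent from the degrees. Several points are assumptions rather than the authors' procedure: mentions and replies are taken to be the "passive" interactions, repeated interactions between the same pair are summed, self-interactions are dropped, and, since the text does not name the fit algorithm, the exponent is estimated with the approximate maximum-likelihood formula for discrete power laws by Clauset, Shalizi and Newman, \(\hat{\gamma} \approx 1 + n \left[\sum_i \ln \frac{k_i}{k_{\min} - 1/2}\right]^{-1}\).

```python
from collections import defaultdict
from math import log

# Interaction weights: active (wilful) = 1, passive (not wilful) = 0.5.
# Treating mentions and replies as the passive kinds is an assumption.
WEIGHTS = {"retweet": 1.0, "quote": 1.0, "mention": 0.5, "reply": 0.5}


def build_graph(interactions):
    """Return {frozenset({u, v}): weight} from (source, target, kind) records."""
    graph = defaultdict(float)
    for source, target, kind in interactions:
        if source == target:
            continue  # assumption: self-interactions are ignored
        # assumption: weights of repeated interactions are summed
        graph[frozenset((source, target))] += WEIGHTS[kind]
    return dict(graph)


def degrees(graph):
    """Unweighted degree k of every node."""
    k = defaultdict(int)
    for edge in graph:
        for node in edge:
            k[node] += 1
    return dict(k)


def powerlaw_exponent(ks, k_min=1):
    """Approximate MLE of gamma for the tail k >= k_min (not the paper's fit)."""
    tail = [k for k in ks if k >= k_min]
    return 1.0 + len(tail) / sum(log(k / (k_min - 0.5)) for k in tail)


if __name__ == "__main__":
    records = [
        ("alice", "bob", "retweet"),
        ("bob", "alice", "reply"),
        ("alice", "carol", "mention"),
    ]
    g = build_graph(records)
    print(g)
    print(powerlaw_exponent(degrees(g).values()))
```

The choice of `k_min` strongly affects the estimate; the paper's "scaling region" would correspond to some lower cutoff that is not reported here.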

Table 1. Static Network. Top 5 nodes for different centrality scores.

Temporal Network modeling. Finally, to better highlight the evolution of the online discussion, we broke down this flat network into seven temporally bounded snapshots, each corresponding to one of the matches played by Italy. Such modeling allowed us to longitudinally estimate and discuss ECs, as will emerge in the forthcoming section. As shown in Table 2, the network grows considerably between \(G_3\) and \(G_4\), in both the number of nodes and the number of links.
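A minimal sketch of such a split is given below. The snapshot boundaries used in the example are hypothetical placeholders, not the temporal coverage reported in Table 2, and the sketch assumes disjoint windows (each interaction belongs to exactly one snapshot); a cumulative variant would simply concatenate the windows up to each index.

```python
from bisect import bisect_right
from datetime import date


def assign_snapshot(day, starts):
    """0-based index of the snapshot whose start date is the latest one <= day.

    `starts` must be sorted; days before the first start fall into snapshot 0.
    """
    return max(bisect_right(starts, day) - 1, 0)


def split_into_snapshots(interactions, starts):
    """Group (source, target, kind, day) records into one list per snapshot."""
    snapshots = [[] for _ in starts]
    for source, target, kind, day in interactions:
        snapshots[assign_snapshot(day, starts)].append((source, target, kind))
    return snapshots


if __name__ == "__main__":
    # Hypothetical window starts, for illustration only.
    starts = [date(2021, 6, 10), date(2021, 6, 16), date(2021, 6, 20)]
    records = [
        ("alice", "bob", "retweet", date(2021, 6, 11)),
        ("bob", "carol", "reply", date(2021, 6, 17)),
        ("carol", "alice", "quote", date(2021, 6, 25)),
    ]
    for i, snap in enumerate(split_into_snapshots(records, starts), start=1):
        print(f"G{i}: {snap}")
```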

Table 2. Network Snapshots. For each snapshot, match phase, reference and result, whether people knelt (N: no, Y: yes, P: partially), temporal coverage, and the number of nodes and edges.

4 Echo Chambers: From Global to Local

To provide a comprehensive view of the online discussion under analysis, we measure and characterize echo chambers by focusing on different topological levels. In particular, we discuss them by observing patterns emerging at the macro (network-wide), meso (community-wide) and micro (node) level. While the first method can be exploited to identify well-separated echo chambers at an aggregated level, the meso-scale approach allows us to identify multiple ECs by taking into account differences within certain areas of the network, e.g., there can be more than one EC with the same ideology. The third method, instead, outputs the level of echo each individual node is subject to.

Fig. 2. Macro-scale ECs. (a) Visualization of the time-aggregated representation of our network, with spatial separation of the two echo chambers. (b) The line graph (left axis) shows the percentage of nodes in the network connected to hubs. The bar graph (right axis) shows the number of hubs present at different time intervals in echo chambers R (in red) and B (in blue).

Macro-scale. To qualitatively assess the presence of macro-scale echo chambers, we visualize the time-flattened network with the Force Atlas 2 graphical layout. In Fig. 2a, we can quickly identify the line that best divides the two echo chambers. Accordingly, we generate two subgraphs, one with the nodes and links of the section below the line (subgraph R), and one with those above it (subgraph B). The number of links that cross the two subgraphs is 1889, just 5.22% of all links in the original network. Subgraph B turns out to be the largest (8352 nodes and 17,205 links vs. 6514 nodes and 17,088 links), although it is sparser (0.0004 vs. 0.0008) than subgraph R, which also shows higher values for the average degree (4.11 vs. 5.24) and transitivity (0.007 vs. 0.012). Subgraph R hosts mostly users opposed to kneeling (in red, 68% of all its nodes), with an average opinion of 1.29. On the contrary, in subgraph B, 75% of the nodes have a favorable opinion regarding the issue discussed. Notice that subgraph R hosts a higher percentage of users with the opposite opinion (12% vs. 9.66%) and of neutral or unclassifiable ones (19.65% vs. 14.9%) than subgraph B, which is more homogeneous. The bar graph in Fig. 2b shows another difference between the two ECs: subgraph B contains most of the hubs of the entire network (21 out of 31); however, hubs in subgraph R have a much higher average degree (646 vs. 434). Hubs in both subgraphs have therefore been the linchpin of the debate. This is supported by the fact that their entrance into the network, between \(G_3\) and \(G_4\), corresponds to the growth peak of the network, as shown in Table 2. More specifically, during this short time window, almost 80% of all nodes join the network, about 60% of them through direct links with the hubs (line graph in Fig. 2b).

Among the causes of the growth in \(G_3\) we find Enrico Letta (@EnricoLetta), former leader of the Democratic Party, who ended up at the center of the controversy for the pro-kneeling statements he made on television on June 21 (Footnote 10). Looking at the biggest peak, in \(G_4\), we find Federico Rampini (@FedericoRampini), journalist and essayist, who in an interview on June 25 expressed his concerns about taking the knee (Footnote 11); Giorgio Chiellini (@chiellini), criticized for his gaffe about Nazism on June 26 (Footnote 12); Roberto Saviano (@robertosaviano), a famous writer (Footnote 13); and Le frasi di Osho (@lefrasidiosho), the Twitter account of a popular satirical Facebook page (Footnote 14). All of these public interventions ignited the debate, encouraging the polarization of users’ opinions and hence setting the stage for the ECs.
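The descriptive quantities reported for the two subgraphs (size, density, average degree, share of crossing links) can be computed as in the sketch below. It assumes the R/B assignment of each node is already available, e.g. from the position of the node relative to the dividing line in the layout; the function name and the input format are illustrative, and for an undirected simple graph the formulas used are density \(= 2L/(N(N-1))\) and average degree \(= 2L/N\).

```python
def subgraph_stats(edges, side):
    """Summarise a two-way split of an undirected graph.

    edges: iterable of (u, v) pairs, one per undirected link.
    side:  dict mapping every node to "R" or "B" (assumed given).
    """
    nodes = {"R": 0, "B": 0}
    for s in side.values():
        nodes[s] += 1

    links = {"R": 0, "B": 0}
    crossing = 0
    for u, v in edges:
        if side[u] == side[v]:
            links[side[u]] += 1
        else:
            crossing += 1  # link with one endpoint in each subgraph

    stats = {}
    for s in ("R", "B"):
        n, m = nodes[s], links[s]
        stats[s] = {
            "nodes": n,
            "links": m,
            "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
            "avg_degree": 2 * m / n if n else 0.0,
        }
    total = crossing + links["R"] + links["B"]
    stats["crossing_share"] = crossing / total if total else 0.0
    return stats


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
    toy_side = {"a": "R", "b": "R", "c": "R", "d": "B", "e": "B"}
    print(subgraph_stats(toy_edges, toy_side))
```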

Fig. 3. Meso-scale ECs. The scatter plots display the Conductance (x-axis) and Purity (y-axis) scores for the biggest detected communities. Circles represent Louvain communities: red denotes communities populated mostly by opponents, blue those populated mostly by proponents. The red line marks the Purity threshold (0.7). Because they cover the early stages of the event, the communities extracted in \(G_1\) and \(G_2\) are too small to be meaningful; we therefore omit the related visualizations.

Meso-scale. To refine the macro-scale analysis, we then focus on assessing the presence of echo chamber-like meso-scale topologies (e.g., well-separated communities composed of like-minded users) and on studying whether, how, and at what moment users happen to gather in that form. To do so, we cluster each temporal network snapshot independently using the Louvain algorithm [4]. Once the network communities are identified, we evaluate their echo chamberness using Conductance (i.e., the fraction of total edge volume that points outside the community) and Purity (i.e., the product of the frequencies of the most frequent labels carried by nodes of a community), following the approach introduced in [13]. To this end, we label each user as: Pros (proponents) (\(C_{u} \le -\,0.5\)), Cons (opponents) (\(C_{u} \ge 0.5\)), Neutral (\(-\,0.5< C_{u} < 0.5\)). We set the Conductance threshold at 0.5 to ensure that more than half of the total edges in the community remain within its boundaries. For Purity, we set a threshold equal to 0.7 to make sure that most of the users in a community share the same opinion.
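The two scores and the resulting classification can be sketched as follows, taking the communities as given (e.g. as produced by any Louvain implementation). Two simplifications are assumptions of this sketch: conductance is computed on unweighted edges as cut / (2 · internal + cut), and, since users carry a single label here, Purity reduces to the relative frequency of the most frequent label. How the thresholds treat boundary values (strict or non-strict inequality) is not stated in the text; strict inequalities are used below.

```python
from collections import Counter


def label(c_u, threshold=0.5):
    """Map an opinion score C_u to the Pros / Cons / Neutral label."""
    if c_u <= -threshold:
        return "pros"
    if c_u >= threshold:
        return "cons"
    return "neutral"


def conductance(community, edges):
    """Fraction of the community's edge volume that points outside it."""
    members = set(community)
    internal = cut = 0
    for u, v in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            internal += 1
        elif inside == 1:
            cut += 1
    volume = 2 * internal + cut
    return cut / volume if volume else 0.0


def purity(community, opinions):
    """Most frequent label in the community and its relative frequency."""
    members = set(community)
    counts = Counter(label(opinions[u]) for u in members)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(members)


def is_echo_chamber(community, edges, opinions,
                    max_conductance=0.5, min_purity=0.7):
    """Apply both thresholds (strictness at the boundary is an assumption)."""
    _, p = purity(community, opinions)
    return conductance(community, edges) < max_conductance and p > min_purity


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    toy_opinions = {"a": 2.0, "b": 3.0, "c": 1.0, "d": -3.0}
    print(conductance({"a", "b", "c"}, toy_edges))
    print(purity({"a", "b", "c"}, toy_opinions))
    print(is_echo_chamber({"a", "b", "c"}, toy_edges, toy_opinions))
```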

In Fig. 3 we show the community evaluation process over time. The first thing we notice is the growth of the network, and therefore of the number and size of the communities, around the Italy-Wales match, between the second and the fourth time frame. This match (Fig. 3b) is the one in which only five Italian players took the knee, following the Welsh players’ example. The press was finally talking about it, and the discussion on Twitter kept going. From that moment, the network grows steadily until the Italy-Austria match (Fig. 3d) and remains stable in the subsequent matches. In each scatter plot, we can classify as echo chambers the communities that lie above the dashed red line. The colors used to portray the opinion are blue for the users who support the kneeling act, red for those who are against it, and grey for the neutral ones. In the first and the second time frames, the network is too small to be meaningful from a meso-scale perspective; excluding those two, the overall scenario is almost the same across snapshots: many communities are classifiable as echo chambers, several for each of the two sides; on average, 70% of the users are confined in echo chambers; in almost every time frame the supporters outnumber the opponents, while neutral users appear only at \(t=4\) (\(G_4\)) and do not exceed 5%. Moreover, looking longitudinally at the various communities identified, we observe that the only large ones with a lower Conductance score are those composed predominantly of opponents. Indeed, as we can see in Fig. 1a, the opponents are more extreme in expressing their thoughts, their opinions being further from 0 than those of the users in favor. Such behavior could mean that users with a highly polarized opinion are more likely to be trapped in well-defined echo chambers.

Fig. 4. Micro-scale ECs. Contour map of the average opinion of neighbors \(C_{N(u)}\) against the average opinion of a user \(C_{u}\). The colors represent the density of users: the lighter the color, the greater the number of users. Each plot refers to a temporal network snapshot. \(G_5\) and \(G_6\) are omitted due to their negligible changes w.r.t. \(G_7\). \(\rho \) is Pearson’s correlation coefficient; \(r\) is the assortative mixing coefficient.

Micro-scale. Finally, to understand whether individual users are embedded in local echo chambers, we compare their opinions with those of their neighbors. We assume that user-centric chambers are present if nodes tend to be connected primarily to peers sharing a similar opinion. To measure this tendency, we define for every node u the average opinion of its neighbors, \(C_{N(u)}\). Figure 4 shows the correlation between a user’s opinion \(C_{u}\) and the average opinion of their nearest neighbors \(C_{N(u)}\) in each snapshot. Initially, the users are spread over a very large area (Pearson’s coefficient \(\rho \simeq 0.49\)), but as time passes, the tendency of the density to stretch along the diagonal gradually becomes more pronounced: the correlation grows considerably (\(\rho = 0.72\)) and becomes statistically significant (\(\rho = 0.85\), \(p\text{-value} \simeq 0\)) at \(G_4\) in Fig. 4d, i.e., between Italy-Wales and Italy-Austria, after which the situation remains almost unchanged. We observe the same trend by looking at the assortativity of the networks (r), which is always greater than 0 and increases around \(G_3\), stabilizing from \(G_4\) onward at \(r = 0.46\), meaning that the networks are quite assortative. Such results unveil the presence of micro-scale echo chambers: users who express an opinion in favor of or against taking the knee have a higher probability of interacting with peers sharing the same opinion.
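A minimal sketch of this node-level measurement is shown below: it computes \(C_{N(u)}\) as the mean opinion of a user's neighbors and then the Pearson correlation between \(C_{u}\) and \(C_{N(u)}\). Using an unweighted mean (ignoring the interaction weights) and skipping users without an opinion score are assumptions of the sketch, and the function names are illustrative.

```python
from collections import defaultdict
from math import sqrt


def neighbour_opinions(edges, opinions):
    """C_N(u): unweighted mean opinion of the neighbours of every node u.

    Edges touching a user without an opinion score are skipped (assumption).
    """
    collected = defaultdict(list)
    for u, v in edges:
        if u in opinions and v in opinions:
            collected[u].append(opinions[v])
            collected[v].append(opinions[u])
    return {u: sum(vals) / len(vals) for u, vals in collected.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def echo_correlation(edges, opinions):
    """rho between C_u and C_N(u) over all users with at least one neighbour."""
    c_n = neighbour_opinions(edges, opinions)
    users = sorted(c_n)
    return pearson([opinions[u] for u in users], [c_n[u] for u in users])


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("c", "d")]
    toy_opinions = {"a": 3.0, "b": 2.0, "c": -3.0, "d": -2.0}
    print(neighbour_opinions(toy_edges, toy_opinions))
    print(echo_correlation(toy_edges, toy_opinions))
```

Values of \(\rho\) close to 1 indicate that users mostly interact with like-minded peers; the function raises a division error if all opinions are identical, since the correlation is then undefined.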

5 Conclusions

In this work, we proposed a longitudinal analysis of the emergence of echo chambers, witnessing their evolution from the very moment they are born to their stabilization. The event we explored concerns EURO 2020 and the Italian Twitter debate born around the players taking the knee, or not, in support of the BLM movement.

Our time-aware analysis unfolds across three different topological scales (macro, meso, micro) and highlights a consistent behavior of the system as a whole as well as when its components are taken independently. At the macro scale, we identified two well-separated ECs. Next, at the meso scale, we observed how ECs started appearing around \(G_3\) and strengthened in \(G_4\). This result is also confirmed by our node-level analysis, which highlighted how users’ opinions become considerably correlated with those of their neighbors starting from \(G_4\), the time window in which all the hubs entered the network.

We aim to extend the proposed longitudinal analytical framework to enhance the understanding of ECs formation and, at the same time, to open the discussion on the degree of predictability of EC-like phenomena in online debates.