1 Introduction

Religion, sports and politics are often topics of heated discussions. These debates reveal contrasting ideas and, in this situation, people tend to strengthen their prior beliefs (Sunstein 2002). For instance, those opposed to abortion legalization tend to become more strongly opposed to it after interacting with others who share the same point of view. On the other hand, supporters of the cause reinforce their position after communicating with other pro-legalization individuals, which intensifies the controversy around the topic. In Social Sciences, this simultaneous presence of conflicting tendencies or principles characterizes the process of group polarization (Fiorina and Abrams 2008). Among the many subjects that give rise to polarization in society, politics has been shown to be one of the most fertile grounds for bringing disagreements among people into the open (Adamic and Glance 2005; Farrell and Drezner 2008; Farrell 2012).

The Web, together with online social networks (OSNs), allows political parties and the population to spread their opinions quickly and to a large audience, which may amplify the political polarization process around the world (Adamic and Glance 2005; Farrell 2012; Farrell and Drezner 2008). In particular, OSNs do not serve only as a vehicle to consume information, but also allow people to express their positions and participate in political campaigns. OSNs can also influence people’s opinion about politicians and parties. A survey by IBOPE (2016) revealed that, in Brazil, approximately \(51\%\) of the voters consume political information from OSNs. Among these, \(27\%\) of people stated that they had a more favorable impression of a politician or a party after viewing posts on OSNs. By comparison, \(56\%\) declared that they had changed their opinion for the worse about politicians and parties due to what they read on these platforms.

Since the mass protests that erupted in 2013, Brazil has been facing a political crisis. Dilma Rousseff was elected in 2014 by a small margin of votes over the candidate Aécio Neves, who later, along with his party, officially contested the result of the elections. Later, in 2015, there were new revelations regarding politicians involved in the largest corruption scandal in the history of Brazil: “Lava Jato” (Car Wash operation), including many from Dilma’s party. On December 2, 2015, Congress launched the impeachment proceedings against the president. After several mass protests against and in favor of Dilma, she was impeached on April 17, 2016.

Events such as the protests of 2013, the elections of 2014, and the impeachment proceedings of Dilma mobilized people and divided opinions. Ruediger et al. (2014) showed that these disputes have emerged due to the rise of Internet usage over the last few years in the country, especially in OSNs. These platforms are a reflection of the points of view of the individuals and, at the same time, contribute to the intensification of the differences between the opposite opinions and set the stage for heated political debates among users and parties (Ribeiro and Gomes Goveia 2016).

In the field of politics, social scientists describe two types of polarization: elite polarization and mass polarization (Druckman et al. 2013; Baldassarri and Gelman 2008). Elite polarization is characterized by high ideological discrepancies between political parties and a strong similarity of positions within parties (Druckman et al. 2013). Mass polarization, in turn, is related to the segregation of common individuals in a society due to the divergence of opinions regarding actions and ideas of political elites (Baldassarri and Gelman 2008). With respect to this process of social division, American society and its political system are among the most widely studied examples (Fiorina and Abrams 2008; Abramowitz and Saunders 2008; DiMaggio et al. 1996).

The main goal of this work is to develop computational methods to analyze online and offline data to study political polarization. Our focus is to investigate the perceived increase in polarization that happened in Brazil regarding the impeachment proceedings of 2016. We propose a method to quantify the elite polarization among the Brazilian politicians in Congress (offline analysis) and the mass polarization among the Brazilian people on Twitter (online analysis).

1.1 Contributions

Our work builds on the Computational Social Science body of research by proposing methods to analyze online and offline data in order to study the two aforementioned types of political polarization: elite and mass polarization. Accomplishing this goal involves meeting two main technical challenges: detecting opinions in two different datasets and measuring polarization in them. The theoretical part of this study consisted of identifying and understanding the sociological and linguistic aspects that characterize the different opinions shared on Twitter concerning the impeachment, as well as the behaviors that define polarization in Congress. The methodological part of our research involved developing approaches to deal with the computational challenges involved in extracting information from massive amounts of data, particularly when it comes to short and informal texts from social media.

2 Related work

2.1 Polarity evaluation

The vast majority of the literature in polarity evaluation deals with the problem of classifying the position of an opinionated piece of text, a task referred to as sentiment polarity classification (Missen et al. 2013; Pang and Lee 2008). For this task, some works use manually created sentiment resources or even create a set of manually annotated words or posts for detecting sentiment in a dataset (Wilson et al. 2005; O’Connor et al. 2010; Mohammad et al. 2017). Others explore some characteristics of informal texts, such as hashtags and emoticons, to automatically find the polarity of the posts (Davidov et al. 2010; Kouloumpis et al. 2011; Mohammad 2012). The latter approach avoids the high cost of doing a manual annotation of the data and, for this reason, has been applied to analyze short informal texts (Kiritchenko et al. 2014). Some of these studies select hashtags that clearly show a sentiment—e.g., #happy or #sad—and use them as labels of positive and negative sentiment to build a training dataset of tweets (Davidov et al. 2010; Kouloumpis et al. 2011). The labeled data is used to train sentiment classifiers to find the polarity of tweets that do not contain the selected hashtags.

Our study intends to measure the polarity of OSN users rather than solely identifying the opinion conveyed by the documents in a dataset. With the purpose of understanding the opinion dynamics of individuals in OSNs, some works use opinion formation models to capture how users change their opinion in the face of their neighbors’ opinions and how these opinions evolve over time (Das et al. 2014; Morales et al. 2015; Jiang and Wu 2017). Other works use collective classification to identify the opinions of users in social media by classifying the opinion of unlabeled users based on their links to the labeled ones (Li et al. 2016; Ileri and Karagoz 2016).

With the purpose of finding the opinion of users, the work of Rabelo et al. (2012) investigates people’s opinion about US politics by building a directed graph, where the nodes are the OSN users and the edges model the follower–followee relationship. They select a group of hashtags that clearly convey an opinion and classify posts according to the sentiment associated with these hashtags. In order to find a set of users with a known opinion, users are labeled with the opinion that has the highest count among their posts. The collective classification is then performed on users who have no posts with the considered hashtags. In our study, we follow a similar idea to Rabelo et al. (2012) to calculate the polarity of users in OSNs. Unlike them, our model uses retweet networks, since the retweets act mostly like endorsements of an opinion (Metaxas et al. 2015), which seems to be a more meaningful basis for computing the user's polarity.

2.2 Polarization measurement

Although there are many works in the field of opinion polarization, there is no consensus about a quantitative measure for it (Schmitt 2016). According to Bramson et al. (2016), most of the studies present a formal measure for polarization that is specific to the dataset or topic of interest (e.g., politics), which explains the diversity of polarization measures in the literature. Garimella et al. (2018), for instance, propose the Random Walk Controversy (RWC). Given two partitions A and B, this measure quantifies the likelihood that an individual in partition A is exposed to authoritative content from the opposite partition (partition B). Another measure to compute the polarization of a set of individuals is the modularity of a graph (Conover et al. 2011; Guerra et al. 2013). Proposed by Newman (2006), the modularity quantifies the level of division of the network into groups, i.e., a high modularity value indicates that the network has dense connections between the nodes within the groups, but few connections between nodes from different groups.

Based on the electric dipole moment, the polarization index proposed by Morales et al. (2015) takes into account the probability density distribution of the opinions of individuals to quantify the segregation within a population. This measure considers that group polarization depends on the difference between the size of the opposite groups, as well as the distance between their central points of view. Even though the authors use a network to estimate the polarity of individuals in their study, they do not need a network structure to calculate the final polarization, which is solely based on the density distribution of opinions. For this reason, we used the polarization index proposed by Morales et al. (2015) to quantify polarization in our study, which also overcomes the restrictions to calculate polarization in our offline dataset.

2.3 Temporal topic evolution

Since the Web has become one of the main vehicles to consume, create and share information, it is important to detect the central subjects that are discussed in the virtual world. This can be done by topic modeling algorithms, which are statistical methods designed to discover topics in a collection of documents. One of the most widely used algorithms for this purpose is the Latent Dirichlet Allocation (LDA) (Blei et al. 2003), which finds topics in large text collections.

The conventional topic modeling algorithms, however, do not address the data sparsity problem of short texts, such as posts of online social networks. Thus, in the context of social media, it is necessary to use algorithms that deal with the short and informal nature of OSN posts, such as the Biterm Topic Model (BTM) (Yan et al. 2013), which directly models the generation of pairs of co-occurring words (biterms) in a corpus of short texts. It considers the whole collection of documents as a single one, modeling the corpus as a mixture of topics. This algorithm draws a topic distribution \(\theta\) for the entire collection and, for each topic z, it draws a topic-specific word distribution \(\phi\), according to the probability of the co-occurring words in the corpus.
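The notion of a biterm can be illustrated with a minimal sketch, assuming that each short text is already tokenized and that every unordered pair of words in the same text counts as one biterm. The function names `extract_biterms` and `corpus_biterm_counts` and the toy documents are hypothetical; the sketch covers only the extraction of biterms, not the inference of \(\theta\) and \(\phi\).

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair that co-occurs in one short text."""
    # Sorting each pair makes ("vote", "congress") and ("congress", "vote") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool the biterms of all documents, treating the corpus as one collection."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical, already preprocessed tweets.
docs = [
    ["impeachment", "vote", "congress"],
    ["congress", "vote"],
]
counts = corpus_biterm_counts(docs)
```

Pooling the pairs over the whole corpus is what lets such a model sidestep the sparsity of any single short document.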

Besides dealing with the large volume of texts, as data is produced at a high speed, it is also relevant to consider that topics may change over time. With that in mind, some works in the literature used document clustering to find topic variations over time (Mei and Zhai 2005; Stilo and Velardi 2016; Zhang et al. 2015); while others modified the traditional probabilistic topic models adding time as a factor of the algorithm so as to model topic evolution (Blei and Lafferty 2006).

As one of the studies that adopts clustering techniques, Mei and Zhai (2005) applied a general probabilistic model to discover theme patterns and generated a graph to find word clusters for each time period. This graph, the evolutionary theme graph, is used to determine how topics change over time and how previous topics influence later ones. To analyze the tweets of the Brazilian politicians in our study, we followed a similar idea to Mei and Zhai (2005) for comparing topics in consecutive time intervals by building an evolution graph. Next, to find the variations in the topics discussed by the politicians over time, we used BTM to discover topics in the short Twitter messages and built a topic similarity graph to track their variations across the studied period.

2.4 Computational social science and politics

One of the most studied applications of Computational Social Science is politics, since the Web and the social platforms are valuable tools to analyze the views and beliefs of a population and predict outcomes regarding political events. There are several examples of works of this nature, going from studies regarding the behavior of the politicians (Joseph et al. 2019; Lietz et al. 2014; Livne et al. 2011) to the analysis of the opinions of the general population (Badawy et al. 2019; Davidson et al. 2020; de França et al. 2018; Garimella et al. 2018; Morales et al. 2015; Conover et al. 2011; Makazhanov et al. 2014). In addition, there are studies that analyze both elite and mass behavior in online social media, like the one from Boutet et al. (2013), which investigated the most relevant features of the political parties in OSNs and used this information to build a classifier to find the political leaning of regular Twitter users.

Focusing on the interactions of political elites on social media, Lietz et al. (2014) proposed a set of quantitative measures to study the sociocultural structure and dynamics of the online conversational practices of political parties on Twitter over time. Following the same line, Livne et al. (2011) used graph and text mining techniques to study the behavior of the candidates of the most popular parties in the USA during the 2010 American elections.

Some other studies address the political opinions and reactions of the common citizens, such as the work of Morales et al. (2015), who investigated the emergence of polarization toward the former Venezuelan president Hugo Chávez on Twitter. In their study, they analyzed a set of retweets and showed that a small set of influential users was able to spread their opinions through OSNs, which had an impact on the segregation of opinions of the population. Conover et al. (2011) also studied political polarization on Twitter by analyzing retweet and mention networks of tweets posted during the 2010 US congressional midterm elections. Also exploring the properties of the retweet networks, Badawy et al. (2019) analyzed the influence of social bots over public opinion regarding the 2016 American presidential elections. In the study, they showed that these malicious actors are able to polarize the political conversation on Twitter.

Our work builds on this body of research by proposing computational methods to analyze online and offline data in order to study mass and elite polarization. Besides developing a computational approach to calculate polarization and polarity, we also intend to contribute to sociological studies about the impeachment of 2016 in Brazil, which was investigated through different angles by other authors (de França et al. 2018; Carvalho et al. 2016).

de França et al. (2018), for example, analyzed common Brazilian citizens on Twitter in order to understand the main profiles that exist on the social network and how the opinions are propagated by them. Carvalho et al. (2016), on the other hand, used topic modeling algorithms to find out the demands and points of view of the pro and anti-impeachment protesters on OSNs. In this paper, we bring a new perspective to the analysis of this event by investigating real-world and social media data from politicians and the general public regarding the Brazilian political scenario and the impeachment itself, for which we also investigate possible associations between online aspects and offline actions.

3 Polarization index

As previously mentioned, there are many ways to measure polarization. We chose to use the polarization index proposed by Morales et al. (2015), as it does not require a network structure to compute polarization as many proposed methods do. This polarization index considers that a population is perfectly polarized when it is divided into two groups of the same size that share opposite views about a subject. Figure 1 shows the variables involved in the computation of the polarization index of a population. This metric takes into account the size of the populations of opposite opinions (\(A^-\) and \(A^+\)), the gravity centers of each population (\(\text {gc}^-\) and \(\text {gc}^+\)) and the distance between these gravity centers (d).

Fig. 1 The variables involved in the computation of the polarization index of a population: the size of the populations of opposite opinions (\(A^-\) and \(A^+\)), the gravity centers of each population (\(\text {gc}^-\) and \(\text {gc}^+\)) and the distance between these gravity centers (d)

Given that polarity X is measured for each individual in the range \([-1,1]\), it is possible to calculate the size of such populations by taking into account the probability density functions (PDFs) of polarities p(X) for the set of studied individuals. Thus, the population of negative opinions (\(X < 0\)), represented by \(A^-\), is computed by integrating the distribution p(X) over the interval \([-1, 0]\) (Eq. 1). Comparatively, the population of positive opinions (\(X > 0\)), designated by \(A^+\), is calculated by integrating the polarity distribution over the interval [0, 1], as shown in Eq. 2.

$$\begin{aligned} A^-= & {} \int _{-1} ^0 p(X)\;{\mathrm{d}}X = P(X < 0) \end{aligned}$$
(1)
$$\begin{aligned} A^+= & {} \int _0 ^1 p(X)\;{\mathrm{d}}X = P(X > 0). \end{aligned}$$
(2)

Equation 3, in turn, determines the normalized difference between population sizes \(\Delta A\), which is one of the central variables to compute the final polarization index. This difference represents how unbalanced the existing groups are, i.e., it shows whether one population has a greater density of individuals than the other.

$$\begin{aligned} \Delta A = | A^+ - A^- |. \end{aligned}$$
(3)

Note that the population sizes \(A^-\) and \(A^+\) are calculated as the area under the PDF over the polarities which represent negative and positive opinions, respectively. Hence, their values reveal the probability of an individual to be part of that population. As probabilities, \(A^-\) and \(A^+\) values lie in the range [0, 1] and, as a result, their normalized difference \(\Delta A\) is also restricted to the range between 0 and 1. The closer \(\Delta A\) is to 0, the more similar the population sizes. Conversely, a \(\Delta A\) close to 1 indicates that the probability distribution takes the shape of a unimodal distribution, having one population with a much greater density than the other.
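To make Eqs. 1 to 3 concrete, the following minimal sketch approximates \(A^-\), \(A^+\) and \(\Delta A\) by trapezoidal integration of a density given as a function on \([-1,1]\). The names `integrate` and `population_sizes`, the grid resolution and the example density \(p(X) = (1+X)/2\) are illustrative assumptions rather than part of the original method.

```python
def integrate(xs, ys):
    """Trapezoidal rule for values ys sampled on the increasing grid xs."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )


def population_sizes(pdf, steps=1000):
    """Approximate A^-, A^+ and Delta A for a density pdf defined on [-1, 1]."""
    neg_grid = [-1 + i / steps for i in range(steps + 1)]  # grid on [-1, 0]
    pos_grid = [i / steps for i in range(steps + 1)]       # grid on [0, 1]
    a_neg = integrate(neg_grid, [pdf(x) for x in neg_grid])  # Eq. 1
    a_pos = integrate(pos_grid, [pdf(x) for x in pos_grid])  # Eq. 2
    return a_neg, a_pos, abs(a_pos - a_neg)                  # Eq. 3


# Hypothetical density skewed toward positive opinions; it integrates to 1 on [-1, 1].
a_neg, a_pos, delta_a = population_sizes(lambda x: (1 + x) / 2)
```

For this density the exact values are \(A^- = 1/4\), \(A^+ = 3/4\) and \(\Delta A = 1/2\), which the numerical result should reproduce up to floating-point error.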

Another key variable is the distance d between the positive and negative opinions, which quantifies the level of divergence between the opposite populations. It takes into account the gravity centers of the negative opinions \(\text {gc}^-\) (Eq. 4) and of the positive opinions \(\text {gc}^+\) (Eq. 5), which measure the central opinion of the negative and positive populations, respectively.

$$\begin{aligned} \text {gc}^-= & {} \frac{\int _{-1} ^0 p(X) X\;{\mathrm{d}}X}{\int _{-1} ^0 p(X)\;{\mathrm{d}}X} \end{aligned}$$
(4)
$$\begin{aligned} \text {gc}^+= & {} \frac{\int _0 ^1 p(X) X\;{\mathrm{d}}X}{\int _0 ^1 p(X)\;{\mathrm{d}}X}. \end{aligned}$$
(5)

The distance d is then computed as the normalized difference between these gravity centers, as shown in Eq. 6:

$$\begin{aligned} d = \frac{| \text {gc}^+ - \text {gc}^- |}{|X_\text {{max}} - X_\text {{min}}|} = \frac{| \text {gc}^+ - \text {gc}^- |}{2} \end{aligned}$$
(6)

where \(X_\text {{max}}\) represents the upper limit of the opinion values of the positive population (i.e., \(X_\text {{max}} = 1\)) and \(X_\text {{min}}\) represents the lower limit of the opinion values of the negative population (i.e., \(X_\text {{min}} = -1\)).

Note that \(d = 0\) indicates that the individuals share the same opinion since there is no difference between the central opinions of the opposite populations. On the other hand, a distance d close to 1 reveals that the two main opinions are in the extremes of each side.
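As a small worked illustration, using a hypothetical density rather than one from our data, take the uniform density \(p(X) = 1/2\) on \([-1,1]\). Equations 4 to 6 then give:

$$\begin{aligned} \text {gc}^- &= \frac{\int _{-1} ^0 \frac{1}{2} X\;{\mathrm{d}}X}{\int _{-1} ^0 \frac{1}{2}\;{\mathrm{d}}X} = \frac{-1/4}{1/2} = -\frac{1}{2}, \\ \text {gc}^+ &= \frac{\int _0 ^1 \frac{1}{2} X\;{\mathrm{d}}X}{\int _0 ^1 \frac{1}{2}\;{\mathrm{d}}X} = \frac{1/4}{1/2} = \frac{1}{2}, \\ d &= \frac{| 1/2 - (-1/2) |}{2} = \frac{1}{2}. \end{aligned}$$

Opinions spread evenly over the whole range therefore yield a distance halfway between the two limiting cases described above.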

Finally, Eq. 7 shows how to compute the polarization index \(\mu\). As previously explained, polarization increases with the separation of the opposite groups, which is why the index \(\mu\) is proportional to the distance d between gravity centers. Also, polarization is affected by the density of the populations, reaching its maximum value when the groups have equal sizes. Alternatively, the greater the difference between these groups, the smaller the index \(\mu\).

$$\begin{aligned} \mu = (1 - \Delta A) \; d \end{aligned}$$
(7)

The polarization index \(\mu\) lies in the range [0, 1] and its values can be interpreted as follows. When \(\mu\) reaches its maximum value (\(\mu = 1\)), the population is perfectly polarized: the populations have equal sizes and their polarities are centered in the extreme values (\(-1\) and 1). When \(\mu\) has its minimum value (\(\mu = 0\)), the population is not polarized. This is the case when the probability distribution of polarities takes the shape of a unimodal distribution, so that either the difference between population sizes is \(\Delta A = 1\) or the distance between gravity centers is \(d = 0\). In this situation, the population is either entirely centered in one of the extremes or centered at a neutral opinion.
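The limiting cases above can be checked with a minimal sketch of Eq. 7, assuming that the density p(X) is replaced by the empirical distribution of a list of polarity values, so that the integrals become proportions and means. The function name `polarization_index`, the treatment of exact zeros (counted in neither population) and the convention of returning 0 when one side is empty are assumptions made for illustration.

```python
def polarization_index(polarities):
    """Empirical polarization index mu = (1 - Delta A) * d for values in [-1, 1]."""
    if not polarities:
        raise ValueError("need at least one polarity value")
    neg = [x for x in polarities if x < 0]
    pos = [x for x in polarities if x > 0]
    if not neg or not pos:
        # Assumption: with one side empty its gravity center is undefined,
        # and the population is treated as not polarized.
        return 0.0
    n = len(polarities)
    delta_a = abs(len(pos) - len(neg)) / n   # Eq. 3 with A^+ and A^- as proportions
    gc_neg = sum(neg) / len(neg)             # Eq. 4 as a sample mean
    gc_pos = sum(pos) / len(pos)             # Eq. 5 as a sample mean
    d = abs(gc_pos - gc_neg) / 2             # Eq. 6
    return (1 - delta_a) * d                 # Eq. 7


perfect = polarization_index([-1, -1, 1, 1])   # equal groups at the extremes
moderate = polarization_index([-0.5, 0.5])     # equal groups, moderate opinions
one_sided = polarization_index([0.2, 0.9])     # everyone on the positive side
unbalanced = polarization_index([-1, 1, 1, 1]) # extreme opinions, unequal groups
```

In the last case the groups sit at the extremes (d = 1) but their sizes differ (\(\Delta A = 0.5\)), so the index is halved relative to the perfectly polarized case.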

4 Elite polarization

4.1 Data

We collected roll-call voting data from Brazil’s House of Representatives for the years 2015 and 2016. The dataset contains all proposed bills, party orientations and votes from each representative of the House over the studied period.

We only took into account voting events where Congress leaders of the Workers Party (PT) and the Brazilian Social Democracy Party (PSDB) gave divergent orientations to their representatives. Since impeached president Dilma Rousseff is a member of PT, the Workers Party is central to our study. As its major opposition party, PSDB is also relevant to understand how other members of the House changed their support for the ideas of each of these opposing parties over time. The rationale behind this methodology is that, if we included voting events about common-interest subjects, it would not be possible to see the existing ideological differences between the main parties and how the whole set of politicians behaves around these contrasting ideas.
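The selection rule can be sketched as a simple filter, assuming a hypothetical record layout in which each voting event stores the orientation given by each party leader. The field names (`id`, `orientation`), the function name `divergent_events` and the decision to drop events where either party gave no orientation are illustrative assumptions.

```python
def divergent_events(events):
    """Keep voting events in which PT and PSDB leaders gave different orientations."""
    kept = []
    for event in events:
        pt = event["orientation"].get("PT")
        psdb = event["orientation"].get("PSDB")
        # Assumption: an event lacking either orientation cannot be called divergent.
        if pt is not None and psdb is not None and pt != psdb:
            kept.append(event)
    return kept


# Hypothetical voting events.
events = [
    {"id": "bill-A", "orientation": {"PT": "yes", "PSDB": "no"}},
    {"id": "bill-B", "orientation": {"PT": "yes", "PSDB": "yes"}},
    {"id": "bill-C", "orientation": {"PT": "no"}},  # PSDB gave no orientation
]
kept_ids = [event["id"] for event in divergent_events(events)]
```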

The final dataset includes 225 voting events that took place between March 10, 2015 and December 14, 2016. In addition, it only comprises the votes of members who voted “yes,” “no” or “abstention” and participated in sessions in at least \(80\%\) of the studied period (i.e., 16 months), which includes 471 representatives (about 91.8% of the total members of the House).

4.2 Methods

In order to understand the temporal evolution of polarization, we split our dataset of voting events into time slices, adopting month as the time unit. Figure 2 shows an overview of the steps of our methodology. First, we measure the polarity of each representative included in each of the time-sliced sets (Step 1) and, after that, we compute the probability density function (PDF) for these polarity values of each month (Step 2). Finally, the polarity values and their derived PDFs are used to calculate the polarization index across the entire period (Step 3).
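The time slicing that precedes Step 1 can be sketched as follows, assuming each voting event carries a calendar date. The field names and the function name `slice_by_month` are hypothetical; the per-slice polarity, PDF and polarization computations would then be applied to each group.

```python
from collections import defaultdict
from datetime import date


def slice_by_month(events):
    """Group voting events into (year, month) slices, in chronological order."""
    slices = defaultdict(list)
    for event in events:
        day = event["date"]
        slices[(day.year, day.month)].append(event)
    # Sorting the (year, month) keys yields the slices in temporal order.
    return dict(sorted(slices.items()))


# Hypothetical voting events, deliberately out of order.
events = [
    {"id": "v3", "date": date(2015, 4, 2)},
    {"id": "v1", "date": date(2015, 3, 10)},
    {"id": "v2", "date": date(2015, 3, 24)},
]
monthly = slice_by_month(events)
```

Months without any retained voting event simply produce no slice in this sketch.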

Fig. 2 Overview of the steps to calculate the polarization of the politicians

4.3 Polarity calculation

As a general concept, we define the polarity of an individual as her position on a subject, which she expresses by either agreeing (“yes”), disagreeing (“no”) or being neutral toward it. In this study, polarity is defined as the extent to which a politician agrees with the target party orientation. To compute its value, we took into consideration the votes of each representative in the following situations: (i) votes that agreed with the target party position, \(v_\text {{+}}\); (ii) votes that were contrary to the target party orientation, \(v_\text {{-}}\); (iii) abstention votes, \(v_\text {{0}}\). The Workers Party (PT) is taken as the target group because it was the ruling party during the period covered by the dataset and the party of the impeached president Dilma Rousseff.

Let \(v_i^T\) be the total number of votes given by an individual i and \(v_i^+\), \(v_i^-\) and \(v_i^0\) be her number of votes pro-PT, anti-PT and abstentions, respectively. We compute the proportion of her votes toward a given position by dividing her number of votes of that position by her total number of votes, i.e., \(r_i^+ = v_i^+/v_i^T\) and \(r_i^- = v_i^-/v_i^T\). Then, the polarity \(p_i\) of a representative is simply given by \(p_i = r_i^+ - r_i^-\), which lies in the range \([-1,1]\) and represents the level of inclination of an individual toward the target party positioning. Here, it is important to point out that we decided to reverse the sign of the obtained polarity values \(p_i\), i.e., the closer the polarity measure is to \(-1.0\), the more similar the opinions of the representative are to those of PT; on the other hand, the closer to \(+1.0\), the more divergent her opinions are from those of the target party. We chose to associate a negative polarity with the PT positioning due to its left-wing orientation, allowing us to generate more intuitive visualizations.
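The polarity computation can be sketched as below, assuming that \(v_i^T\) is the sum of pro-PT, anti-PT and abstention votes. The function name `polarity` and the `reverse_sign` flag are hypothetical and serve only to show the sign convention described above.

```python
def polarity(v_plus, v_minus, v_zero, reverse_sign=True):
    """Polarity of one representative from her vote counts.

    v_plus: votes agreeing with PT, v_minus: votes against PT, v_zero: abstentions.
    """
    total = v_plus + v_minus + v_zero   # assumed definition of v_i^T
    if total == 0:
        raise ValueError("representative has no votes in this time slice")
    p = (v_plus - v_minus) / total      # algebraically equal to r_i^+ - r_i^-
    # With the reversed sign, -1 means fully aligned with PT and +1 fully opposed.
    return -p if reverse_sign else p


aligned = polarity(10, 0, 0)   # always voted with PT
opposed = polarity(0, 10, 0)   # always voted against PT
mixed = polarity(8, 1, 1)      # mostly with PT, one abstention
```

Abstentions enter only through the denominator, so they pull the polarity toward 0 without favoring either side.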

4.4 Experimental results

In order to compute the overall polarization, we first use the individual polarity values \(p_i\) to compute the probability density functions (PDFs) for each time slice. These PDFs translate the distribution of opinions from the whole set of representatives over time, as shown in Fig. 3. Some months are not shown either because there were no voting sessions in that month (House recess) or due to the restrictions of our study, which only takes into account voting events where PT and PSDB disagree.
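One simple way to obtain such a PDF from the polarity values of a time slice is a normalized histogram, sketched below. The estimator is an assumption made for illustration, since the text does not specify how the densities are estimated; the function name `histogram_pdf` and the number of bins are likewise hypothetical.

```python
def histogram_pdf(polarities, bins=10):
    """Normalized histogram on [-1, 1]; returns (bin_edges, densities)."""
    if not polarities:
        raise ValueError("need at least one polarity value")
    if any(x < -1 or x > 1 for x in polarities):
        raise ValueError("polarity values must lie in [-1, 1]")
    width = 2 / bins
    counts = [0] * bins
    for x in polarities:
        index = min(int((x + 1) / width), bins - 1)  # x == 1 falls in the last bin
        counts[index] += 1
    n = len(polarities)
    # Dividing by n * width makes the area under the histogram equal to 1.
    densities = [count / (n * width) for count in counts]
    edges = [-1 + i * width for i in range(bins + 1)]
    return edges, densities


# Hypothetical polarity values for one month.
edges, densities = histogram_pdf([-0.9, -0.8, 0.1, 0.95, 1.0], bins=4)
area = sum(density * 0.5 for density in densities)  # bin width is 2 / 4 = 0.5
```

A kernel density estimate would give a smoother curve; the histogram is used here only because it makes the normalization explicit.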

Fig. 3 PDFs for the polarity values X of the politicians per month

First, observe that the polarity values are more evenly distributed until November/2015, when the representatives seem to start concentrating in opposite groups. In December of the same year, the density has considerably increased on the left side (PT supporters), which is almost 3 times greater than the right one (PSDB supporters). Then, in February and March of 2016, the two largest groups that have about the same density are situated in opposite polarities. From April/2016 onward, the right-wing group (anti-PT) attracted more representatives, showing a greater density than the pro-PT side. The only exceptions to this situation are May and November of 2016. In May, apart from the opposite groups, there is also a well delimited central group. It possibly indicates that, in this month, some representatives agreed with PT in about half of the voting sessions and were contrary to the party in the other half, which resulted in a polarity value close to 0. In November, the polarity values are distributed over the whole interval \([-1,1]\), which suggests that most of the representatives did not specifically express support to anti- or pro-PT ideas during this month.

In short, groups of divergent polarities can be observed over almost the entire studied period, but their differences in the number of representatives are more noticeable after November of 2015. Considering the whole period, observe that polarities are more evenly distributed during 2015, whereas there are more noticeable groups of opposite polarities during 2016. This change in the behavior of the politicians coincides with the launch of the impeachment proceedings in early December 2015.

In 2016, after the proceedings started, the density distributions show two well-defined groups for most of the months, suggesting that the launch of the impeachment was an important issue that divided the representatives. In April of 2016, the right-wing group (anti-PT) has a higher density, which also happens in the following months. This situation can be explained by the impeachment vote at the House of Representatives in April of 2016, when 367 out of 513 representatives voted to remove Dilma Rousseff from office, i.e., most of the politicians held a different opinion from the position of her party (PT). The similar anti-PT behavior in the subsequent months could be related not only to the impeachment proceedings, but also to the weakening of the Workers Party per se, in view of the corruption scandals and the protests from the population, which may also have affected the political coalitions among the politicians.

The overall polarization is finally computed using the PDFs and its results are shown in Fig. 8. Before November of 2015, there were no polarization index values higher than 0.5, whereas \(\mu\) was close to 0.6 for most of the months of 2016. In December of 2015, the polarization index reached its highest value for the whole studied period. In the same month, the distance between gravity centers also reached its highest value, which suggests that the increase in polarization at this point is more related to variations in this factor. In 2016, the lowest polarization index values occurred in May and November. For both months, there was a decrease in the distance between gravity centers, i.e., the average polarity of each group moved away from the negative or positive extremes. Although \(\Delta A\) varies over the entire period, the shape of the polarization index resembles the fluctuations of the distance d. Only in April of 2015, when \(\Delta A\) reached its peak, can we see that an increase in this variable causes a major decrease in the polarization index. For most of the period, however, variations in \(\Delta A\) have little influence over polarization when compared to the distance between gravity centers.

In summary, the polarization index recorded higher values in 2016 as compared to the previous year. In addition, major changes in its value were mostly related to variations in the distance between gravity centers. This situation indicates that, in our case study, the polarization among Brazilian representatives was more affected by the average polarity of each group rather than the volume of politicians inside them.

As with the density distributions, the polarization of the politicians increased after December of 2015, which coincides with the launch of the impeachment proceedings at the start of that month. The polarization values were higher in 2016 than in the previous year, which corroborates our previous observations about the facts around this phenomenon: the representatives became more polarized after the launch of the impeachment proceedings at the end of 2015 and this scenario persisted for most of 2016.

In addition, the fact that the polarization among the politicians was more influenced by the distance between the gravity centers indicates that the impeachment and its related events may have resulted in changes in political alliances among the representatives. To put it differently, politicians that had more ideas in common with PT—in the left part of the political spectrum—may have gotten even closer to the party after the events, making similar decisions at the Lower House. On the other hand, politicians with a different ideology—most of which were in the right part of the political spectrum—may have adopted a more opposed attitude, making more opposite decisions from PT at the House.

5 Temporal topic evolution of politicians' agenda in online social media

Besides studying the elite polarization process at the Lower House, we have also investigated the behavior of the Brazilian politicians (elite) in online social media. For that, we collected a dataset of tweets from the Brazilian representatives of the Lower House of Congress and then detected the main topics they have discussed over the studied period.

5.1 Data

We collected the tweets from Brazilian representatives that were part of the Lower House of Congress (House of Representatives) and were active on Twitter from January of 2015 to November of 2016. The dataset includes 502,342 tweets from 423 representatives (about 82.5% of the total number of House members) shared over the aforementioned period. As a preprocessing step, all tweets were lower-cased and stop words were eliminated. All messages in our Twitter dataset are in Portuguese, but our results were translated to English for the sake of understanding.

5.2 Method

With the purpose of finding the central subjects discussed by the Brazilian politicians on Twitter over time, we followed a similar idea to Mei and Zhai (2005) for comparing topics by building an evolution graph. The difference in our approach is that our graph was built by comparing topics from all the time slices, instead of only comparing topics from consecutive time intervals. In addition, we take into consideration that sets of similar topics may be related to others, for which we successively group the sets according to a similarity threshold. To deal with the short nature of the Twitter messages, we use the Biterm Topic Model (BTM) (Yan et al. 2013) to discover topics from fixed time slices. From the BTM results, we build a topic similarity graph (evolution graph) to track variations on them across the studied period.

Our temporal topic evolution approach was conducted as follows. The Twitter messages M of our dataset were organized into time slices of interest (e.g., hour, day, month, etc.), \(M_P = \{M_1, M_2, \ldots , M_p\}\), where p is the number of time slices in the studied period P. In this study, we broke the period into monthly slices. We then used BTM to find the topics for every set \(M_i\). This algorithm receives as a parameter the number k of topics and a time slice i, and produces as output a list of topics \(T_i = \{ T_i ^1, T_i ^2, \ldots , T_i ^k \}\), where each \(T_i ^j\) is represented by its topic number j (i.e., the number that identifies a topic) and is described by w words. After executing BTM for every time slice, we ended up with a set of topics \(T_P = \{T_1, T_2, \ldots , T_p\}\) for the entire period. Having this set of topics \(T_P\) per time slice, we follow the process illustrated in Fig. 4. From \(T_P\), we created a single topic similarity graph \(G_T=\{V_T,E_T\}\) to find groups of similar topics, as described in Algorithm 1. In this graph, the vertices \(V_T\) represent the topics in \(T_P\) and the edges in \(E_T\) measure the similarity between each pair of topics. The similarity was calculated using the Jaccard coefficient \(\sigma\), which computes the proportion of shared words w between each pair of topics \(T_{i_1} ^{j_1}\) and \(T_{i_2} ^{j_2}\):

$$\begin{aligned} \sigma (T_{i_1} ^{j_1}, T_{i_2} ^{j_2}) = \frac{|w_{T_{i_1} ^{j_1}} \cap w_{T_{i_2} ^{j_2}}|}{|w_{T_{i_1} ^{j_1}} \cup w_{T_{i_2} ^{j_2}}|} \end{aligned}$$
(8)
Fig. 4 Proposed topic evolution approach. In this example, we consider three monthly slices, for which we have the sets of topics \(T_P = \{ T_{Jan}, T_{Feb}, T_{Mar} \}\). Each of these sets has two topics, resulting in a graph with 6 nodes. Note that each node corresponds to a topic \(T_{i} ^{j}\), where j is the topic number and i is its associated monthly slice

figure a

From the topic similarity graph \(G_T\), we removed from \(E_T\) the edges that indicate a similarity lower than a threshold \(\tau _\sigma\) (in Fig. 4, for example, \(\tau _\sigma = 0.3\)). After this step, we ended up with a set of m connected components \(C_{T} =\{C_1, C_2, \ldots , C_m\}\), which we assume to capture an intrinsic similarity between the topics related to the event of interest. In this way, each \(C_x\) is considered as a group of similar topics, and the number of topics is reduced from the original \(k \times p\) to m. We then merged all the nodes (topics) within \(C_x\) into a single super-node and, consequently, a super-topic, which is now described by the union of the words of each topic belonging to \(C_x\). Note that \(C_x\) may have components from different time slices, and that is our goal: to say that the super-topic representing \(C_x\) appears in different time slices. For instance, Fig. 4 shows that \(C_1\) contains the topic number 1 of January (\(T_\text {Jan} ^1\)) and the topic number 1 from March (\(T_\text {Mar} ^1\)), meaning that this super-topic is discussed in January and March.
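
The thresholding and grouping step can be illustrated with a minimal sketch. It assumes each topic is given as a list of words keyed by a hypothetical `(time_slice, topic_number)` pair, and it uses a plain depth-first search to find the connected components; the function names and the toy words are illustrative only and are not taken from Algorithm 1.

```python
from itertools import combinations


def jaccard(words_a, words_b):
    """Jaccard coefficient (Eq. 8) between two collections of topic words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connected_components(topics, tau_sigma):
    """Group topics linked by edges whose similarity is at least tau_sigma.

    `topics` maps a (time_slice, topic_number) key to its list of words.
    Returns a list of components, each a sorted list of topic keys.
    """
    # Keep only the edges that survive the similarity threshold.
    neighbors = {key: set() for key in topics}
    for u, v in combinations(topics, 2):
        if jaccard(topics[u], topics[v]) >= tau_sigma:
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Depth-first search collects each connected component.
    seen, components = set(), []
    for start in topics:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(sorted(component))
    return components


# Toy input (hypothetical words, not the paper's topics).
toy_topics = {
    ("Jan", 1): ["impeachment", "coup", "dilma", "vote"],
    ("Jan", 2): ["committee", "meeting", "bill", "session"],
    ("Mar", 1): ["impeachment", "coup", "dilma", "senate"],
}
toy_components = connected_components(toy_topics, tau_sigma=0.3)
```

In this toy input the January and March topics about the impeachment share three of five distinct words, so they end up in the same component, while the committee topic stays alone.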

Our original graph \(G_T\) has now a set of super-nodes \(N_G=\{C_1,C_2,\ldots ,C_m\}\). However, after each connected component becomes a topic, these new topics may again share a large number of words w with others, and hence could be merged again. As a result, we refine these super-nodes representing topics by successively merging them. This is done by calculating, for each pair of nodes \((C_i,C_j)\), their percentage of shared words \(\text {Sw}\):

$$\begin{aligned} \text {Sw}(C_i, C_j) = \frac{|w_{C_i} \cap w_{C_j}|}{\text {min}(|w_{C_i}|,|w_{C_j}|)}. \end{aligned}$$
(9)

The condition to merge the vertices is the following: if \(\text {Sw}\) is larger than a threshold \(\tau _\text {Sw}\) (in Fig. 4, e.g., \(\tau _\text {Sw} = 0.6\)), the pair of vertices \((C_i,C_j)\) is grouped. This grouping process continues until there is no pair of vertices that meets the condition. Algorithm 2 shows this successive grouping step.

figure b
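
A minimal sketch of this successive grouping is given below. The order in which candidate pairs are visited is an assumption of the sketch (it merges the first qualifying pair and rescans), and the names `shared_words` and `merge_super_nodes` are hypothetical; the result may depend on that order when several pairs qualify at once.

```python
def shared_words(words_a, words_b):
    """Sw (Eq. 9): shared words relative to the smaller of the two word sets."""
    smaller = min(len(words_a), len(words_b))
    return len(words_a & words_b) / smaller if smaller else 0.0


def merge_super_nodes(super_nodes, tau_sw):
    """Merge super-nodes pairwise until no pair has Sw larger than tau_sw.

    `super_nodes` is a list of (members, words) pairs, where `members` is the
    set of original topic keys and `words` is the union of their words.
    """
    nodes = [(set(m), set(w)) for m, w in super_nodes]
    merged = True
    while merged:
        merged = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                if shared_words(nodes[i][1], nodes[j][1]) > tau_sw:
                    members = nodes[i][0] | nodes[j][0]
                    words = nodes[i][1] | nodes[j][1]
                    # Replace the pair with the merged node, then rescan.
                    nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
                    nodes.append((members, words))
                    merged = True
                    break
            if merged:
                break
    return nodes


# Toy super-nodes (hypothetical words).
toy_nodes = [
    ({("Jan", 2)}, {"committee", "meeting", "bill", "session"}),
    ({("Feb", 2), ("Mar", 2)}, {"committee", "meeting", "bill", "vote", "report"}),
    ({("Jan", 1), ("Mar", 1)}, {"impeachment", "coup", "dilma"}),
]
toy_super_topics = merge_super_nodes(toy_nodes, tau_sw=0.6)
```

Here the first two super-nodes share three of the four words of the smaller set (Sw = 0.75), so they are merged, and the third one remains separate.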

At the end of the grouping process, we have a smaller number of n super-topics \(T'_P = \{T'_1, T'_2, \ldots , T'_n\}\), each one with its associated time slices, which allow us to follow the evolution of these topics over time. Since each \(T'_x\) is composed by a group of connected components, it is also represented by the original topics that are included in these components and their words. As shown in Fig. 4, the initial 6 original topics (\(T_\text {Jan} ^1, T_\text {Jan} ^2, T_\text {Feb} ^1, T_\text {Feb} ^2, T_\text {Mar} ^1, T_\text {Mar} ^2\)) were grouped into 3 final super-topics (\(T'_1, T'_2, T'_3\)). In this example, \(T'_2\) contains the topics \(T_\text {Jan} ^2, T_\text {Feb} ^2\) and \(T_\text {Mar} ^2\), meaning that this super-topic is discussed in January, February and March.

After generating \(T'_P\), we quantified the relevance of these final topics, which is based on the topic probabilities calculated by the topic modeling method (BTM). Recall that BTM produces as output the probability of each topic and the probability of a word given a topic. We also use BTM to find the proportion of Twitter messages for each original topic \(T_{i} ^{j}\), since this algorithm already assigns the most probable topic to each message in the dataset. The topic relevance \(\text {TR}_x ^i\) measures the popularity of a final super-topic \(T'_x\) in a time slice i. It is calculated as the sum of the number of messages M assigned to each original topic \(T_{i} ^{j}\) that belongs to the super-topic \(T'_x\), as shown in Eq. 10.

$$\begin{aligned} \text {TR}_x ^i = \sum \limits _{T_{i} ^{j} \; \in \; {T'_x}} |M_{T_{i} ^{j}}|. \end{aligned}$$
(10)
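
As an illustration of Eq. 10, the sketch below sums message counts per super-topic and time slice. It assumes the per-topic message counts have already been obtained from the topic model; the data structures and the numbers are hypothetical.

```python
from collections import Counter


def topic_relevance(super_topics, message_counts):
    """TR_x^i (Eq. 10): messages per super-topic x and time slice i.

    `super_topics` maps a super-topic id to the set of original topics,
    each identified by (time_slice, topic_number). `message_counts` maps
    each original topic to the number of messages assigned to it.
    """
    relevance = {}
    for x, members in super_topics.items():
        per_slice = Counter()
        for time_slice, topic_number in members:
            per_slice[time_slice] += message_counts.get((time_slice, topic_number), 0)
        relevance[x] = dict(per_slice)
    return relevance


# Toy input (hypothetical counts).
toy_super_topics = {
    "T'1": {("Jan", 1), ("Mar", 1)},
    "T'2": {("Jan", 2), ("Feb", 1), ("Feb", 2)},
}
toy_counts = {
    ("Jan", 1): 120, ("Mar", 1): 300,
    ("Jan", 2): 80, ("Feb", 1): 40, ("Feb", 2): 60,
}
toy_relevance = topic_relevance(toy_super_topics, toy_counts)
```

When a super-topic contains two original topics from the same month, as with February in this toy input, their message counts are added together for that month.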

5.3 Experimental results

Following our approach for temporal topic evolution, we were able to find the topics that were discussed over the studied period and how long each topic lasted. To that end, we considered a month as a time unit and we set the value of the parameter k (number of topics) of BTM to 10, obtaining a total of 230 topics over the 23 months of analysis (\(p=23\)). Each of the topics was defined by the 10 most probable words returned by BTM. It is also important to say that the thresholds \(\tau _\sigma\) and \(\tau _{Sw}\) were defined by a qualitative analysis of the intermediate results, due to restrictions on measuring topic coherence for the Portuguese language.

Starting from the 230 initial topics, 50 super-topics were obtained after the aggregation process (\(n = 50\)) and their relevance was calculated for each month. Results are summarized in the heat map of Fig. 5, which is filtered to show the 5 most relevant aggregated topics for simplification (see Footnote 2). Remember that topic relevance \(\text {TR}_x ^i\) is computed from the number of tweets posted in a month, and its value is color-coded using a log scale in the heat map. To understand the content of these topics, the top 10 words for each final topic are also presented in Table 1.

Fig. 5 Relevance of super-topics \(T'_P\) over the months. The boxes in the figure show the main events that happened in the month regarding the politics

Table 1 Words describing the super-topics for politicians

According to the top words of each final topic, we can make the following observations:

  • \(T_1^{\prime }\) contains words that are closely related to Brazilian political crisis (“impeachment,” “coup,” “against”).

  • \(T_2^{\prime }\) seems to comprise posts about the activities of the politicians at the Lower House, since it includes words such as “committee,” “meeting” and “bill.”

  • Some top words (“congratulations,” “god,” “friends,” “good”) suggest that \(T_4^{\prime }\) contains interactive tweets, which may be used by the representatives to get in touch with their friends and/or public.

  • Words such as “facebook,” “photos” and “posted” show that \(T_5^{\prime }\) may cover tweets regarding the participation of the politicians on other social networks.

  • \(T_6^{\prime }\) may contain tweets about the participation of politicians on the news media, due to the presence of words such as “TV,” “talk” and “show.”

As shown in the heat map of Fig. 5, the super-topics \(T_1^{\prime }\) (political crisis) and \(T_4^{\prime }\) (interactive posts) were discussed throughout the period covered by the dataset. \(T_1^{\prime }\) was more intensely explored than the other topics, especially during December of 2015 and the periods that go from March until June of 2016 and from August to September of 2016. It is also important to notice that tweets about \(T_2^{\prime }\) (activities at the Lower House) are posted in almost all the time intervals covered by the dataset, except for three months (February, April and September of 2016). Regarding the last two, notice that April and September of 2016, when topic \(T_2^{\prime }\) was not discussed, coincide with the months in which there was the voting of the impeachment at the Lower House (April of 2016) and at the Senate, with the permanent removal of Dilma from the office (last day of August of 2016, start of September). At the same time, observe that the topic \(T_1^{\prime }\) (political crisis) was more deeply discussed during these months, as denoted by the more intense colors in the heat map. This may be the reason why the politicians did not post about their activities, giving more emphasis to the discussion of issues about the political crisis: either they were participating in the voting sessions of the impeachment proceedings at the Lower House (April of 2016) or they were more concerned about the final voting of the impeachment at the Senate (last day of August of 2016).

6 Mass polarization

Together with the elite polarization, or the polarization of the politicians, our study also investigates the mass polarization, or the polarization among the general public in Brazil. Our goal is to understand the opinions of the common citizens and how segregated they were around these ideas. Hence, similarly to the politicians, we proposed a method to compute the individual polarities and the overall polarization of the Brazilians during the year of 2016, when the impeachment of Dilma Rousseff was voted.

6.1 Data

Since we do not have a direct way to measure the opinion of the general public, we used social media data to do that. We collected a dataset of tweets through the public Twitter Stream API using the 33 keywords shown in Table 2 in the period that goes from March 2016 to December 2016. This set of terms was chosen because they were strongly related to the Brazilian political crisis that was in course during this period. It comprises names of politicians (e.g., “dilma,” “temer,” “lula” and “cunha”), corruption scandals (e.g., “lava jato”), companies that were cited for corruption (e.g., “odebrecht,” “petrobras,” “andrade gutierrez”), among other people and institutions that were involved in that turbulent political scenario. The dataset includes approximately 3.3 million users and about 80.4 million tweets that were posted between March 09, 2016, and December 27, 2016.

Table 2 Keywords used to collect tweets from the general public

As a preprocessing step, all tweets were lower-cased and stop words were eliminated. In addition, social bots were removed by finding users who posted an extremely large number of messages. We identified that users with more than 700K tweets (\(>2K\) tweets per day) had aspects of spammers, and these users were removed from the dataset. It is also important to point out that, despite the fact that our Twitter dataset is in Portuguese, our results were translated to English for the sake of understanding.

6.2 Methods

The evaluation of polarization among the general public (mass polarization) works in a similar fashion to the study of the politicians (elite polarization), having two basic parts: i) the measurement of individuals’ polarity and ii) the computation of the polarization index. The whole process is shown in Fig. 6.

Fig. 6 Overview of the steps to calculate the polarities of the general public

Again, we split the dataset into monthly time slices (Step 1). Next, we labeled a sample of users who posted a list of hashtags that clearly stands for one of the poles (anti-PT or pro-PT) (Step 2). Having a set of labeled users, we built a retweet network (Step 3), which contains only edges of which one endpoint is an unlabeled user. Based on the connections between unlabeled and labeled users, we calculate the polarity of each unlabeled user (Step 3). This was the same approach used by Conover et al. (2011) and by Boutet et al. (2013) to infer the political leaning of Twitter users. Having these values, the second part of the methodology involves calculating the PDFs from the polarity measures. In the end, we calculate the polarization index from the PDFs.

As the starting point of our approach, we measure the polarity of each user by finding groups of users with opposite opinions in every time-sliced dataset. Bear in mind that our initial concept of polarity is here instantiated as a position regarding the impeachment of Dilma Rousseff, i.e., polarity quantifies how much a user supports the resignation of the president, or how much she is anti-PT or pro-PT.

With the purpose of investigating the different points of view, we began by finding hashtags that clearly show an opinion about the impeachment. Hence, the dataset was first characterized so as to find the most popular hashtags related to the studied subject. Once listed, these hashtags were divided into two groups of opposite opinions using background knowledge from the event. These hashtags are shown in Table 3.

Table 3 Hashtags related to the impeachment event in the general public dataset

Given a set of users U in the dataset, we first select from U the users who posted any of the listed hashtags. We then count the number of messages posted by user u containing pro-impeachment hashtags (\(|M_u^+|\)) and anti-impeachment hashtags (\(|M_u^-|\)), and use a conservative approach to label u. We assign pro-impeachment (or anti-PT) (\(+\)) labels to users with \(|M_u^+| > 0\) and \(|M_u^-| = 0\), and anti-impeachment (or pro-PT) (−) labels to users with \(|M_u^-| > 0\) and \(|M_u^+| = 0\). All other users were removed from the dataset, as their position was not clear from their messages.
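
The conservative labeling rule can be sketched as follows. The sketch assumes each message is represented by the set of hashtags it contains, and the hashtag names used in the example are placeholders, not the entries of Table 3.

```python
def count_hashtag_messages(messages, pro_tags, anti_tags):
    """Return (|M_u^+|, |M_u^-|) for the messages of one user.

    Each message is given as the set of hashtags it contains.
    """
    n_pro = sum(1 for tags in messages if tags & pro_tags)
    n_anti = sum(1 for tags in messages if tags & anti_tags)
    return n_pro, n_anti


def label_user(n_pro, n_anti):
    """Conservative label: a user must use hashtags from one side only."""
    if n_pro > 0 and n_anti == 0:
        return "+"  # pro-impeachment (anti-PT)
    if n_anti > 0 and n_pro == 0:
        return "-"  # anti-impeachment (pro-PT)
    return None  # both sides or none: position not clear


# Placeholder hashtags, for illustration only.
PRO_TAGS = {"#pro_tag_a", "#pro_tag_b"}
ANTI_TAGS = {"#anti_tag_a"}

user_messages = [{"#pro_tag_a"}, {"#pro_tag_b", "#other"}, {"#other"}]
example_label = label_user(*count_hashtag_messages(user_messages, PRO_TAGS, ANTI_TAGS))
```

A user who posts even one message with a hashtag from the opposite group receives no label under this rule, which is what makes the approach conservative.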

At the end of this process, U is divided into two subsets: \(U_\text {{labeled}}\) and \(U_\text {{unlabeled}}\), where the opinion of labeled users is already known. Then, to mitigate any selection bias effect, these labeled users \(U_\text {{labeled}}\) were sampled into two equally sized groups of each position. Each group contains 39,940 users, which gives a total of 79,880 users in the sample (\(\approx 2.3\)% of the users).

In order to calculate the polarity of the non-labeled users \(U_\text {{unlabeled}}\), our method builds a retweet network for each monthly slice, connecting unlabeled users to labeled users that they have retweeted. This network is represented by a weighted bipartite directed graph \(G_R = \{ V_R, E_R \}\), where vertices \(V_R\) represent users and edges \(E_R\) connect users \(u_i\) and \(u_j\) (\(u_i \rightarrow u_j\)) if \(u_j\) retweeted a post from \(u_i\). Edges are weighted by the total number of retweets. In this network, we only take into account messages from users in \(U_\text {{labeled}}\) that were retweeted by users in \(U_\text {{unlabeled}}\), i.e., the graph \(G_R\) is bipartite and only contains edges \(U_\text {{labeled}} \rightarrow U_\text {{unlabeled}}\). All other edges are ignored and all disconnected vertices are also removed. In the end, the final graphs for each time slice included 674,318 non-labeled users, which comprise 22% of the users in the whole dataset.

In \(G_R\), given that an unlabeled user u in \(U_\text {{unlabeled}}\) retweeted messages by a set of n labeled users, this user u is the endpoint of n edges \(E_u = \{ e_1, e_2, \ldots , e_n \}\) in the graph. Each edge \(e_i\) pointing to user u from labeled user v has a weight \(w_i\), which represents the number of retweets that user u made to posts by v. Thus, let \(M_u=\sum \limits _{e_i \; \in \; E_u} w_i\) be the total number of retweets made by u. The edges \(E_u\) of user u can be divided into two groups: a group of edges connecting u to pro-impeachment users (\(E_u ^+\)) and another to anti-impeachment users (\(E_u ^-\)). Thus, the total number of retweets \(M_u^x\) made by user u on messages posted by labeled users of each position \(x \in \{+, -\}\)—pro-impeachment (\(+\)) or anti-impeachment (−)—is given by \(M_u ^x = \sum \nolimits _{e_i \; \in \; E_u ^x} w_i\).

By using these values, we can also compute the proportion of retweets \(r_u ^x\) from unlabeled user u on labeled users of position \(x \in \{+, -\}\) by \(r_u ^x = {M_u ^x}/{M_u}\). Finally, from these values we compute the polarity \(p_u\) of user u in \(U_\text {{unlabeled}}\) as \(p_u = {r_u ^+} - {r_u ^-}\). The polarity is defined as the difference between the proportion of retweets from u on posts made by labeled users of each position. Hence, polarity lies in the range \([-1, 1]\) and represents the level of inclination of a user toward a certain opinion. The closer to \(+1\) it is, the more the user is inclined to a pro-impeachment view. The closer to \(-1\), the more she is inclined to hold an anti-impeachment position.
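
The polarity computation for a single unlabeled user can be sketched as below, assuming the incoming edges of the user are available as a mapping from each retweeted labeled user to the edge weight. The function name and the toy identifiers are hypothetical.

```python
def polarity(retweet_weights, labels):
    """Polarity p_u = r_u^+ - r_u^- of one unlabeled user u.

    `retweet_weights` maps each labeled user v retweeted by u to the weight
    w_i of the edge v -> u. `labels` maps labeled users to "+" or "-".
    """
    total = sum(retweet_weights.values())  # M_u
    if total == 0:
        raise ValueError("user has no retweets of labeled users")
    pro = sum(w for v, w in retweet_weights.items() if labels[v] == "+")   # M_u^+
    anti = sum(w for v, w in retweet_weights.items() if labels[v] == "-")  # M_u^-
    return pro / total - anti / total


# Toy example: three retweets of a pro-impeachment user, one of an
# anti-impeachment user.
toy_labels = {"user_a": "+", "user_b": "-"}
toy_polarity = polarity({"user_a": 3, "user_b": 1}, toy_labels)  # 0.75 - 0.25
```

A user who only retweets one side gets a polarity of exactly \(+1\) or \(-1\), and a user who retweets both sides equally gets 0.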

The probability density functions (PDFs) were calculated from these polarity measures for each time slice, in order to understand how users are distributed over the different polarities. The derived PDFs were finally used to calculate the overall polarization of people using the polarization index \(\mu\).
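
To make the roles of \(\Delta A\) and d concrete, the sketch below computes a polarization index directly from a sample of polarity values. It assumes that the index has the form \(\mu = (1 - \Delta A)\,d\), with d taken as half the distance between the gravity centers of the positive and negative sides; this form is consistent with the behavior described in this paper (\(\mu\) grows with d and shrinks as \(\Delta A\) grows), but the exact definition is the one given in the elite polarization study. Working on the sample rather than on an estimated PDF, and ignoring users with polarity exactly 0, are further simplifications of the sketch.

```python
def polarization_index(polarities):
    """Sample-based sketch of mu = (1 - delta_a) * d (assumed form)."""
    pos = [p for p in polarities if p > 0]
    neg = [p for p in polarities if p < 0]
    if not pos or not neg:
        return 0.0  # one side is empty, so delta_a = 1 and mu = 0
    n = len(pos) + len(neg)
    delta_a = abs(len(pos) - len(neg)) / n  # difference of population sizes
    gc_pos = sum(pos) / len(pos)            # gravity center of the positive side
    gc_neg = sum(neg) / len(neg)            # gravity center of the negative side
    d = abs(gc_pos - gc_neg) / 2            # normalized to [0, 1]
    return (1 - delta_a) * d


balanced = polarization_index([-1.0, -1.0, 1.0, 1.0])
unbalanced = polarization_index([-0.8, -0.8, -0.8, 0.8])
```

In the second call the distance between the groups is still large (d is about 0.8), but three quarters of the users sit on one side, so \(\Delta A = 0.5\) halves the index.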

6.3 Experimental results

Figure 7 shows the PDFs of the individual polarity values for each month. Observe that most of the individuals are clearly concentrated on divergent groups, with just a small number of users having polarity values close to 0. The left group (anti-impeachment position) has a greater density of individuals for most of the months, except for March of 2016, when the opposite groups seem to have about the same number of users.

Fig. 7 PDFs for polarity values X of the general public per month

From the PDFs, we computed the polarization index \(\mu\) of the general public for each month. Figure 8 shows the temporal evolution of \(\mu\) and its related variables: the difference of populations \(\Delta A\) and the distance between gravity centers d. Note that the polarization index has its peak in March (\(\mu = 0.79\)), when \(\Delta A\) is close to 0, indicating that groups of opposite opinions had about the same density of users by that time, so that the value of the polarization index is largely determined by the distance between the gravity centers. Observe also that the polarization index recorded its minimum value in August (\(\mu = 0.64\)). Since there was no noticeable change in d for the adjacent months, we can assume that the decrease in \(\mu\) in August is mostly related to an increase in \(\Delta A\), which means that one of the groups became larger than the other. Note in Fig. 7 that the left group has a larger density than the right one in August, and the difference in their sizes appears to be the largest for the whole period.

In short, our results reveal that the general public recorded high values of polarization over the entire studied period. Since the distance between gravity centers (d) remains almost uniform over time, the polarization index was mostly affected by fluctuations in the difference between population sizes (\(\Delta A\)). Also, its consistent high value may reflect the tensions among the Brazilian population during the entire studied period, which was characterized by a number of pro and anti-government protests that took place before and after the impeachment proceedings. For instance, we can notice that the highest polarization value was observed in March 2016, the same month in which the largest anti-government protest in the history of the country took place. By the same month, there were also demonstrations of support for president Dilma, whose demands were opposite from the anti-government group. Hence, the highest value in March 2016 shows that the polarization of the population in social media may be a reflection of the real protests and ideological conflicts present in the country at that time.

7 Understanding elite and mass behavior

After evaluating the polarization for both groups separately, we also investigated if the behavior of the politicians (elite) and the general public (mass) followed any similar patterns. To achieve this, we used the datasets and the results of the previously described studies in order to perform two analyses. First, we carried out a qualitative comparison of the elite and the mass polarization processes in order to see if the public behavior in online social media is related to the decisions of the politicians at the Lower House. In the second analysis, we conducted a study about the temporal changes in the popularity of the tweets posted by the Brazilian politicians among the public on Twitter. Our goal was to understand how the audience for the tweets of the representatives changed over time by taking into account their position about our event of interest in this work: the impeachment of the ex-president Dilma Rousseff. Finally, we contrasted the popularity of politicians with the public mass polarization in order to understand whether politicians’ positions impact public polarization.

7.1 Comparing elite and mass polarization

After studying the polarization process for each of the groups, we compare the polarization of politicians (elite polarization)—evaluated on their real-world voting data—to the polarization of Brazilian people (mass polarization)—measured using online social media data. We evaluate the changes in polarization and its related variables through a qualitative analysis, since the number of observations was not big enough for a robust correlation study. Our purpose here is to understand whether the behavior of the general public (the mass) is related to the actions of the representatives in the Lower House (the elite).

Figure 8 shows how the polarization index \(\mu\) and its related variables—difference between populations sizes \(\Delta A\) and distance between gravity centers d—changed over time for both politicians (the elite) and people (the mass). In order to support our understanding of the visualizations, we also present a statistical summary of each of these variables in Table 4, which shows the arithmetic mean (Mean), standard deviation (SD), relative standard deviation (RSD), minimum (Min) and maximum (Max) values. It is important to highlight that, although the politicians’ voting dataset covers a broader period range, we calculated the statistics and made our following considerations based solely on the results from March to December of 2016, since the people dataset is limited to that period.

Fig. 8 Time evolution of polarization index (\(\mu\)) and its related variables (\(\Delta A\) and d) for the elite and mass polarization studies. Some months are omitted either because there was no voting session for that month (House recess) or due to the restrictions of our study, which only takes into account voting events where PT and PSDB disagree

Table 4 Statistical summary for the polarization index (\(\mu\)) and its related variables (\(\Delta A\) and d) for people and politicians analysis
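
For reference, the statistics reported in Table 4 can be computed as in the sketch below. It assumes the population standard deviation (`statistics.pstdev`); whether the table uses the population or the sample version is not stated here, and the input values are hypothetical, not the monthly values of this study.

```python
from statistics import mean, pstdev


def summarize(values):
    """Mean, SD, RSD (in percent), Min and Max of a series of monthly values."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return {
        "mean": m,
        "sd": sd,
        "rsd": 100 * sd / m,  # relative standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical monthly values of a polarization index.
toy_summary = summarize([0.5, 0.6, 0.7])
```

The RSD expresses the standard deviation as a percentage of the mean, which makes the variability of series with different average levels comparable.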

Observe that people recorded higher polarization index values for the entire period—having a mean of 0.71—whereas politicians recorded a mean value of 0.54 for the same time interval. The maximum value of polarization index for the politicians (\(\mu = 0.60\)) is smaller than the minimum value for people (\(\mu = 0.64\)), which indicates that Brazilian representatives were less polarized than the general public in the studied period. Also, the mass polarization seems to remain almost uniform across the period, having a small relative standard deviation (\(\text {RSD} = 5.6\%\)). By comparison, the elite polarization has a larger variation (\(\text {RSD} = 13.0\%\)), especially due to a major decrease in May.

The elite polarization exhibits more fluctuations over the period as compared to the mass polarization and it presents variations to both \(\Delta A\) and d. On the other hand, the small fluctuations of the mass polarization seem to be mostly related to the difference between the sizes of their positive and negative populations (\(\Delta A\)), whereas the distance d does not have large variations over the months (\(\text {RSD} = 2.5\%\)). For instance, when people record their minimum polarization (August 2016), \(\Delta A\) reaches its maximum value, while there are no noticeable variations to the distance d in the same time interval. However, note that these relations do not imply any causality between polarization, population size or distances from gravity centers.

7.2 Elite popularity on Twitter

Besides comparing the polarization processes, we also evaluated the behavior of the politicians (elite) in online social media by tracking the temporal changes in the popularity of their posts among the general public (mass). For that, we used the dataset containing the tweets of the Brazilian politicians shared between January of 2015 and November of 2016, introduced in Sect. 5.

Figure 9 shows the steps we followed in this analysis. To measure the popularity of a tweet, we took into account the number of retweets and favorites that a politician’s post received from the public on Twitter, data which is also available in the referred dataset along with the textual content of the tweets. The number of retweets is the number of times users shared the tweet of a politician. The number of favorites indicates the total number of users that liked the tweet posted by the politician. Since these measures indicate either that users found the posted content important (retweets) or that they agreed with it (favorites), they seemed appropriate criteria to quantify popularity, a choice that is also supported by other works in the literature (Lahuerta-Otero et al. 2018; Zhang et al. 2014).

Fig. 9 Overview of the steps to evaluate the popularity of the politicians

Since our work revolves around the impeachment event, we also used the voting dataset to find out the position of each politician in the impeachment proceedings session at the Lower House, as shown in Fig. 9. For each politician in the Twitter dataset, this position was then attached to every one of her tweets: if she voted “yes,” all of her tweets are marked as “pro-impeachment”; if she voted “no,” her posts are marked as “anti-impeachment”; and if she abstained, her tweets are labeled as “absent.” Hence, instead of running an individual analysis of the politicians or their parties, we decided to track the popularity of the posts according to the impeachment position of the politicians who wrote them. By doing so, we were able to explore which groups were more popular among the general public and how their audience evolved over time, trying to understand if the changes in the audience of their posts were related to the important political events that happened during the studied period.

In order to compare the popularity among the groups, we generated visualizations of the average number of retweets and favorites received by the anti-impeachment, pro-impeachment and absent groups of politicians over the months. Finally, inside each group, we contrasted the average popularity (retweets and favorites) of the tweets of the politicians in two conditions: (i) posts which contemplate specific topics that were detected by our Temporal Topic Evolution Study; (ii) posts that cover all the other topics that are not included in (i). In the former case (condition i), we used the top words of the super-topics “Political Crisis” (\(T_1^{\prime }\)) and “Activities at the Lower House” (\(T_2^{\prime }\)) to filter the tweets in the analysis, since these were the most relevant discussed topics regarding politics (see Sect. 5). Our intention is to understand if the changes in the popularity of the politicians of each group were related to the political issues or there were any other reasons behind it.
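
The grouping behind these visualizations can be sketched as follows. The sketch assumes each tweet is a dictionary with hypothetical keys `author`, `month`, `retweets` and `favorites`, and it averages per tweet; dividing by the number of politicians in each group instead would be an equally plausible normalization.

```python
from collections import defaultdict

# Mapping from the vote in the impeachment session to the group label.
VOTE_TO_GROUP = {
    "yes": "pro-impeachment",
    "no": "anti-impeachment",
    "abstention": "absent",
}


def average_popularity(tweets, votes):
    """Average (retweets, favorites) per tweet for each (group, month).

    `tweets` is an iterable of dicts with the keys "author", "month",
    "retweets" and "favorites". `votes` maps each author to her vote.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # retweets, favorites, tweet count
    for tweet in tweets:
        group = VOTE_TO_GROUP[votes[tweet["author"]]]
        acc = totals[(group, tweet["month"])]
        acc[0] += tweet["retweets"]
        acc[1] += tweet["favorites"]
        acc[2] += 1
    return {key: (rt / n, fav / n) for key, (rt, fav, n) in totals.items()}


# Toy data (hypothetical authors and counts).
toy_votes = {"a": "yes", "b": "no", "c": "no"}
toy_tweets = [
    {"author": "a", "month": "2016-03", "retweets": 10, "favorites": 20},
    {"author": "b", "month": "2016-03", "retweets": 30, "favorites": 50},
    {"author": "c", "month": "2016-03", "retweets": 50, "favorites": 70},
]
toy_popularity = average_popularity(toy_tweets, toy_votes)
```

Restricting `tweets` beforehand to those that contain the top words of a super-topic gives the per-topic version of the same averages.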

As explained before, the tweets were labeled according to the position of the politician during the impeachment voting session. Table 5 shows a summary of the number of politicians that hold each position and the total of tweets that were marked as absent, anti-impeachment, or pro-impeachment for the current analysis.

Table 5 Summary of the number of politicians and tweets in the Twitter dataset for the absent, anti-impeachment and pro-impeachment groups

We started our analysis by performing a basic characterization of the average number of tweets posted by the politicians of the pro-impeachment, anti-impeachment and absent groups. As shown in Fig. 10, while the absent group did not present visible differences in their tweeting behavior over the studied period, the average number of tweets posted by the pro-impeachment and anti-impeachment politicians exhibits a visible increase after March of 2016.

Fig. 10 Temporal changes in the average number of tweets per user for the anti-impeachment, pro-impeachment and absent groups of politicians

Fig. 11 Temporal changes in the average number of tweets per user for the anti-impeachment, pro-impeachment and absent groups of politicians, according to their topics of discussion

Based on the previous characterization, we checked if the increase in the tweeting behavior of each group was related to an intensification on the discussion of the aforementioned topics of interest (Political Crisis and Activities in the Lower House). The results are shown in Fig. 11. Here, we observe that:

  • The absent politicians do not exhibit any evident variation in their tweeting behavior over the months, and they also do not seem to focus the content of their posts on the politics-related topics of interest.

  • For the anti-impeachment politicians, the rise in the number of posts on Twitter seems to be mostly related to the discussion of the politics-related topics. It is also important to point out that the contrast in tweeting behavior over time is particularly large for this group, whose politicians were possibly trying to spread ideas against the removal of the ex-president Dilma from office.

  • Regarding the pro-impeachment politicians, we also observed an increase in their posting behavior on Twitter after March of 2016. However, this rise does not seem to be particularly related to a specific topic for the entire period: in other words, the politics-related topics of interest were more intensely discussed by these politicians in March, April and May of 2016; yet these main topics were not the center of the attention of the pro-impeachment politicians in the other months of that year.

As for the results of our popularity analysis, Fig. 12 shows the temporal changes in the average number of retweets and favorites in the posts written by the politicians on Twitter. By analyzing the visualizations, one can quickly notice that the average number of retweets and favorites in the posts of the politicians on Twitter increased from March of 2016 onward. It is also noticeable that the anti-impeachment politicians received, on average, more retweets and likes (favorites) than the ones from the other groups. This difference stands out after March of 2016 when the audience for the anti-impeachment politicians seems to become even larger when compared to the audience of politicians with other positions.

Fig. 12 Temporal changes in the average number of retweets and favorites per user for the anti-impeachment, pro-impeachment or absent groups of politicians

After investigating the differences in popularity between the groups, we also studied whether the politics-related topics affected the popularity of the politicians of the anti-impeachment and pro-impeachment groups. To that end, Figs. 13 and 14 show, respectively, the average number of retweets and favorites for the anti-impeachment and pro-impeachment groups, according to the topic of the posts. We decided to limit the analysis to these two groups, since the group of absent politicians is small and the popularity of its politicians (i.e., both the average number of retweets and of favorites) is the lowest of all the groups over the studied period.

Fig. 13 Temporal changes in the average number of retweets and favorites per user for the anti-impeachment group of politicians, based on their topics of discussion

Fig. 14 Temporal changes in the average number of retweets and favorites per user for the pro-impeachment group of politicians, based on their topics of discussion

From Fig. 13, we observe that, whereas the peak of retweets in the posts of anti-impeachment politicians was reached in August 2016, the highest average number of favorites was recorded in April 2016. Both of these peaks coincide with important and critical events: the impeachment voting session at the Lower House in April 2016 and the impeachment voting session at the Senate at the end of August 2016. With that in mind, we arrived at the following interpretations:

  • Since the politicians in the dataset were the ones who voted during the impeachment session in April of 2016 at the Lower House, the peak in the number of favorites in that month could indicate that part of the public on Twitter agreed with the position of these politicians regarding the impeachment.

  • In August of 2016, on the other hand, the final impeachment voting session took place at the Senate, without the active participation of the politicians in our dataset. Thus, the peak in the average number of retweets in that month possibly shows that part of the general public was supporting the ideas of the politicians of that position by sharing their posts on Twitter.

Concerning the popularity of the pro-impeachment politicians, Fig. 14 shows that the politics-related Twitter posts of the individuals in this group received a higher number of retweets and favorites from the general public in March, April and May of 2016 when compared to tweets about the other topics. In the remaining months, however, the popularity of their tweets about politics is similar to or lower than the popularity of their posts concerning other topics. In light of these results, one can notice that:

  • The politics-related tweets posted by the pro-impeachment politicians were noticeably more popular in March, April and May of 2016, which corresponds to the months around the voting (and approval) of the impeachment proceedings at the Lower House. In particular, the peak in the number of favorites/likes in April of 2016 might indicate approval of their “pro-impeachment vote” by the part of the general public that supported the impeachment of the ex-president Dilma.

  • Even though the pro-impeachment politicians cover the topics of interest (the political crisis and the political activities at the Lower House) in their tweets for the entire studied period (Fig. 11), the higher popularity of their political posts is restricted to a short period of time. This might indicate that, outside that period, their politics-related posts do not affect their popularity differently from posts about other topics.

7.3 Elite popularity versus mass polarization

Finally, this section contrasts the study of elite popularity on Twitter with the mass polarization analysis. The idea is to understand whether the changes in the popularity of the politicians (elite) on Twitter, which reveal the public reactions to the politicians’ posts on that platform, share common aspects with the polarization process among the general public (mass).

As our previous study shows (Sect. 6), the mass polarization (evaluated from March to December of 2016) remained high during the entire studied period. The polarization index takes high values under two conditions: (a) the individuals in the group hold divergent opinions about the discussed topic (and the general public was indeed divided between anti- and pro-impeachment ideas); (b) the groups that hold opposite opinions are of similar size, i.e., they have a similar number of individuals each.
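To make conditions (a) and (b) concrete, the sketch below implements a toy index that is high only when both hold. Its product form, the opinion scale in \([-1, 1]\), and the function name `polarization_index` are assumptions made for illustration; the sketch is not claimed to reproduce the exact index computed in Sect. 6.

```python
def polarization_index(opinions):
    """Toy index in [0, 1] for opinions in [-1, 1] (assumed form, see text)."""
    pos = [x for x in opinions if x > 0]
    neg = [x for x in opinions if x < 0]
    if not pos or not neg:
        return 0.0  # only one side is present, so there is no polarization

    # Condition (b): normalized difference between the sizes of the two sides.
    size_gap = abs(len(pos) - len(neg)) / (len(pos) + len(neg))

    # Condition (a): half the distance between the mean opinion of each side.
    center_pos = sum(pos) / len(pos)
    center_neg = sum(neg) / len(neg)
    distance = (center_pos - center_neg) / 2

    return (1 - size_gap) * distance
```

Under these assumptions, two equally sized groups with extreme and opposite opinions yield the maximum value of 1, while the value drops if either the groups become unbalanced or their opinions move closer together.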

Having that in mind, and also considering the study of the popularity of the politicians (Sect. 7.2), we present the following considerations. Just as the analysis of the mass polarization showed the predominant adoption of contrasting ideas among the public on Twitter, we also observed that the popularity of the opposing groups of politicians (anti- and pro-impeachment) is many times higher than that of the politicians who did not have a clear position regarding the impeachment (absent group). This reinforces the idea that the general public decided to adopt a position regarding politics, giving more support to the politicians who made their opinions and inclinations regarding the impeachment evident on Twitter.

The anti-impeachment politicians showed, on average, high popularity in their politics-related posts on Twitter throughout 2016. This could indicate that part of the general public (who also shared that position) participated and gave support to this group of politicians in online social media, trying to spread ideas against the impeachment of the ex-president Dilma Rousseff.

The pro-impeachment group of politicians, on the other hand, showed high popularity in their tweets about politics in March, April and May of 2016; in the other months, though, the popularity of their posts about the political topics decreased. This suggests that the part of the public that shares this position gave more support to the pro-impeachment group of politicians in the months that were critical for the impeachment. As the mass polarization is high for the entire period of 2016 (meaning that the pro-impeachment individuals in the general public participate as much as the anti-impeachment ones), we arrived at two possible interpretations:

  • The participation of the anti-impeachment individuals in the general public is mostly affected by the politicians having that orientation, so that the anti-impeachment politicians were important points of influence to that part of the public.

  • Since the popularity of the pro-impeachment politicians decreased after a certain period while the mass polarization remained high for the entire year of 2016, the ideas shared by the pro-impeachment individuals of the general public might have sources of influence other than the politicians that held that position.

8 Conclusions

The revelation of corruption scandals, such as the Car Wash operation, and the impeachment proceedings of Dilma Rousseff in 2016 exposed an intense and profound political and social crisis in Brazil. Tensions emerged not only among Brazilian political parties but also among ordinary citizens, who took to the streets either to demand the impeachment of the president or to demonstrate support for her. These conflicts were not limited to the streets: they were also present on social media platforms, which set the stage for heated discussions among people.

This work analyzed and compared the polarization phenomenon among the Brazilian politicians, framed as the elite polarization, and the general public, framed as the mass polarization. Our study pointed out significant differences between the polarization variables for the politicians and the general public, also revealing that the polarization process was more intense among people (\(\overline{\mu } = 0.71\)) than among Brazilian politicians (\(\overline{\mu } = 0.54\)) during the whole period. The high and almost constant polarization values for the people may reflect the real ideological conflicts among the Brazilian population, meaning that they may have occurred during the whole studied period. The politicians, however, had a higher variation in their polarization values over time. According to our observations, the representatives recorded higher polarization values in 2016 when compared to the previous year, and these values increased after December of 2015. This situation coincides with the launch of the impeachment proceedings in that month, which possibly indicates that the representatives became more polarized after the impeachment started.

Furthermore, although the polarization values presented small variations for the general public, these changes were mostly related to the difference between the size of the populations. In other words, the polarization of people changes as the number of individuals that are concentrated in each of the opposite groups changes. This effect was not observed for the polarization among the politicians.

Our popularity analysis showed that the politicians received more support from the general public on Twitter from March of 2016 onward, a month before the impeachment voting session at the Lower House. This increase in popularity was limited to the anti- and pro-impeachment politicians, which suggests that the general public decided to adopt a position regarding politics, giving more support to the politicians that made their opinions about the impeachment clear on Twitter. The anti-impeachment politicians were the most popular group, as they got more retweets and likes on their posts when compared to the politicians from the other positions. In contrast, the popularity of the politics-related posts from the pro-impeachment group only recorded higher values during the critical months regarding the impeachment of Dilma Rousseff, showing a mobilization during the period of political crisis. In sum, our analyses show that anti-impeachment politicians had a higher impact on the public for the whole period of study than pro-impeachment politicians, who were popular only during a short but critical three-month period.