Introduction

Conversations on online social media have become a crucial aspect of modern communication (Smith 2018), shaping how individuals interact with each other, share information, and form connections (Grabowicz et al. 2011). Social network platforms enable conversations to reach a wide audience, while allowing for real-time sharing of information, opinions, and reactions. These online conversations are driven by three key factors: (i) the purpose of the users and how they want to communicate, (ii) the functionality provided by the platform and its limitations, and (iii) the user guidelines and recommendations governing the community (Gillespie 2018; Russo et al. 2023). The latter two factors, functionality and guidelines, are key to encouraging certain types of interactions and conversational structures over others. For example, while X (formerly known as Twitter) used to limit messages to 140 characters, Facebook and other thread-based forums do not have such restrictions, leading to more informative messages and shorter interactions (Alis et al. 2015). Additionally, some platforms have guidelines and recommendations that, while not technologically enforced, aim to guide user behavior. These guidelines serve as suggestions encouraging users to conform to an expected conduct. For instance, Instagram’s Community Guidelines (Footnote 1) promote authentic interactions to avoid spam, and Facebook groups often have customized rules. Whether these guidelines effectively contribute to shaping user behavior remains to be explored. It is well known, however, that many of these rules, regulations and technical limitations change over time based on users’ interactions with the platform. A recent example of how users have forced platforms to modify their regulations is X. Before 2018, users who needed to write longer tweets (i.e., with more than 140 characters) would split the long text into multiple interconnected tweets, using formats like “Tweet 1/12”.
In December 2017, X adapted to this behaviour by introducing a new feature called threads, allowing users to concatenate multiple tweets together in a sequence. This made it easier to create and follow longer conversations or narratives within a single thread.

The goal of this work is to uncover the effect of platform guidelines on online conversations and evaluate the extent to which such guidelines influence participation. In particular, we focus on Reddit, a social media platform where people participate in “self-governing and self-organized” communities known as subreddits (Jamnik and Lane 2019; Medvedev et al. 2019). Each subreddit has its own rules and guidelines, specifying what is allowed inside the subreddit and recommending how users should behave, often based on a specific topic (e.g., r/science, r/gaming). Beyond being a simple forum, Reddit has been widely populated with subreddits with their own guidelines and internal features, making it a valuable resource for conducting social research on opinion formation (Shatz 2017; Hintz and Betts 2022). In recent years, a plethora of studies of interesting subreddits has emerged, particularly focusing on studying their evolution and dynamics to understand how such communities develop and grow over time (Krohn and Weninger 2019; Weninger et al. 2013; Horawalavithana et al. 2022). Many studies have also analyzed discussions in Reddit communities to understand how interactions among participants influence behavior. For example, Petruzzellis et al. (2023) exploited the r/ChangeMyView subreddit to analyze changes in online information consumption behavior arising after opinion changes. Cauteruccio and Kou (2023) investigated the emotional experiences in eSports spectatorship using the r/leagueoflegends subreddit: they show that spectators supporting the same team tend to engage in cohesive discussions, while interactions among those supporting different teams are less salient. 
Additionally, a significant body of research has focused on the language used in Reddit discussions, examining linguistic patterns, sentiment, and rhetorical strategies to gain deeper insights into the nature and impact of online communication within these communities. For instance, Helm (2024) studied the r/Incel community to identify subcultural discourse and understand how it affirms deviant behaviors, while Bouzoubaa et al. (2024) analyzed drug-related subreddits to understand their role in the online discourse surrounding substance use.

On Reddit, users can write and publish posts (known as submissions) or comments. Each post, together with its comments, constitutes a thread, i.e., a conversation where comments are organized hierarchically in a tree-like format. The post is the root of the thread, each comment is a node in the discussion tree, and replies to posts or comments create branches in the tree. Reddit communities are moderated by designated users (the moderators) who establish the community rules and ensure that everyone follows them when participating. Moderators maintain order by performing actions such as deleting posts or comments and banning users. In each community, the rules are often displayed on the side of the webpage.

In this work we focus our attention on the /r/AmItheAsshole (AITA) subreddit (Footnote 2), an online community where people post stories about personal experiences having ambiguous moral valence, asking others if they have been “assholes” (or not) in the narrated story, i.e., if they are to blame for the conflict described. Users creating such posts should provide detailed descriptions of their stories in the text, including relevant background information about the people involved. Other users then perform the explicit judgment by voting, which involves writing a comment including a specific acronym corresponding to their judgment. The available acronyms provided by the community are listed in Table 1.

The subreddit guidelines suggest that, along with the acronym, users should include in the comment a brief motivation for the vote to explain their choices to other readers. The AITA community uses Reddit’s integrated voting system to allow participants to rate the judgments they agree with by upvoting them. Expressing disagreement is not allowed in this context, since downvotes are used to report off-topic or spam discussions and harassing comments. The community has established an 18-hour waiting period before assigning the final verdict. Users must vote within this timeframe. As users upvote different comments, a consensus emerges over time, with one judgment gaining the majority of agreements as the collective decision. After the time window passes, this judgment is accepted as the official verdict and is made public by assigning a flair to the post, i.e., a tag with the respective judgment acronym. More details about the voting process are provided in "Operationalization" Section.

Table 1 Acronyms provided by the AITA community

In the AITA community, the explicit request for a judgment is, therefore, a requirement of the subreddit, allowing researchers to study how humans express moral judgments through socio-linguistic features. Indeed, the comments contained in AITA threads offer the ground truth of what people voted for and often why. This explains why the AITA community has received much attention in the last two years. Botzer et al. (2023) exploited the AITA subreddit to study the presence and impact of moral valence, as well as whether gender and age play a role in users’ judgments. De Candia et al. (2022) examined which demographic factors and topics are associated with judgments, while Giorgi et al. (2023) analyzed the possibility of identifying, through linguistic and narrative features, whether the author of the post is also the character in the story or is narrating a story from a third-person perspective.

In order to analyze the dynamics of interactions in the AITA subreddit, we collected more than 6,000 threads that received significant attention in 2023 (see details in "Data" Section). For each thread, we compute the individuals’ amount of judgment and the group level of disagreement ("Operationalization" Section). We then model each thread as a complex multi-graph network of user interactions evolving over time ("Temporal network analysis" Section). We study the growth of such networks by reconstructing each conversation over time and comparing the evolution of structural properties with the existing literature on growing real social networks (Footnote 3), including other subreddits ("Results" Section). In particular, we focus on the clustering coefficient and average shortest path length as structural properties evolving over time, since these highlight the peculiar evolution of AITA networks and help explain the reasons behind the user behavior. Furthermore, we compute the reciprocity and the disagreement of such networks to understand if they play a role in the AITA discussions.

In short, the contributions of this work are the following:

  • Our temporal analysis of the communication exchange shows that Reddit user interaction networks consist of two subgraphs, a star and a periphery, that exhibit different speeds of growth ("Results" Section). The star structure is mainly formed by users not engaging in conversations and rather answering to the root message of each subreddit thread (i.e., post), while the periphery is mostly composed of users engaging in long conversations.

  • We find that the speed at which participants contribute in these subgraphs is highly influenced by the intention of the participants. In the periphery subgraph, the participants who vote in addition to writing comments respond almost twice as slowly as users just commenting ("Response time" Section). At a macro level, we explain how people engage in conversations with other users in subreddits through the insights revealed by the evolution of structural properties. Specifically, the increasing average shortest path length as well as the decreasing (and very small) clustering coefficient reflect the behavior of people discussing mostly with only one other user, often through a single message ("Structural properties of AITA evolving over time" Section).

  • Our analysis shows that these interaction networks evolve differently compared to other social networks, despite falling into the same category of “real social networks” ("Growing networks of Reddit threads" Section). More specifically, we compare the AITA subreddit with other subreddits by examining the growth of the two subgraphs within the dynamic networks. We demonstrate that the speed of the star subgraph is between 2 and 3 times larger than that of the periphery subgraph in AITA, which is significantly larger than in other Reddit communities. We interpret this as a consequence of community rules shaping user behavior.

  • Our analysis shows that disagreement in the judgment process is associated with more interactions in the thread but may prevent some users from expressing a judgment. Specifically, we show that when the disagreement is higher (i.e., when the judgment is not obvious), people prefer to discuss rather than judge: they engage with others through more comments, and if they express a vote, they struggle to clearly pick a side ("Disagreement and reciprocity" Section).

Finally, we analyze the underlying social dynamics among users by drawing on social psychology theories and interpreting their effect on the graph structure evolution. Specifically, we interpret our results through the lens of Social Judgment Theory (Brehmer 1988).

Theoretical framework

One of the goals of this work is to shed light on how people discuss in online communities where they are asked to explicitly express their opinion. Specifically, we aim to measure to what extent users’ disagreement affects the evolution of the online conversations. We do this by modeling all the threads in the subreddit as a set of growing networks of user interactions (see "Methodological framework" Section). In this section, we lay the groundwork for understanding social judgment dynamics ("Social judgment" Section), and we provide the state of the art of growing social networks ("Growing social networks" Section).

Social judgment

In social psychology, judgment is defined as the cognitive process of forming opinions, evaluations, or assessments about oneself, others, or situations. It generally consists of the product of non-conscious systems that operate quickly based on some evidence (Gilbert 2002). For example, when engaging in a conversation with someone, body language, tone of voice, and facial expressions are cues that serve as evidence to formulate judgments about the person. Social psychologists have studied various aspects of judgment, including how people make decisions, evaluate others, and interpret social information. Specifically, Social Judgment Theory (SJT) (Brehmer 1988) is a theoretical framework within social psychology that seeks to understand how individuals form and evaluate judgments about themselves and others. SJT also investigates the reasons why, in particular social contexts, people are more inclined to express judgments (Morrison and Miller 2008; Noelle-Neumann 1993; Morrison and Miller 2011; Hornsey 2003; Matthes et al. 2010). For example, Adamic et al. (2021) and Spears (2021) found that users are more prone to express negative judgments in anonymous settings where either the giver or the receiver of the opinion is unknown. Despite the extensive scientific literature, we have little understanding of the role that disagreement plays in such settings.

Research on user interactions in online platforms has primarily focused on conflict, controversies, and affective polarization (Addawood et al. 2017; Garimella et al. 2018; Lamba et al. 2015; Mejova et al. 2014; Conover et al. 2021), analyzing these social behaviors mostly through sentiment and topic analysis. In particular, Kumar et al. (2018) used Reddit data to study conflictual interactions of users across different communities. They found that less than 1% of communities start the majority of conflicts and that such conflicts are initiated by highly active community members and carried out by significantly less active members. In our work, instead, we are interested in studying the role of disagreement among users. Despite the plethora of studies about the role of polarization and conflict in online conversations, the question of if and how disagreement affects people’s moral judgments remains unexplored.

Growing social networks

Conversational data, such as the actions and interactions of users in online platforms, can be modeled as dynamic social networks (Newman 2003; Scott 2000; Wasserman and Faust 1994). Examples include follower-followee relationships, Facebook friendship links, e-mail or message exchanges, and retweet patterns. When these networks are not synthetic but taken from real user interaction data, they are commonly referred to as “real social networks” (Newman 2003; Leskovec et al. 2005) to emphasize that the original data originates from actual networks rather than mechanistic models. These types of networks include a wide variety of online connections such as friendship or following relations in social media (e.g., X), interactions such as sharing messages, replying to emails, or real-life interactions (e.g., academic co-authorship). The properties of this category of networks have been extensively studied from both a static and a dynamic perspective. The structural evolution of growing social networks has been intensively studied by Newman, who analyzed structural properties of some models of growth (Newman 2003), proved that preferential attachment is the origin of power-law degree distributions in collaboration networks (Newman 2001), and developed a new growing model that reproduces features of real-world friendship networks (Jin et al. 2001). However, most research has focused on studying structural properties of networks after a sufficiently long period, rather than on how such properties evolve during the networks’ growth (see Table 2). An exception is the work of Leskovec et al. (2005), who, through empirical observation of four real graphs (three of which were social) growing over time, demonstrated that such networks become denser over time and that their diameter shrinks.

In summary, real social networks represent a subclass of social networks that includes a wide variety of graphs with diverse underlying dynamics. As a result, discoveries in the literature about the growth of real social network structures and properties over time may not be universally applicable to all graphs within this class. For example, it is reasonable to think that a graph of retweets could grow differently over time compared to a graph of messages in a group chat. Although these graphs belong to the same category of networks, further investigation into the differences in their structural properties as they evolve over time is needed.

Table 2 State of the art of growing real social networks. The last row represents our data. The table reports: number of vertices |V|, global clustering coefficient GCC, density d, average shortest path length ASPL, preferential attachment PA, exponent of the degree distribution \(\gamma\), type of network (directed or undirected), type of growth (adding nodes or edges)

Methodological framework

In order to study online conversations in which users express moral judgments, we collect data from the AITA community ("Data" Section) and operationalize the judgment behavior of participants ("Social judgment" Section). Then, we provide a measure for disagreement among users, representing how polarizing their judgments are ("Disagreement" Section). Finally, we model each conversation as a growing complex network and we study its evolution in time ("Temporal network analysis" Section).

Data

We downloaded 6366 threads, containing a total of 6,372,251 comments, from the AITA subreddit using the PRAW library (Footnote 4). In particular, we downloaded the “top” submissions, i.e., those having the highest score, measured as the difference between upvotes and downvotes of a post (i.e., the thread root). By definition, top posts are likely to have received significant attention, possibly resulting in a large volume of comments. In order to gather a representative dataset, we performed 10 different queries across various temporal scopes, ranging from one week to multiple years, each gathering different sets of top submissions along with all the comments. We set the limit of each query to 1000 to comply with the Reddit API limits (Footnote 5) and we removed duplicated threads. The final dataset size is reported in Table 3, which also contains the temporal scope of data selection. Figure 1a shows the distribution of thread sizes (measured as the number of comments), while Fig. 1b shows the distribution of final verdicts across threads. Note that 75% of the threads have fewer than 2,000 comments, and 80% of them have been assigned “NTA” as the final verdict.
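The query-and-deduplicate step can be sketched as follows. This is a minimal illustration, not the collection script used for this work: the helper `collect_top_threads`, the `FakeSubreddit` stand-in, and the toy submission ids are all hypothetical. The only interface assumed is a `top(time_filter=..., limit=...)` listing method such as the one exposed by PRAW's subreddit objects, so with PRAW installed and configured a real subreddit object could be passed in place of the stand-in.

```python
from types import SimpleNamespace


def collect_top_threads(subreddit, time_filters, limit=1000):
    """Run one 'top' query per time filter and merge the results without duplicates.

    `subreddit` is any object exposing `top(time_filter=..., limit=...)` that
    yields submissions with an `id` attribute.
    """
    threads = {}
    for time_filter in time_filters:
        for submission in subreddit.top(time_filter=time_filter, limit=limit):
            # The same submission can be returned by several queries; keep the first copy.
            threads.setdefault(submission.id, submission)
    return list(threads.values())


class FakeSubreddit:
    """Offline stand-in for a subreddit object (hypothetical data, for illustration)."""

    def __init__(self, listings):
        self._listings = listings

    def top(self, time_filter="all", limit=None):
        return iter(self._listings[time_filter][:limit])


fake = FakeSubreddit({
    "week": [SimpleNamespace(id="a1"), SimpleNamespace(id="b2")],
    "year": [SimpleNamespace(id="b2"), SimpleNamespace(id="c3")],
})
threads = collect_top_threads(fake, ["week", "year"])
thread_ids = [t.id for t in threads]
```

Keying the merged result by submission id is what removes threads returned by more than one temporal scope.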

Table 3 Data collection of top submissions from the AITA subreddit. We ran a total of ten queries during September and October 2023, collecting the top threads with different temporal scopes
Fig. 1 Statistics of the AITA threads dataset: (a) ECDF of thread size (number of comments); (b) distribution of final verdicts across threads

Operationalization

Judgment behavior

In the AITA community, users participate by writing posts (to be judged by others) or comments (to judge others). This paradigm established by the community implies that people commenting are expected to express a vote. We distinguish between voting (i.e. writing comments containing at least one acronym among those listed in Table 1) and discussing (writing text without expressing a vote). For the purposes of this work, we decided to disregard the INFO acronym, as it does not constitute a vote by definition.

The AITA community has specific guidelines about how users should vote and how the votes are processed to obtain the final verdict. Users can access these rules from the dedicated page (Footnote 6), the FAQ page (Footnote 7), or the “Voting rules” section in the navigation panel of the homepage. These resources are also referenced in every post since a bot automatically includes them in a top-level comment produced as soon as the post is published. This comment is pinned on top for maximum visibility, so users are aware of how they are expected to behave. According to the AITA rules, users must vote by including one and only one voting label in their top-level comment. This implies that: (i) users cannot include more than one label in the text, (ii) the label should be one of those provided by the community and correctly spelled, and (iii) the comment containing it must appear in the first level of the thread. The label can appear at any point in the text and does not necessarily have to be capitalized. Since the judgment process (votes and upvotes) lasts 18 h, the comments should also be published within this time window to be part of the voting contest.
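The rules above can be sketched in code. The snippet below is an illustration under stated assumptions, not the extraction procedure used in this work: the six acronyms in `VOTE_LABELS` are assumed (Table 1 is not reproduced here; only NTA, ESH and YTA are named in the text), and the function names and arguments are hypothetical. It separates the notion of a voting comment (at least one acronym) from a vote that satisfies all three community rules.

```python
import re

# Assumed label set: Table 1 is not reproduced here, so these six acronyms are
# an assumption. INFO is left out because it does not constitute a vote.
VOTE_LABELS = ("YTA", "YWBTA", "NTA", "YWNBTA", "ESH", "NAH")
_LABEL_RE = re.compile(r"\b(" + "|".join(VOTE_LABELS) + r")\b", re.IGNORECASE)
VOTING_WINDOW_SECONDS = 18 * 3600  # the 18 h judgment window


def extract_labels(text):
    """Return the set of distinct voting labels found in a comment (upper-cased)."""
    return {match.upper() for match in _LABEL_RE.findall(text)}


def is_voting_comment(text):
    """A voting comment contains at least one acronym; otherwise it is discussion."""
    return bool(extract_labels(text))


def is_valid_vote(text, depth, seconds_since_post):
    """Check the community rules: exactly one label, top level, within 18 hours.

    `depth == 1` denotes a first-level comment, i.e. a direct reply to the post.
    """
    return (
        len(extract_labels(text)) == 1
        and depth == 1
        and seconds_since_post <= VOTING_WINDOW_SECONDS
    )
```

Matching on word boundaries and ignoring case follows the rule that the label may appear anywhere in the text and need not be capitalized. A known limitation of this sketch is that ordinary words such as a lowercase "nah" are also counted as labels.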

Disagreement

To measure disagreement of AITA threads, we use the codified information about judgments expressed by users. As mentioned earlier, in the AITA community, users explicitly take a side and make it public when they express a vote. Consequently, we label each comment with the respective judgment label. The voting labels represent the sides that users are taking, making it straightforward to determine which side each comment belongs to. In this context, we measure the level of disagreement in a thread by measuring the uncertainty of the judgments expressed in the comments. We do this by computing the probability of each label appearing (i.e., of each side to be taken) and measuring the Shannon entropy of the post.

Following (De Candia et al. 2022), who used binary entropy on aggregated votes to measure controversiality, we use multi-label entropy to operationalize disagreement.

Given the set of labels \(\mathcal {X}\), the entropy of a post is defined as:

$$H(X) = - \sum _{x \in \mathcal {X}} p(x) \log _2 p(x) \tag{1}$$

where p(x) is the discrete probability distribution of the labels appearing in the comments of the post. Since we do not consider the INFO label, we have six possible labels (see Table 1), so the maximum value of entropy for each post is \(\log _2 |\mathcal {X}| \approx 2.6\). Values of the entropy close to 2.6 indicate maximum uncertainty and therefore maximum divisiveness: judgments are uniformly split among the different labels, with people equally taking all the different sides. In this case, we can say that the post has high disagreement. In contrast, a value of 0 would represent the maximum level of certainty: all judgments are unanimous and users all agree on taking one side, so the post has no disagreement. As shown in Fig. 2, around 53% of the posts have low entropy (\(< 0.65\)), indicating that in more than half of the posts people agree on the judgment.
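As a worked illustration of Eq. 1, the following minimal sketch computes the entropy of a thread from a list of judgment labels and maps it to the four disagreement bands shown in Fig. 2. The function names and the toy label lists are hypothetical, and the handling of values falling exactly on a band boundary is an assumption.

```python
import math
from collections import Counter


def thread_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution of a thread."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1 / p) is the same quantity as -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def disagreement_level(h):
    """Map an entropy value to a disagreement band (boundary handling assumed)."""
    if h < 0.65:
        return "low"
    if h < 1.3:
        return "medium-low"
    if h < 1.95:
        return "medium-high"
    return "high"


unanimous = ["NTA"] * 10
mostly_nta = ["NTA"] * 8 + ["YTA"] * 2
# Six labels used equally often; the label names are illustrative.
uniform = ["YTA", "YWBTA", "NTA", "YWNBTA", "ESH", "NAH"] * 5

h_unanimous = thread_entropy(unanimous)
h_mostly = thread_entropy(mostly_nta)
h_uniform = thread_entropy(uniform)
```

A unanimous thread has entropy 0, an 80/20 split between two labels gives about 0.72 bits (medium-low), and a uniform split over six labels reaches the maximum \(\log _2 6 \approx 2.585\).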

Fig. 2 Distribution of the disagreement (computed as thread entropy) across all the collected threads from the AITA community. Values fall in the range [0, 2.6]. Vertical lines divide the disagreement into low (\(H < 0.65\)), medium-low (\(0.65< H < 1.3\)), medium-high (\(1.3< H < 1.95\)) and high (\(H > 1.95\))

Temporal network analysis

We model the discussions collected from the AITA community as networks of user interactions. For each thread, we build a directed multi-graph \(M = (V, E, \{t, x\})\) with attributed nodes and edges. The set of vertices V represents users and the set of edges E represents the replying comments. We extract the voting acronyms of each comment and store them as a vertex attribute set X. Hence, \(x: V \rightarrow X\) is a function assigning to each vertex the set of judgments expressed by that user in their comments. Since we could not determine the expressed vote from comments containing different acronyms (e.g., [“NTA”, “ESH”, “YTA”]), we label those judgments as unsure (Footnote 8). The temporal information is embedded by a scalar function \(t: E \rightarrow T\), stored as an edge attribute, where T is an ordered set of time annotations with a resolution of seconds. In order to compare our networks with the state of the art on real social networks, we first need to verify that they are scale-free, i.e., that their degree distribution follows a power law \(P(k) \sim k^{-\gamma }\), where \(2< \gamma < 3\). Hence, we fit our empirical data to a power-law distribution and measure the distribution of the exponents to verify that they mostly fall in the range [2, 3]. The results are shown in Fig. 3. To assess the goodness of the fit, we performed a one-sample Kolmogorov-Smirnov (KS) test for all the degree distributions of the networks, which returned a coefficient smaller than 0.35 for all the networks and a p-value greater than 0.001 for 88% of the networks, indicating that the empirical distribution of our data is very close to a power-law distribution. These results confirm that AITA networks are scale-free, hence we can compare their properties with those of other real social networks.
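To make the exponent estimation concrete, the sketch below computes a degree sequence from reply edges and estimates \(\gamma\) for a power law \(P(k) \sim k^{-\gamma }\). It is not necessarily the fitting procedure used in this work: it swaps in the standard approximate maximum-likelihood estimator for discrete power-law data, \(\hat{\gamma } \approx 1 + n / \sum _i \ln \big (k_i / (k_{\min } - 1/2)\big )\), and it assumes that degree means total degree (in plus out). The function names and the toy edge list are hypothetical, and the goodness-of-fit test is not reproduced.

```python
import math
from collections import Counter


def degree_sequence(edges):
    """Total degree (in + out) of every user in a list of (source, target) replies."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return list(degree.values())


def powerlaw_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of gamma for discrete degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


# Toy thread: four users reply to the original poster, one user replies to "a".
toy_edges = [("a", "op"), ("b", "op"), ("c", "op"), ("d", "op"), ("e", "a")]
toy_degrees = sorted(degree_sequence(toy_edges))
toy_gamma = powerlaw_exponent(toy_degrees)
```

The toy thread is far too small for a meaningful fit; it only shows the mechanics. On real threads, the choice of \(k_{\min }\) affects the estimate and would need to be justified.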

Then, we study each network M of user interactions from a temporal perspective, by reconstructing it over time. We obtain, for each thread, a sequence of directed networks \(G = \langle G_1, \dots , G_k \rangle\) that grow over time, where each network \(G_k = (V_k, E_k)\), \(k = 0, \dots , |E|\), is the k-th network. Therefore, \(V_k \subseteq V\) includes the user starting the thread and all users who have commented by the time the k-th message is posted. Each edge \((v, u) \in E_k \subseteq E\) indicates that user v has written at least one comment to user u.
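The reconstruction can be sketched as a generator that replays the comments of a thread in chronological order and yields the network after each message. The flat `(timestamp, author, parent_author)` representation and the function name are assumptions made for this illustration.

```python
def growing_networks(comments):
    """Yield (k, V_k, E_k) after each comment, in chronological order.

    `comments` is an iterable of (timestamp, author, parent_author) tuples.
    An edge (v, u) means that v has written at least one comment to u, so
    repeated replies between the same pair collapse into a single edge.
    """
    vertices, edges = set(), set()
    for k, (_, author, parent_author) in enumerate(sorted(comments), start=1):
        vertices.update((author, parent_author))
        edges.add((author, parent_author))
        yield k, frozenset(vertices), frozenset(edges)


# Toy thread with timestamps in seconds; "op" is the author of the post.
toy_comments = [
    (30, "carol", "bob"),
    (10, "bob", "op"),
    (20, "dave", "op"),
    (40, "bob", "carol"),
]
snapshots = list(growing_networks(toy_comments))
sizes = [(len(v), len(e)) for _, v, e in snapshots]
```

Each snapshot is immutable, so structural properties can be computed on any \(G_k\) without affecting later ones.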

Fig. 3 Distribution of coefficients (\(\gamma\)) of AITA networks’ degree distribution

Results

As users join the conversation thread, new interactions are formed over time, and the network grows. The dynamic evolution of the network generates two distinct subgraphs: one consisting of participants directly responding to the author of the post (i.e., users writing first-level comments), and the other comprising users joining with comments located at deeper levels in the thread. We refer to these subgraphs as the star and the periphery, respectively. Figure 4 illustrates one of the AITA networks evolving over time, demonstrating how these interactions and subgraphs develop. Red nodes represent users voting in at least one comment, while blue nodes represent users writing comments without expressing a vote (i.e., discussing). Figure 5 shows that most of the voters are located in the star, a consequence of the community rules, which state that votes should be expressed in first-level comments. We describe in detail how this rule impacts user behavior in the community in "Discussion and conclusion" Section.

Fig. 4 An example of a user interaction network built from an AITA thread. Nodes (voters in red, non-voters in blue) are users and directed edges \(e_{ij}\) represent comments from user i to user j. The graph shows two different sub-structures: a star, with the hub node corresponding to the original poster and everyone who replied to it, and a periphery consisting of comments and replies among users

Fig. 5 Distribution of users expressing their opinion as a vote (left) and of users joining the thread with a first-level comment directly to the author of the post (right). On average, (i) half of participants express a vote, and (ii) 60% of users in the AITA community join the thread in the star. Conversely, the periphery includes most of the participants discussing without expressing a vote

In the following subsections, we provide different views of these networks of interactions and their subgraphs, and investigate whether the guidelines of the AITA subreddit result in significantly different structural and growth properties. First, we describe why the star and the periphery exist, and we explore the response time of comments in the network ("Response time" Section). In "Structural properties of AITA evolving over time" Section we analyze how the networks grow from a global perspective, by comparing the evolution of their structural properties with the state of the art of real dynamic social networks, previously summarized in Table 2.

Then, in "Growing networks of Reddit threads" Section we compare the growth of the two substructures of AITA networks with networks from other subreddits, concluding that the growth speed of the star is between 2 and 3 times faster than that of the periphery subgraph. This difference is significantly larger than in other subreddits. Finally, in "Disagreement and reciprocity" Section we examine the relation between thread entropy and other features of the threads to demonstrate that disagreement plays a role in the discussions of the AITA community.

Response time

To capture how quickly users participate in the star and in the periphery, we compute how fast they respond to a message (i.e., how fast a replying edge is added in each subgraph). We calculate the time difference between a comment and its parent node (the post root or the preceding comment) and we refer to this quantity as the response time R. To exclude outliers, we only consider response times within the range \([\mu - 2\sigma , \mu + 2\sigma ]\), where \(\mu\) is the mean response time of parent–child edges in the given graph and \(\sigma\) is the standard deviation.
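A minimal sketch of this computation follows. The dictionary layout, the function names, and the use of the population standard deviation are assumptions made for the illustration; the values are toy data.

```python
import statistics


def response_times(comments, root_id):
    """Split response times (in seconds) into star and periphery lists.

    `comments` maps a message id to (timestamp, parent_id); the post itself has
    parent_id None. Replies to the root belong to the star, all others to the
    periphery.
    """
    star, periphery = [], []
    for timestamp, parent_id in comments.values():
        if parent_id is None:
            continue  # the post has no parent, hence no response time
        r = timestamp - comments[parent_id][0]
        (star if parent_id == root_id else periphery).append(r)
    return star, periphery


def drop_outliers(values, n_sigma=2):
    """Keep only values inside [mu - n_sigma * sigma, mu + n_sigma * sigma]."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    return [v for v in values if mu - n_sigma * sigma <= v <= mu + n_sigma * sigma]


toy_thread = {
    "post": (0, None),
    "c1": (600, "post"),
    "c2": (900, "c1"),
    "c3": (7200, "post"),
}
star_r, periphery_r = response_times(toy_thread, root_id="post")
filtered = drop_outliers([100, 120, 110, 130, 90, 5000])
```

In the toy thread, `c2` answers `c1` after 300 s and therefore falls in the periphery, while `c1` and `c3` answer the post directly and fall in the star.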

Figure 6 shows the distribution of response times in the star (left) and in the periphery (right), for comments containing a vote (blue) and for those without one (red). On average, the response time in AITA threads is between \(10^4\) and \(10^5\) s. Our main interest lies in the periphery, where we observe that the response time of voting comments is higher than that of non-voting comments, suggesting that writing a comment that contains a judgment requires more time. In "Discussion and conclusion" Section we discuss this phenomenon in depth in relation to the AITA community guidelines and to SJT. The difference in response time between voting and non-voting comments in the star is neither interesting, due to a large imbalance in the data (more than 70% of voting comments are in the star), nor statistically significant.

Fig. 6 Response time of AITA threads. In the star, the \(\bar{R}\) of voting comments is \(\sim 3\) hours while it is \(\sim 2\) hours for non-voting comments. In the periphery, the \(\bar{R}\) is \(\sim 6\) hours for voting comments and \(\sim 7.5\) hours for non-voting comments

Finally, note that the difference in the average response time between the star and the periphery is very small, and is likely an artifact of how the measure has been constructed. While the response time in the periphery always represents the difference between a comment and the immediate reply, that is not the case for the star. In the star subgraph, the response time will always increase as the network grows, since the parent is always the root (post). For instance, the R between a given comment and the root will always be larger than the R between an earlier comment and the root.

Structural properties of AITA evolving over time

According to the literature, the average shortest path length of growing real social networks usually decreases over time (Leskovec et al. 2005; Jeong et al. 2001; Barabási et al. 2002; Lee et al. 2006; Barabasi and Albert 1999; Boccaletti et al. 2006; Dorogovtsev and Mendes 2002; Watts and Strogatz 1998; Newman 2002; Ravasz and Barabási 2003). This happens because the average number of steps needed to connect two random individuals tends to become relatively small due to the increasing number of paths available. As the network expands over time, more connections are established, increasing the likelihood of finding shorter paths between individuals. The literature attributes this phenomenon to (i) the presence of highly connected individuals (“hubs”) that reduce the distance between different parts of the network, and (ii) the tendency for networks to exhibit a clustered structure, creating local neighborhoods or communities within the network. Hence, real social networks that are scale-free exhibit preferential attachment and community structure, both contributing to shortening the average path length (Barabasi and Albert 1999; Pattanayak et al. 2022; Sallaberry et al. 2013).

In this work, we demonstrate that Reddit networks of user interactions evolve differently from what is described in the literature about growing real social networks. Specifically, during the network reconstruction process (explained in "Temporal network analysis" Section), every time a new edge is added, we calculate the following structural properties of the network: density (d), global clustering coefficient (GCC), average shortest path length (ASPL) and diameter (D). We show that, despite these networks being scale-free (see "Temporal network analysis" Section), their ASPL increases with time. Moreover, their GCC is five orders of magnitude smaller than expected since, on average, an extremely small number of clusters are formed. Figure 7 shows the evolution of these metrics over time for all threads (i.e., averaging the metric value at each timestamp over all the networks). As more edges are created over time, the ASPL increases while the GCC decreases and remains, on average, very small.
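As an illustration of this procedure, the following sketch recomputes the four metrics after every edge insertion. It is not the implementation used in this work: it assumes an undirected simple graph that stays connected while it grows, uses the transitivity definition of the GCC (closed triples over all triples), and the names `structural_metrics` and `evolve` as well as the toy edge list are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def structural_metrics(adj):
    """Density, GCC, ASPL and diameter of an undirected graph given as adjacency sets."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # GCC as transitivity: closed triples divided by all triples.
    closed = triples = 0
    for nb in adj.values():
        k = len(nb)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    gcc = closed / triples if triples else 0.0

    # ASPL and diameter over all reachable ordered pairs (graph assumed connected).
    total = pairs = diameter = 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    aspl = total / pairs if pairs else 0.0
    return {"density": density, "gcc": gcc, "aspl": aspl, "diameter": diameter}


def evolve(edges):
    """Add edges in temporal order and record the metrics after each insertion."""
    adj, history = {}, []
    for u, v in edges:
        if u == v:
            continue  # self-replies are ignored in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        history.append(structural_metrics(adj))
    return history


# Hypothetical thread: three users reply to the post author, then a chain of replies forms.
history = evolve([("op", "a"), ("op", "b"), ("op", "c"), ("a", "d"), ("d", "e")])
for step, metrics in enumerate(history, start=1):
    print(step, metrics)
```

In this tree-like toy thread no triangle is ever closed, so the GCC stays at zero while the ASPL and the diameter grow as the reply chain lengthens.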

Fig. 7

Structural properties of AITA networks growing over time. Each plot shows the metric averaged over all the networks at each timestamp when a new edge is added: (a) average shortest path length (ASPL), (b) global clustering coefficient (GCC), (c) density (d), and (d) diameter (D)

This lack of clustering is what causes the increase of the ASPL over time. Table 2 shows the state of the art of real social networks growing over time. Note that all the examples contained in the table have a high GCC and, when available, a decreasing ASPL over time. By comparing the last row, which represents our AITA networks, with the other rows, it is clear that the GCC is negligible and that the ASPL behaves differently when such networks evolve: the more edges are added, the more the ASPL increases.

Growing networks of Reddit threads

In this section, we examine the growth speed of the two substructures in the AITA subreddit and compare it with other subreddit networks where the community rules do not incentivize a particular behavior. For our comparison, we use five distinct pre-existing subreddit datasets, which are openly available online and include temporal information about the comments. We pre-process these datasets by removing threads containing fewer than 2 comments, as well as duplicate comments. Table 4 shows basic statistics of the datasets after pre-processing. For each dataset, we reconstruct its conversations over time following the same methodology described in "Temporal network analysis" Section.

In order to compare the speed of conversations with similar duration, we compute the distribution of thread durations for each subreddit. Then, after removing outliers (i.e., extremely long conversations), we group threads by duration (dividing them into 10 bins) and compute the speed for each group of conversations. We calculate the speed of both growing subgraphs as follows:

$$\begin{aligned} S(g_{m}) = \frac{|e_{{m}}|-|e_{{m}-1}|}{\Delta {m}} \end{aligned}$$
(2)

where \(|e_{m}|\) is the total number of edges of the subgraph g at minute m. We compute the speed for three different time intervals (1 min, 10 min and 1 h) to observe the growth at different granularities. Speeds that could not be computed because of missing data have been set to 0. Figure 8 shows that the difference between the speeds of growth of the two subgraphs is larger in AITA than in other subreddits. The horizontal bars in the plots represent the difference in speed as the number of nodes that join the conversation every minute. Observe that this difference is higher in the AITA community, where the speed of the star is between 2 and 3 times the speed of the periphery. The results for the 10-min and 1-h intervals are not plotted for simplicity, as they yield similar results. We discuss the implications of this result in "Discussion and conclusion" Section.
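A minimal sketch of Eq. 2 is given below. It assumes that \(\Delta m\) is the width of the time interval in minutes, that edge timestamps are expressed in minutes since the creation of the post, and that \(|e_{0}| = 0\). The grouping of threads into duration bins is omitted, and the function name `growth_speed` and the timestamps are hypothetical.

```python
import math
from bisect import bisect_right


def growth_speed(edge_minutes, delta=1):
    """Speed S(g_m) of one subgraph for consecutive intervals of `delta` minutes.

    edge_minutes holds the creation time of each edge, in minutes since the post.
    """
    times = sorted(edge_minutes)
    if not times:
        return []
    n_steps = max(1, math.ceil(times[-1] / delta))
    speeds = []
    prev = 0  # |e_0|: no edges exist before the thread starts (assumption)
    for k in range(1, n_steps + 1):
        curr = bisect_right(times, k * delta)  # |e_m| at m = k * delta
        speeds.append((curr - prev) / delta)   # edges gained per minute
        prev = curr
    return speeds


# Hypothetical timestamps for the star and the periphery of a single thread.
star_edges = [0.5, 1, 1.5, 2, 2.5, 3]
periphery_edges = [1, 3]

star_speed = growth_speed(star_edges)            # per-minute granularity
periphery_speed = growth_speed(periphery_edges)
print(star_speed, periphery_speed)
```

Changing `delta` to 10 or 60 yields the coarser granularities mentioned in the text. Intervals in which no edge is added yield a speed of 0 in this sketch.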

Table 4 Data collection of threads from five subreddits
Fig. 8

Difference in the speed of growth between star (blue) and periphery (red) subgraphs, averaged within each of 10 thread-duration bins for every subreddit (panels a–f). The speed is computed for every minute. The x-axis represents the number of nodes that join the conversation every minute, while the y-axis represents the thread-duration group (10 bins)

Disagreement and reciprocity

To understand whether disagreement in the judgment process is what drives discussions in AITA conversations, we verify the existence of a monotonic relationship between thread entropy (computed in "Disagreement" Section) and other features of the threads, namely: the ASPL and GCC (computed in "Structural properties of AITA evolving over time" Section), the percentage of users participating only once, the length of the thread (in number of comments), the percentage of users that participate without voting, the average length of comments (in number of words), the score of the comments (see "Data" Section), the thread duration, the frequency of the comments (number of edges per minute), the average sentiment of the thread, and the percentage of users expressing an “unsure” comment (see "Temporal network analysis" Section). Among these features, we also include a measure of reciprocal interactions.

Reciprocity is an important behavioral feature of discussion dynamics that fosters mutual participation in conversations between users (Aragón et al. 2017). It is traditionally defined as follows (Aragón et al. 2017):

$$\begin{aligned} r = \frac{E^{\leftrightarrow }}{E} \end{aligned}$$
(3)

where \(E^{\leftrightarrow }\) is the number of bidirectional edges and E is the total number of edges. This metric ranges from 0 to 1, where a value of 0 indicates the absence of reciprocal edges in the network, and a value of 1 indicates that all edges are reciprocated. We are interested in measuring the amount of reciprocity in AITA threads to assess its role in the judgment process, especially in relation to disagreement. To characterize reciprocity, we exploit the directed network of replies between users in each thread. In such networks, a directed edge from user u to user v exists if user u replied to user v in the discussion. By using the metric in Eq. 3, we compute the reciprocity for every static network. Figure 9 shows the distribution of reciprocity in our dataset of networks. This distribution is right-skewed, with very small reciprocity for the majority of the threads (0.03 on average), suggesting that very few comments in the AITA discussions are reciprocated. Moreover, while the theoretical upper limit of reciprocity is 1, the maximum value observed for this metric in our data is \(\sim 0.43\), revealing that there are no threads with high levels of mutual exchange. We interpret this result in connection with other findings in Sect. 5.
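Eq. 3 can be sketched as follows. The sketch assumes that \(E^{\leftrightarrow }\) counts every directed edge whose reverse edge also exists (so that r = 1 when all edges are reciprocated), that repeated replies between the same pair of users collapse into a single edge, and that self-replies are discarded. The function name `reciprocity` and the toy thread are hypothetical.

```python
def reciprocity(replies):
    """Fraction of directed edges (u, v) for which the reverse edge (v, u) also exists.

    replies is an iterable of (u, v) pairs meaning "user u replied to user v".
    """
    edges = {(u, v) for u, v in replies if u != v}  # deduplicate, drop self-replies
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return mutual / len(edges)


# Hypothetical thread: only the exchange between "a" and "op" is reciprocated.
toy_replies = [("a", "op"), ("b", "op"), ("op", "a"), ("c", "a")]
r = reciprocity(toy_replies)
print(r)
```

Here two of the four distinct edges are reciprocated, so r = 0.5, which is far above the average of 0.03 reported for the AITA threads.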

Fig. 9

Distribution of reciprocity (r) in AITA networks

To corroborate the existence of a relationship between disagreement and all the above-mentioned features, we compute the Spearman rank correlation, with results summarized in Table 5 and discussed in the following Sect. 5. We observe that when thread entropy is high (i.e., there is more disagreement in the judgment expressed), users tend to write more than one comment, often engaging in reciprocal discussions with others. They also write more comments, prefer not to vote, and if they do, they include more than one label in the comment, indicating their uncertainty in picking a side. Notably, when randomizing the networks by edge rewiring, the relationship between disagreement and reciprocal discussions tends to disappear.
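For reference, the Spearman rank correlation is the Pearson correlation computed on the ranks of the two variables, with tied values receiving their average rank. The sketch below is a plain-Python illustration and not the code used in this work. The function names and the per-thread values are hypothetical, and both inputs are assumed to be non-constant.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average position of the tied block, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-thread values of entropy and reciprocity (not real data).
toy_entropy = [0.1, 0.4, 0.5, 0.9]
toy_reciprocity = [0.0, 0.02, 0.01, 0.2]
rho = spearman(toy_entropy, toy_reciprocity)
print(round(rho, 3))
```

Because only the ranks matter, any monotonic relationship between entropy and a thread feature yields a coefficient close to 1 or -1, regardless of whether the relationship is linear.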

Table 5 Spearman rank correlation between disagreement (thread entropy) and thread features

Discussion and conclusion

In this paper, we analyzed Reddit threads by modeling them as networks of user interactions and by computing the evolution of their structural properties over time. We show that these networks differ from real social networks, despite falling in the same category, as they exhibit a negligible GCC and an increasing ASPL. We also demonstrated that networks of the AITA community grow differently with respect to networks from other subreddits, as the difference in speed between the two subgraphs is larger than in other subreddits. In this section, we discuss such results in the context of Social Judgement Theory, particularly regarding disagreement in the judgment process.

We interpret the results presented in "Structural properties of AITA evolving over time" Section by referring to the structure of the platform, which allows thread-structured conversations and shapes user interaction differently compared to other real social networks. Indeed, Reddit is not a relationship-based social network, meaning that most of the user interactions are content-driven and not user-driven (Makow et al. 2017). This means that users on Reddit do not join to comment on a specific person but on a specific piece of content (post or comment). This difference in how the platform is built shapes user interactions differently, generating a different behavior in the networks as they evolve over time.

Furthermore, the unexpected behavior of GCC and ASPL reveals that participants mostly interact with only one other user, and often through a single message. To further inspect this user behavior, we derived a measure of reciprocal interaction and its relation to the disagreement in the judgment process of AITA threads. We have shown in "Disagreement and reciprocity" Section that disagreement plays an important role in online discussions where people are expected to express a judgment. It is significantly related to the generation of more discussions and more reciprocal interactions and, at the same time, to more uncertainty in judgment expression. This could reveal that, despite the anonymity of users on Reddit, users might not feel free to explicitly express their opinions in discussions with high disagreement. This is coherent with SJT: indeed, if it is true that people are more prone to express opinions in anonymous environments and settings (Adamic et al. 2021; Spears 2021), it is also true that in situations of high disagreement they perceive less support for their viewpoint from the social environment, making them less likely to express their judgments (Glynn et al. 1997; Chun and Lee 2017). Furthermore, in relation to the expression of social judgment in online discussions, in this work we have also shown that comments containing a judgment have a higher response time than comments that do not include one ("Response time" Section). This finding aligns with moral judgment theories stating that responses to moral dilemmas require cognitive control, which is an emotional process that takes time (Suter and Hertwig 2011). The longer time needed for voting comments could also be due to the AITA community guidelines that encourage users to include a justification for the expressed vote in the text of their comments. In summary, the obtained results contribute to the advancement of unexplored aspects of the SJT, especially related to online communication.

We conclude that the temporal analysis of the structural properties of these networks reveals the following behavioral patterns of users discussing on Reddit. Participants mostly interact with only one other user, often by a single message. The lack of clusters, together with the very small reciprocity, suggests that most of the new users participating in the conversation do not engage with more than one person. They join the thread to respond to a single user, rarely with more than one message exchange.

In this work we also demonstrate that the star in AITA conversations grows faster than the periphery (Sect. 4.3). We interpret this as a direct consequence of the community guidelines enforcing the behavior of participants. As explained in "Judgment behavior" Section, these rules indicate that votes expressed in comments that are not first-level will not be considered for the final verdict, hence encouraging people to participate in the thread by replying to the post author. The periphery is, as a consequence, a spontaneous behavior of the users who discuss instead of voting (only 30% of the voters are voting in the periphery, as shown in Fig. 5).