1 Introduction

Emerging Online Social Networks (OSNs) have revolutionized the public information environment in an unprecedented way. Thus, it is crucial to study the process of the spread and evolution of online information to understand the reach and impact of news, ideas, and knowledge in OSNs. An accurate and scalable computational simulation of this process could potentially help combat misinformation campaigns by adversaries, efficiently deliver critical information to local populations during disaster relief operations, and contribute to social construction and policy designs that rely on information dissemination.

Despite progress in this field of research, current computational approaches to social and behavioral simulation have not been well positioned to uncover the underlying dynamics that explain the inner workings and reasons for the selection and diffusion of information on online social platforms. Current approaches to online social dynamics simulation fall into three main categories: (I) statistical analysis and modeling of a particular phenomenon, such as “information evolution”, by fitting a statistical model to a particular dataset, for instance [1]; (II) the statistical physics approach, which uses Agent-Based Model (ABM) simulation as an extension of dynamic equation modeling; and (III) approaches that use ABMs by “translating” a theoretical model into the agent-based framework [2]. Although the first approach can be used in econometrics to predict economic outcomes a few months ahead, it fails to correctly predict rich system dynamics (such as during a financial crisis) and does not take complex human dynamics into account, even though it models a single dataset accurately. The second approach is typically based on a “Brownian agent” [3] and applies agent-based models in a very different capacity than the more standard practice of using them as a complex systems modeling tool. Brownian agents are restricted by the stochastic physics framework in which they are embedded, limiting their ability to capture complex dynamic behavior. Lastly, the third approach focuses on replicating a single phenomenon; the agents mirror a single set of equations focused on an observable macro-pattern rather than the deep cognitive mechanisms that drive human behavior. Because none of these approaches model the deep emotional, cognitive, and social factors that determine social behavior outcomes, all three risk over-fitting to a single dimension of the data.

Although frameworks like Agent-Zero [4] and Homo Socialis [5] offer theoretical solutions for modeling the true complexity of human dynamics driven by the deep neurocognitive underpinnings at the core of any human social activity, including the spread and evolution of information, these deep models have so far been limited to conceptual problems. Simpler models are often preferred for real-world problems because they allow parameter tuning directly associated with modeling a particular dataset; however, simpler models cannot simultaneously replicate multiple complex phenomena and aspects of human dynamics, such as information cascading, gatekeeper identification, information evolution, and persistent minorities.

To overcome these shortcomings in modeling online social platforms, the Defense Advanced Research Projects Agency (DARPA) announced the “Computational Simulation of Online Social Behavior (SocialSim)” program to develop innovative technologies for high-fidelity computational simulation of online social behavior. Responding to this program, our team proposed and implemented a novel simulation framework that enables revolutionary advances in the simulation of information spread and evolution on social media at a large scale. We accomplished this by (I) modeling social dynamics using a network of computational agents endowed with deep neurocognitive capabilities, (II) creating a family of plausible social dynamics models assembled from modularized sub-components, and (III) utilizing machine learning algorithms and HPC cloud computing for model discovery, refinement, and testing.

This paper explains the design and concepts of our framework and agent-based models for social network analysis using large volumes of data from GitHub, Reddit, and Twitter, with the aim of better understanding online social behaviors. We proposed the Deep Agent Framework (DAF), which operationalizes social theories of human behavior and social media into optimized generative simulation capabilities. Additionally, we developed the Multiplexity-Based Model (MBM), an agent-based model grounded in concepts from graph theory that simulates online social network evolution.

2 Challenge Problem Description

The challenge problem was designed to develop a multi-resolution simulation at the content, user, community, and population levels. Thus, a total of 57 accuracy measurements and metrics were designed to evaluate the participating models. The challenge evaluation procedure applied a combination of metrics and measurements over four dimensions: accuracy, generalizability, explainability, and experimental power. Table 2 in the Appendix contains the evaluation metrics and a performance comparison between the agent-based models for the community, content, population, and user-level interactions.

Fig. 1 Deep Agent Modeling Framework (DAF). a The framework maps the agents’ rules of behavior to real-world data, using hand-crafted behavioral theories (emotional, rational, and social) and data-driven models to initialize the generative models. b The architecture of the DAF, consisting of the workflow for data processing (endogenous and exogenous data), model simulation, model mixing, and model evaluation and tuning

3 Methodology

Our generative model of social media (Fig. 1a) employs ABMs to simulate social dynamics via embedded agents as user profiles on OSN platforms, and “deep agents” as the platform users. The deep agent concept is adopted from the Agent-Zero [4] and Homo Socialis [5] frameworks, which leads to deep agent modeling. According to the Agent-Zero framework, a cognitively plausible agent must account for three dimensions: Emotions (leading to ortho-rational behavior), Bounded Rationality, and Social Connectivity. Therefore, the social media users in our DAF framework were modeled to possess these three dimensions and are referred to as deep agents in this paper. In contrast, “shallow” models are deficient in modeling the deep human characteristics that determine social behavior outcomes. Additionally, shallow models focus on fitting equations to a single phenomenon of interest, which results in models that are brittle and potentially over-fitted to a single dimension of the data.

The agents’ interactions, according to the embedded rules, result in specific outcomes at the population, community, user, and content levels, and provide information about the agents’ decision processes. The agents’ relative actions can be derived by applying particular metrics (addressed in Sect. 4.1 and the Appendix), which provide simple statistics for the cascade and group behaviors.

The Deep Agent Modeling Framework (DAF), depicted in Fig. 1b, allows us to create a family of modular sub-components from which multiple plausible models can be systematically assembled, tested, and validated. Rules of behavior can be discovered by employing the genetic programming evolutionary model discovery method (red box in Fig. 1a), as in [6], which explores the space of possible agent rules of behavior and thereby supports strong inferences about human behavior from computational simulations. The search can be guided by model accuracy, as measured by comparing model outputs with real-world social dynamics data. Assembled from these pieces, the framework unleashes the power of combining massively parallel computing, data analytics of large datasets, and machine learning to mix and match sub-models in a semi-automated way; this allows for exploration and testing that validates tens of thousands of models against a large set of target behaviors.
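
To make the model discovery loop concrete, the following is a highly simplified, illustrative sketch of an evolutionary search over agent rule sets; the rule pool, the placeholder fitness function, and all parameters are assumptions for illustration rather than the actual DAF search implementation.

```python
# Illustrative skeleton of evolutionary model discovery over modular rule sets.
# fitness() is a placeholder standing in for "run the assembled ABM and score it
# against real-world data"; the rule names are assumed, not the DAF rule pool.
import random

RULE_POOL = ["recency_bias", "preferential_attachment", "exogenous_shock_response",
             "social_conformity", "emotional_arousal"]

def fitness(rule_set, ground_truth):
    """Placeholder: simulate the model built from rule_set and return its accuracy."""
    return random.random()

def evolve(ground_truth, pop_size=20, generations=10):
    population = [random.sample(RULE_POOL, k=3) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda r: fitness(r, ground_truth), reverse=True)
        parents = scored[: pop_size // 2]            # keep the better half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            child = list(dict.fromkeys(random.sample(a + b, k=3)))   # crossover
            if random.random() < 0.2:                                 # mutation
                child[random.randrange(len(child))] = random.choice(RULE_POOL)
            children.append(child)
        population = parents + children
    return max(population, key=lambda r: fitness(r, ground_truth))
```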

We developed the DAF simulation tool to help answer questions related to the properties of information exchange dynamics on online social media at the population, content, user, and community levels. Additionally, we proposed the Multiplexity-Based Model (MBM), which captures social network evolution based on preferential attachment, attention, and the recency cognitive bias. We mix and compare our model with other theory-driven models designed by our team of researchers, namely the Multi-Action Cascade Model (MACM) and Sampled Historical Data (SHD). The mixing, evaluation, and comparisons are provided for the three models and their variations: MBM-Influence Filtered Network, MBM, SHD, MACM-Influence Filtered Network, and finally MACM. In this paper, we applied hand-crafted behavioral theories and data-driven models directly to feed the agent-based models.

3.1 Deep Agent Framework: Architecture and Analysis

The architecture of the framework (Fig. 1b) consists of a data pre-processing phase and a modeling and output phase. The data pre-processing phase includes data sampling, extracting influential users through normalized transfer entropy, extracting external shocks to the system represented as outliers, and initializing the endogenous and exogenous influences with the extracted users and shocks. The endogenous influence initialization involves snowball sampling of the influential users and their relationships to generate the static endogenous network, the network dynamics, and the network message information. Snowball sampling with normalized transfer entropy was used to extract the influential relationships from the event data, which were used alongside the extracted activity disparity distributions of the endogenous relationships to build the static endogenous network. Using the extracted influential relationships, we used the most recent activities to build the initialized network dynamics and the last m messages to filter the network message information.
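
As an illustration of this step, the sketch below filters influential user-user relationships with a normalized transfer entropy threshold. It assumes binary (active/inactive) activity series per user, a lag-1 history, and normalization by the conditional entropy H(Y_t | Y_{t-1}); these choices and all names are illustrative, not the exact DAF implementation.

```python
# Minimal sketch: normalized transfer entropy (NTE) filtering of influential edges.
import numpy as np
from collections import defaultdict

def transfer_entropy(x, y):
    """Lag-1 transfer entropy TE(X -> Y) for binary activity series of equal length."""
    counts = defaultdict(int)
    for yt, yp, xp in zip(y[1:], y[:-1], x[:-1]):
        counts[(yt, yp, xp)] += 1
    n = len(y) - 1
    te = 0.0
    for (yt, yp, xp), c in counts.items():
        p_joint = c / n
        p_yp_xp = sum(v for (a, b, d), v in counts.items() if b == yp and d == xp) / n
        p_yt_yp = sum(v for (a, b, d), v in counts.items() if a == yt and b == yp) / n
        p_yp = sum(v for (a, b, d), v in counts.items() if b == yp) / n
        te += p_joint * np.log2((p_joint / p_yp_xp) / (p_yt_yp / p_yp))
    return te

def conditional_entropy(y):
    """H(Y_t | Y_{t-1}), used here as the normalizer (an assumption)."""
    counts = defaultdict(int)
    for yt, yp in zip(y[1:], y[:-1]):
        counts[(yt, yp)] += 1
    n = len(y) - 1
    return -sum((c / n) * np.log2((c / n) / (sum(v for (a, b), v in counts.items() if b == yp) / n))
                for (yt, yp), c in counts.items())

def influential_edges(activity, threshold=0.05):
    """Keep directed edges (j -> i) whose NTE exceeds the threshold."""
    edges = []
    for uj, x in activity.items():
        for ui, y in activity.items():
            if ui == uj:
                continue
            hy = conditional_entropy(y)
            nte = transfer_entropy(x, y) / hy if hy > 0 else 0.0
            if nte > threshold:
                edges.append((uj, ui, nte))
    return edges
```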

Each model has two variations, a full network simulation and an Influence Filtered Network (IFN) simulation. The full models take in the entire network of the event data in the initialization phase; however, the IFN models are initialized using only the filtered influential users to simulate the user interactions. We used all three outputs of the endogenous influence initialization phase as the inputs to the MACM and MACM-IFN models.

For the exogenous data initialization, we extracted outliers from the exogenous training and challenge data sources using different filtering methods. This process includes applying a Fourier Transform (FT) to each time-series, employing a moving-window magnitude filter and a Butterworth filter on the FT of each time-series to isolate the anomalies, and binary digitization of the anomaly time-series. We applied transfer entropy to the endogenous outliers and extracted the activity disparity distributions of the exogenous relationships from the filtered data to build the static exogenous network. Finally, we generated the network dynamics by extracting the last activity disparity of the exogenous relationships from the exogenous outliers. The generated static exogenous network and the network dynamics were fed into the MACM and MACM-IFN models. The inputs to the MBM and MBM-IFN models were the entire network of the event data and the sampled data for the last x weeks, respectively.
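
The sketch below illustrates the spirit of this outlier extraction with a simplified, time-domain variant: a low-pass Butterworth filter provides a baseline, a moving-window magnitude threshold flags anomalous residuals, and the result is digitized to a binary shock series. The filter order, cutoff, window size, and threshold are assumptions and may differ from the filters applied to the Fourier-transformed series in DAF.

```python
# Simplified sketch: turn an exogenous time-series into a binary anomaly/"shock" series.
import numpy as np
from scipy.signal import butter, filtfilt

def binarize_shocks(series, cutoff=0.1, window=24, k=3.0):
    series = np.asarray(series, dtype=float)
    b, a = butter(N=4, Wn=cutoff, btype="low")        # 4th-order low-pass filter
    baseline = filtfilt(b, a, series)                  # smooth trend of the series
    residual = np.abs(series - baseline)               # anomaly magnitude
    shocks = np.zeros(len(series), dtype=int)
    for t in range(len(series)):
        local = residual[max(0, t - window): t + 1]
        thresh = local.mean() + k * local.std()        # moving-window magnitude filter
        shocks[t] = int(residual[t] > thresh)          # binary digitization
    return shocks
```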

The modeling and output phase contains our models’ simulations, the mixing of the models, model evaluation, and model tuning. The five variations of the generative models were implemented using the NetLogo and RHPC coding environments. The model mixing strategy refers to merging the simulation outputs of the MACM and MBM models with the simulation output of the SHD model: the full models combine their output with 10% of the user interactions simulated by the SHD model, while the IFN models combine their results with 90% of the interactions simulated by the SHD model. More information on the mixing strategy is provided in Sect. 3.2.3.

3.2 Agent-Based Models

The agent-based models in this paper are generative rule-driven models designed based on social theory on the Diffusion of Information (DoI) and user actions in OSNs. Although each social media platform has specific user interactions, we can refer to four fundamental user activities observable on any OSN platform: Create, Post, Vote, and Follow. Along with these, there are four entities in any OSN environment: Actor, Content, Action, and Space. This viewpoint of actions and entities allows behavior to be represented across multiple social media environments, and is referred to as the common language or ontology of user actions and entities. The agent-based models in this paper build on traditional DoI models: the Threshold Model (TM) [7,8,9], the Independent Cascade Model (ICM) [10, 11], the Bass Growth Model [7, 12], the Rand Agent-Based Model [13], and the Complex Contagion Model [14].
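
A minimal sketch of this common language is given below: the four fundamental actions and the four entities are encoded directly, and a small platform-event mapping is shown. The mapping entries are illustrative assumptions rather than the official ontology.

```python
# Minimal sketch of the cross-platform action/entity ontology.
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    CREATE = "create"
    POST = "post"
    VOTE = "vote"
    FOLLOW = "follow"

@dataclass
class Event:
    actor: str       # user performing the action
    action: Action
    content: str     # e.g. repository, tweet, or submission id
    space: str       # e.g. "github", "twitter", "reddit"

# Illustrative platform-event -> fundamental-action mapping (assumed, not official).
EVENT_MAP = {
    ("github", "CreateEvent"): Action.CREATE,
    ("github", "PushEvent"): Action.POST,
    ("github", "WatchEvent"): Action.FOLLOW,
    ("twitter", "tweet"): Action.CREATE,
    ("twitter", "retweet"): Action.POST,
    ("reddit", "post"): Action.CREATE,
    ("reddit", "comment"): Action.POST,
}
```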

Table 1 Table of symbols and definitions used in this paper

3.2.1 Multiplexity-Based Model

The MBM model simulates social network evolution using multiplex networks, which have a multi-layer network structure with nodes possibly shared among different layers [15]. As MBM is designed based on concepts from graph theory, we refer to OSN users as nodes and user interactions as links. The model consists of a directed bipartite graph with bipartite pairs of users-repositories for GitHub, users-subreddits for Reddit, and users-users for Twitter, distinguished by multiple layers. Each separate user action on the platform generates a sub-graph, and the combination of the actions generates the whole network structure. The set of user actions in this model is conversation creation, contribution, vote, and follow, which can be formalized as \((C_i \notin \{C\})\), \((C_i = C_j , M_i \notin \{M\})\), \((V_i \notin \{V\})\), and \((L_i \notin \{L_{U_j}\})\), respectively, where indices represent the users that perform the action, and \(\{C\}\), \(\{V\}\), and \(\{L_{U_j}\}\) refer to the sets of all conversations, votes, and links to followers of the user in the model up to the current time-step.
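
The sketch below illustrates this multiplex structure with one directed layer per action type sharing user nodes across layers; networkx is used purely for illustration, and the layer names follow the action set described above.

```python
# Minimal sketch of the MBM multiplex structure: one directed layer per action type.
import networkx as nx

ACTIONS = ["create", "contribute", "vote", "follow"]
layers = {a: nx.DiGraph() for a in ACTIONS}           # one sub-graph per action type

def add_interaction(action, user, target, t):
    """Add a user -> target link (e.g. user -> repository on GitHub) at time t."""
    layers[action].add_edge(user, target, time=t)

def whole_network():
    """The full MBM network is the union of all action layers."""
    return nx.compose_all(list(layers.values()))

add_interaction("create", "u1", "repo_a", t=0)
add_interaction("vote", "u2", "repo_a", t=1)
g = whole_network()                                    # 3 nodes, 2 edges across layers
```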

The cognitive factor of MBM refers to information overload, which results in higher attention to recent activities and active users. In other words, MBM considers the recency bias affecting OSN users’ decision-making processes in the possible propagation of information. This concept is designed into the model in terms of age and fitness values, such that a user’s influence decays over time. Content targets that have recently been the object of actions, and users that have recently acted, see their fitness decrease the least, whereas these values decrease the most for inactive users. This results in higher attention to influential users and targets, while allowing their fitness values to decline in popularity over time and eventually be supplanted by newer elements. Reaching a certain age leads to a node’s removal from the model node-set. As a result, the model’s predictions are most affected by recent trending activities, with higher attention to more active users. Accordingly, the driving forces of MBM are preferential attachment and preferential decay, both as functions of the node’s degree \(k \in \{1, \dots , K\}\) and age \(a \in \{t_0, \dots , t_{max}\}\). The propagation of information from user \(U_j\) to neighboring user \(U_i\) can be represented as

$$\begin{aligned} (U_j, A_j, a_j) \rightarrow (U_i, A_i, a_i) \end{aligned}$$
(1)

The model initially comprises |U| nodes, each node representing a user U with \(L_U\) links. Each node in the graph is assigned a starting fitness of \(F = 1\) that models the node’s influence on the growth of the network and decreases as a function of time, node age, and activity history [16]. Accordingly, the node’s age value can be calculated as

$$\begin{aligned} a_j \leftarrow a_j + \Big (1 - (t_{c_j} - t_{p_j}) * F_j\Big ) \end{aligned}$$
(2)

where \(t_{c_j}\) is the current time, \(t_{p_j}\) refers to the previous activity time for user \(U_j\), and \(F_j\) is the user’s current fitness value. The fitness value for each node can be calculated as

$$\begin{aligned} F_j = \frac{|A_j|}{a_j} \end{aligned}$$
(3)

where \(|A_j|\) is the number of actions for user \(U_j\) and is equal to the number of user interactions, or degree \(k_j\). In other words, fitness is essentially a simulation of a user’s productivity. A user’s fitness is increased by the diversity of its activities, the shortness of the time-span between its activities, and the fitness of the interactions the user builds.

The MBM network grows at each time-step by the successive addition of new nodes to the model node-set and new edges to the edge-set. The node addition ratio was extracted from the input data. New links emerge between nodes with higher fitness values as a result of preferential attachment.

In summary, the model can be broken down into three steps that are performed at each time-step: (I) Node selection: selecting a set of nodes, referred to here as \(\{U_t\} \subset \{U\}\), from the current model node-set together with potential nodes that can be added to it. (II) Interaction: building the interactions between the bipartite node pairs in \(\{U_t\}\) and the rest of the nodes in the model, such that the pairs are matched according to a likelihood distribution weighted on the nodes’ fitness values. A sub-graph associated with a behavior activity is assigned to the selected pair based on a likelihood distribution determined by the popularity of action types. The layer fitness score for each sub-graph is re-calculated to predict how popular each action remains. (III) Update: updating the node and link fitness scores, local and global degree values, layer fitness scores, and node and link ages, and removing nodes and edges from the model according to fitness decay. In this step, the new age value for each node can be calculated by

$$\begin{aligned} a_j \leftarrow a_j + \Big (1- (t_{c_j}-t_{p_j}) * (t_{c_j} + 1)\Big ) \end{aligned}$$
(4)
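
A minimal sketch of the per-node fitness and age bookkeeping is shown below, following Eqs. (2) and (3) (Eq. (4) is the age-update variant applied in the Update step). The dictionary-based node representation and the removal threshold max_age are illustrative assumptions.

```python
# Minimal sketch of the MBM node update following Eqs. (2)-(3).
def update_node(node, t_current, max_age=100.0):
    """node: dict with keys 'age', 'n_actions', 'fitness', 't_prev'."""
    gap = t_current - node["t_prev"]                  # time since the node's last activity
    node["age"] += 1.0 - gap * node["fitness"]        # Eq. (2): age grows slower for fit nodes
    node["fitness"] = node["n_actions"] / node["age"] if node["age"] > 0 else 0.0  # Eq. (3)
    node["t_prev"] = t_current
    return node["age"] <= max_age                     # False -> remove node from the node-set

# Example: a node with 12 actions, last active 3 time-steps ago.
node = {"age": 10.0, "n_actions": 12, "fitness": 1.0, "t_prev": 7}
keep = update_node(node, t_current=10)
```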

3.2.2 Multi-action Cascade Model

The MACM model [17, 18] is a cognition-based agent-based model that simulates the diffusion of information through the network using individual-scale probabilities of actions derived from the Independent Cascade Model. The cognitive factor of this model refers to the information overload caused by the vast amount of social media activity bombarding users’ attention and affecting their decision-making processes through prioritization of, and preference for, the possible propagation of information.

Using transfer entropy analysis on user-user and user-exogenous force influences, this model measures the probabilities of actions for user \(U_i\) related to endogenous and exogenous forces as

$$\begin{aligned} q = \mathbb {P}_t(U_i | U_j) = \mathbb {P}_{t - 1}(U_i | U_j) + \frac{\varepsilon _{i, j}}{1 + T_{i, j}}\; \end{aligned}$$
(5)
$$\begin{aligned} p = \mathbb {P}_t(U_i | S) = \underset{s \in S}{\bigcup } \Big ( \mathbb {P}_{t-1}(U_i | s) + \frac{\varepsilon _{i, s}}{1 + T_{i, s}} \Big ) \end{aligned}$$
(6)

where neighboring user \(U_j\) is active in a conversation, \(s \in S\) refers to external shock, \(T_{i, j}\) is the transfer entropy from user \(U_j\)’s action to user \(U_i\)’s action, \(T_{i, s}\) is the transfer entropy from external shock s to user \(U_i\)’s action, and \(\varepsilon \) indicates noise relative to activity changes of the two users.

MACM considers the evidence that internal and external forces can define the dynamics of different event types, causing the spread of information and the evolution of content as it spreads through the social network. The user actions in this model are conversation creation, contribution, sharing, and deletion, which can be formalized as \((C_i \notin \{C\})\), \((C_i = C_j , M_i \notin \{M\})\), \((C_i \ne C_j , M_i = M_j)\), and \((C_i = C_j , M_i = \varnothing )\), respectively, where indices represent the users that perform the action, and \(\{C\}\) and \(\{M\}\) refer to the sets of all conversations and contents in the model up to the current time-step. The propagation of information is modeled as a message considering the influences from the neighboring nodes, the action type, the target conversation, and the content genome, such that

$$\begin{aligned} (U_j, A_j, C_j, M_j) \rightarrow (U_i, A_i, C_i, M_i) \end{aligned}$$
(7)

where \(M\) is the message content in conversation \(C\), \(A\) refers to the action, and \(U\) represents the user. Additionally, the user interactions are conditioned on the processing of the messages received from the connected nodes, filtered down as a result of cognitive overloading. Accordingly, MACM agents calculate the probability of performing action \(A_i\) in response to action \(A_j\) as the union of the probabilities of actions based on the endogenous and exogenous forces as

$$\begin{aligned} \mathbb {P}_t(U_i | U_j, S) = \mathbb {P}_t(U_i | U_j) \cup \mathbb {P}_t(U_i | S) \end{aligned}$$
(8)
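
The sketch below illustrates one reading of Eqs. (5)-(8): the per-source probabilities are nudged by a noise term damped by transfer entropy, and the “union” is interpreted as a probabilistic OR under independence (p + q - pq). That interpretation of the union operator, along with all parameter values, is an assumption made for illustration.

```python
# Minimal sketch of combining endogenous and exogenous action probabilities.
def update_prob(p_prev, epsilon, transfer_entropy):
    """One step of Eq. (5)/(6): previous probability plus a noise term damped by TE."""
    return min(1.0, max(0.0, p_prev + epsilon / (1.0 + transfer_entropy)))

def prob_union(probs):
    """Eq. (8)-style union, read as independent-OR: 1 - prod(1 - p)."""
    p_not = 1.0
    for p in probs:
        p_not *= (1.0 - p)
    return 1.0 - p_not

# Example: one endogenous neighbor and two external shocks.
q = update_prob(0.20, epsilon=0.01, transfer_entropy=0.5)          # Eq. (5)
p = prob_union(update_prob(0.05, 0.02, te) for te in (0.3, 0.8))   # Eq. (6) over shocks
p_act = prob_union([q, p])                                          # Eq. (8)
```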

3.2.3 Sampled Historical Data

The SHD model [19, 20] is a replay-based data mixture model designed based on the seasonality of OSN user activities and the hypothesis that users exhibit repetitive patterns. This model extracts the most recent activities from the training data to provide information about user interactions and edge formations in the network, and predicts future user interactions from the same types of activities in the past. We employed the SHD model to simulate the less-active users that hold little influence on the network and were removed in the filtering processes. The mixing strategy using the SHD model proceeds as follows: (I) extract the active and less-active unique users from the social network data; (II) predict the activities associated with the active users using the MBM and MACM models; (III) predict the activities associated with the less-active users using SHD; and (IV) append the SHD-simulated low-activity users’ events to the events simulated by the MBM and MACM models.
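
A minimal sketch of this mixing procedure is given below, under the assumption that events are dictionaries with user and time fields and that “less-active” means fewer than a fixed number of training events; the threshold and field names are illustrative.

```python
# Minimal sketch of SHD mixing: replay low-activity users, append to simulated events.
from collections import Counter

def mix_with_shd(training_events, simulated_events, horizon, activity_threshold=10):
    counts = Counter(e["user"] for e in training_events)
    low_activity = {u for u, c in counts.items() if c < activity_threshold}
    t_end = max(e["time"] for e in training_events)
    # Replay each low-activity user's most recent window, shifted into the test period.
    shd_events = [{**e, "time": e["time"] + horizon}
                  for e in training_events
                  if e["user"] in low_activity and e["time"] >= t_end - horizon]
    return simulated_events + shd_events
```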

4 Dataset Description

The challenge goal was to model social structures and their day-to-day changes, and accordingly, simulate the time-series network evolution of GitHub, Twitter, and Reddit social environments for the three domains of interest (CVEs, Cyber Threats, and Cryptocurrencies). The datasets for the challenge were provided by Leidos, and are explained in detail below.

The GitHub social network data contained information from the years 2015 to 2017. A total of 33,570 cryptocurrency-related repositories were associated with or included target coin names or keywords in their descriptions, and 1,193,370 events matched those repositories. 5,505,496 cybersecurity repositories and 214,074,771 events were selected, as well as 186,190 software vulnerability-related repositories and 26,777,997 events. The Twitter social network included data for the years 2016 to 2017, with a total of 7,382,724 cryptocurrency-related tweets, retweets, and quotes. These values were 30,704,025 and 74,074 for the cybersecurity and software vulnerability domains, respectively. The Reddit dataset included submissions and comments matching keywords for the years 2015–2017. The cryptocurrency-related data contained 299,401 submissions and 3,370,547 comments. These values were 2,442,942 and 33,629,588 for the cybersecurity domain, and 60,760 and 264,024 for the software vulnerability domain.

4.1 Evaluation Events and Metrics

Since each social environment has a unique set of events depending on the interactions on the platform, the evaluation events for each social environment were defined separately as follows: (I) GitHub: Commit Comment, Create, Delete, Fork, Issue Comment, Push, Watch, Pull Request, and Issues; (II) Twitter: Tweet (create original material), Retweet, Reply, and Quote; (III) Reddit: Comment and Post.

The evaluation measurements were applied to online social behaviors at the population, content, user, and community levels. Content-level measurements cover individual user interactions, that is, posting or replying on the Reddit and Twitter platforms, or writing a comment on GitHub. User-level measurements focus on user activities, for instance, a user’s contribution counts over time. Finally, the population-level measurements aggregate the events’ and users’ characteristics on a particular platform. Examples of the accuracy measurements and metrics include community burstiness and the user Gini coefficient calculated by absolute percentage error, community Gini measured by absolute difference, user trustingness, user diffusion delay calculated by the Kolmogorov-Smirnov (K-S) test, and user popularity measured by Rank-Biased Overlap (RBO). Additionally, “surprise measurements” were provided during the test event for competing campaigns in the cryptocurrency domain and competing attention in the cyber threats and CVE domains.
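
For illustration, the sketch below computes two of the measurements named above under the assumption that per-user activity counts and per-event diffusion delays are compared between simulation and ground truth; the data in the example are synthetic.

```python
# Minimal sketch of two accuracy measurements: Gini coefficient and two-sample K-S test.
import numpy as np
from scipy.stats import ks_2samp

def gini(counts):
    """Gini coefficient of a list of non-negative activity counts."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

sim_delays = np.random.exponential(2.0, 500)           # synthetic simulated diffusion delays
true_delays = np.random.exponential(2.5, 500)          # synthetic ground-truth delays
ks_stat, p_value = ks_2samp(sim_delays, true_delays)   # user diffusion delay comparison
gini_error = abs(gini([5, 1, 1, 30, 2]) - gini([4, 2, 1, 25, 3]))  # absolute difference
```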

Fig. 2 The comparison of the normalized error ratio for the agent-based models and the ground truth based on the measurement types. A lower value corresponds to better model performance

Fig. 3 Overall performance comparison of the a MBM and b MACM model variations across the community, content, population, and user levels, in terms of normalized error ratio. A lower value corresponds to better model performance

5 Experimental Results

We performed extensive analysis using our framework to mix, match, and compare the models. We also calculated model performance at the user, community, population, and content levels for the cybersecurity, software vulnerability, and cryptocurrency domains. The training data input for the agent-based models consisted of the GitHub, Twitter, and Reddit network evolution over time for the three domains of interest, and the exogenous data for both the training and test periods. Simulation inputs include the initialization data, the exogenous data, and the events data. There were two kinds of expected simulation outputs: (I) full network dynamics (event logs/data frames obeying specific formats that contain event-related information), and (II) direct output of the accuracy measurements.

The corresponding results for the benchmark comparisons are provided in Fig. 2, which compares the model measurements against the ground truth (Random) across the measurement types. The Jensen-Shannon divergence, Kolmogorov-Smirnov test, and absolute percentage error were among the metrics used for this analysis. We normalized the model output results over the measurement group (user, community, population, and content), metric type, and platform (GitHub, Twitter, Reddit) to obtain a value in the range 0 to 1, where a lower value corresponds to better model performance.
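
A minimal sketch of this normalization, assuming the raw metric errors are min-max scaled within each (measurement group, metric, platform) cell, is shown below; the column names are illustrative.

```python
# Minimal sketch: min-max normalization of metric errors per group/metric/platform.
import pandas as pd

def normalized_error_ratio(df):
    """df columns: model, group, metric, platform, error (raw metric error)."""
    keys = ["group", "metric", "platform"]
    lo = df.groupby(keys)["error"].transform("min")
    hi = df.groupby(keys)["error"].transform("max")
    df = df.copy()
    df["normalized_error"] = ((df["error"] - lo) / (hi - lo)).fillna(0.0)
    return df
```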

Figure 3a, b demonstrates the model-wise performance comparisons with and without the SHD mixing strategy, for the two user influence pruning cases. The prefix “p” refers to models initialized using data pruned for the influential users, while the non-prefixed models were initialized using the entire event data. We can observe that pruning influential relationships improved the MBM model performance; however, this strategy was not successful for the MACM model. The results indicate that social theory-based modeling may account for influential interactions inherently, and support the idea that the more “explainable” a model is, the higher its performance. Additionally, the results provide evidence that the mixing strategy helps improve user- and community-level performance, and that the single models are more successful in modeling the population- and content-level interactions of the influential users.

Table 2 Performance comparison for the agent-based models across community, content, population, and user-level metrics

The metric-based performance comparison of the models is provided in Table 2 in the Appendix for the community, content, population, and user-level interactions. In this table, the rows refer to the group-based measurement metrics evaluated for each model. Each occupied cell indicates the best performance of the corresponding model for the specific metric. The values refer to the normalized sub-metrics averaged over 105 model runs. The content and user-level performances illustrate that the mixing strategy using Sampled Historical Data (SHD) improves the model performance in modeling node-level interactions. Finally, the population-level scores illustrate another benefit of the SHD mixing strategy in improving the performance for the degree distributions and the explanation of node-level characteristics.

6 Conclusion and Future Work

In this paper, we discussed how user interactions, behaviors, and complex human dynamics can be captured by combining massively parallel computing, data analytics of large datasets, and machine learning algorithms. We proposed the Deep Agent Framework (DAF), which operates beyond single models by mixing and matching sub-models in a semi-automated way. Our framework operationalizes social theories of human behavior and social media into optimized generative simulation capabilities that enable exploring information diffusion and evolution within the social media context. Our multi-resolution simulation at the user, community, population, and content levels, together with our extensive analysis and results, provides evidence that our framework is a powerful tool for modeling the diffusion and evolution of information in a variety of online social platforms. Although we applied hand-crafted behavioral theories and data-driven models directly to feed the agent-based models, without employing the genetic programming component (red box in Fig. 1a), our results show that our framework and our deep generative models are powerful in modeling online social network interactions.

Further improvements to the DAF framework can be made by employing evolutionary model discovery to explore the space of agents’ behavior rule sets, which allows for the testing and validation of tens of thousands of models against a large set of target behaviors, as in [6]. Additionally, future work will automatically introduce variants to all the models produced by different performers in order to obtain the overall best model.