Abstract
The paper studies the dynamics of the moods of Internet media users on the basis of the Fokker-Planck equation and of changes in the parameters of the network graphs of their comments. It introduces the concept of the base state vector of the comment network graph, whose elements are the average value of the mediation coefficient, the average value of the clustering coefficient, and the proportion of users in a particular state. The time dependence of the distance between the base vector and the current state vector forms a time series whose values x can be thought of as the positions of a “wandering point”. This time series is considered on the segment [Lmin, Lmax], where Lmin and Lmax are the given boundaries of the interval of possible values of the network state. The current state of the comment graph can be determined using network analysis tools and text analysis methods. Solving the Fokker-Planck equation yields an analytical time dependence of the probability density of finding the network state at a given value x. Based on this approach, an algorithm can be created that predicts the time needed to reach a given state of the user comment network graph (an admissible distance between the base and observed vectors) at a given probability level. Analysis of the model showed that it is adequate and consistent.
Keywords
- Social networks
- Network graph
- Social network graph characteristics
- Nonlinear dynamics
- Fokker-Planck equation
- Social process modeling
- Monitoring
- Management
1 Relevance of the Study
Changes in the opinions and moods of social network users are a good indicator of the real processes taking place in society. Such changes lead to shifts in trends, and in social processes these shifts are latent. Moreover, they begin to appear in social networks much earlier than they become noticeable in non-network structures. To detect such phenomena, one can analyze, in real time, the large volumes of textual information that users of social networks generate daily. This is a complex scientific and technical problem.
Social processes can be investigated using macrostatistics and sociological surveys. However, this approach is very laborious (preparing special questionnaires with a large number of questions, forming a representative sample, conducting the survey itself, processing the data, etc.), which significantly reduces the reliability of the data obtained.
One of the possibilities for solving the problems of monitoring and predicting the development of public sentiment can be the study of the activity of users who leave comments on blogs and news resources on various socially significant topics.
Using text analytics tools that cluster texts on selected topics, together with tools for collecting open data from social networks and online news resources, one can determine the mood of users and build graphs of their connections within selected thematic groups. Each such graph has its own set of characteristics (network density, average mediation coefficient, average clustering coefficient, elasticity coefficient, and others), which can change from day to day, thereby forming a multidimensional time series.
2 Setting a Study Objective
In this article, we propose a vector representation for describing the network of comments. The elements of the vectors are network parameters (density, the average value of the mediation coefficient, the average value of the clustering coefficient, elasticity, and others), as well as characteristics such as the shares of users who, based on text analysis, can be attributed to one of four groups:
- loyalist (unconditionally supports the actions of the government and authorities);
- oppositionist;
- troll (a user who uses a resource to cause a scandal, anger others, and enjoy it);
- undecided or neutral user.
Desired or undesired states of the social network as a whole can be specified by means of base vectors (discussed later in the article).
The time variation of the distance between the base vector and the current state vector can be considered as the motion of a “wandering point” on the interval [Lmin, Lmax], or as a random (or almost random) time series. A given value of this distance (the state in which management decisions should be made) can be considered as a trap, or an acceptable threshold, into which the “wandering point” may eventually fall. This makes it possible to build probabilistic sociodynamic models for predicting the dynamics of public sentiment.
The behavior of a “wandering point” is traditionally described, as a rule, by a diffusion model. In this case, however, such a model cannot be considered reliable. Time series describing processes in complex systems (for example, financial indicators of stock and commodity exchanges) are generally not stationary, for various reasons, including the human factor. Their sample distribution functions have a time-dependent mathematical expectation, which contradicts the simple diffusion model and indicates that the time series is non-stationary.
In this regard, we propose to consider more complex models of the behavior of the “wandering point”, for example, models based on the Fokker-Planck equation.
3 Data Collection and Processing
As an example for analyzing the structure of comment graphs, one news item on the Echo-Moscow portal was chosen, and all user comments on it, together with their available data, were collected. The 633 comments were then processed and, based on text analysis, labeled as belonging to one of four types: loyalist; oppositionist; troll; undefined.
The “undefined” group was singled out because, when a comment contained very little text, it was practically impossible to say anything about the user’s affiliation with one of the other three groups. Some comments (77) had been deleted by the site moderators for rule violations by the time of data collection, but because each child comment retained a reference to its parent comment, it was possible to restore information about their existence (but not their texts). Therefore, the total number of nodes in the graph is 710. The distribution of nodes by state is as follows: loyalists 10.28% (73 nodes); oppositionists 43.80% (311 nodes); trolls 30.42% (216 nodes); undefined 4.65% (33 nodes); deleted 10.85% (77 nodes).
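The percentages above follow directly from the node counts. A minimal Python check (the dictionary keys and variable names are illustrative and not taken from the study's software):

```python
# Node counts by state, as reported in the text.
counts = {
    "loyalist": 73,
    "oppositionist": 311,
    "troll": 216,
    "undefined": 33,
    "deleted": 77,
}

# 633 collected comments plus 77 restored (deleted) ones.
total = sum(counts.values())

# Share of each state in percent, rounded to two decimals.
shares = {state: round(100 * n / total, 2) for state, n in counts.items()}

for state, pct in shares.items():
    print(f"{state:14s} {counts[state]:4d} nodes  {pct:5.2f}%")
```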
Figure 1 shows the structure of the graph obtained by processing the comments on the news item. In a color visualization of the data, the nodes of the graph can be marked with different colors depending on their state (assignment to one of the four types).
The links in Fig. 1 show users commenting on one another. Thus, the “color” of a node indicates its state, and the edges of the graph indicate interaction.
Figure 1 shows that this structure contains many isolated vertices. However, one can also notice a connected component of the graph, which is shown separately in Fig. 2. The closed oval lines (loops) in Fig. 2 show users commenting on themselves.
Let’s consider the elements of the network state vector that we will use in our model:
- The proportion of nodes that have a certain state (for example, those who are negative towards some event in public life).
- The clustering coefficient is a measure of the density of the connections of a given vertex with its neighbors. The node clustering coefficient is the ratio of the actual number of links connecting the nearest neighbors of a given node i to the maximum possible number (the number of links there would be if all the nearest neighbors of the node were connected directly to each other); its value lies on the segment [0, 1]. The larger its value, the more significant this node is in the exchange of information.
- The coefficient of mediation shows the ratio of the number of shortest paths between all pairs of network nodes that pass through a given node to the total number of shortest paths in the network; its value lies on the segment [0, 1]. The larger its value, the more significant the role of this vertex in the exchange of information.
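The three elements listed above can be computed from an adjacency structure. The sketch below is illustrative only: it treats the comment graph as undirected and unweighted, and it uses the standard normalized betweenness centrality (Brandes' algorithm) as a stand-in for the coefficient of mediation, which is an assumption and may differ in detail from the definition used in the study. The function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering(adj, v):
    """Local clustering coefficient of node v (self-loops are ignored)."""
    nbrs = adj[v] - {v}
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm, unweighted)."""
    nodes = list(adj)
    n = len(nodes)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # accumulate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair is counted twice, so dividing by the number of
    # ordered pairs (n - 1)(n - 2) maps the score to [0, 1].
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: score[v] * scale for v in nodes}


def state_vector(adj, states, target):
    """(share of nodes in `target` state, mean clustering, mean betweenness)."""
    n = len(adj)
    share = sum(1 for v in adj if states[v] == target) / n
    mean_clustering = sum(clustering(adj, v) for v in adj) / n
    mean_betweenness = sum(betweenness(adj).values()) / n
    return share, mean_clustering, mean_betweenness


# Toy graph: a triangle a-b-c with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
states = {"a": "troll", "b": "oppositionist", "c": "oppositionist", "d": "loyalist"}
vec = state_vector(adj, states, "oppositionist")
print(vec)
```

For graphs of realistic size, a graph library would normally be used for these quantities; the pure-Python version is given only to make the definitions concrete.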
Let us determine the values of the elements of the base network state vector (denoted θ). They set thresholds whose crossing is undesirable from the point of view of state management. Given that in any community a share of 0.10 to 0.15 of the participants always disagrees on any issue, we take the share of those who are negative towards the event in question to be 0.12.
The desired average value (over all nodes) of the clustering coefficient of such a network is also assumed to be small, for example 0.05, and the average mediation coefficient of the nodes is likewise taken equal to 0.05. Thus, the base vector is \(\theta = \left( {0.12;0.05;0.05} \right)\).
Note that the number of parameters that can be used to describe the state of the network can be greater; we have chosen only those that are, in our opinion, the most significant. In addition, the selected parameters are normalized (they lie on the segment [0, 1]); therefore, they affect the calculation of the distance metric equally.
In the proposed approach, the comment threads on different news items published on a selected resource on certain topics during a day can be combined into a single structure through the nodes that belong to the same users. This yields one large graph that describes the activity of the users of this online resource during the day. The elements of the current state vector describing its characteristics can then be determined.
Changes in the components of this vector for each day for a certain time will form a multidimensional time series.
4 Theoretical Part
The resulting multidimensional time series can be used to describe the dynamics of change in the public sentiment of Internet users.
A review of the literature carried out in the course of the study suggests that social network analysis methods are a useful tool for building a complete picture of public sentiment at a time when events of a certain nature occur in a country or in the world. Below we consider a number of works close to the topic of our study, on the description of processes in complex social network structures.
Since finding similarities between nodes in a network is a time-consuming process as the network grows, researchers have used swarm algorithms to optimize the process of solving link prediction and community discovery problems [1]. Swarm-based optimization techniques used in social network analysis are compared in this article with community and link analysis based on traditionally used approaches.
The works [2,3,4] propose the KroMFac technique, which performs community detection using regularized non-negative matrix factorization (NMF) based on the Kronecker graph model. KroMFac combines network analysis and community discovery techniques in a single, unified framework. This technique links four areas of research, namely the detection of communities on graphs, the detection of overlapping communities, the detection of communities in incomplete networks with missing edges, and the detection of communities in complete networks.
The work [5] proposes a new weighted summary measure for detecting influential users in social networks. This method combines the influence of several structural features of the network, as well as local and global information, to obtain an estimate of weighted total centrality.
The authors of [6] propose a new index for analyzing the distribution of messages in social networks, based on the topological nature of networks and the strength of messages’ influence. This indicator characterizes the strength of each node as a means of launching a message, dividing nodes into starters and non-starters.
Works [7,8,9] on the analysis of random networks present physically justified models and effective algorithms for determining the hierarchical ranks of nodes in directed networks.
The dynamics of changes in the public mood of Internet users can be attributed to stochastic processes. The presence of the human factor (many people with different opinions, preferences and behaviors) on the one hand creates a randomness of changes (due to the wide variety of user behavioral models), and on the other hand introduces elements of purposefulness into the dynamics of changes. A detailed description of the use of stochastic methods for modeling the dynamics of social processes can be found in [10].
The most promising in our opinion for creating models of the dynamics of change in public mood are models that can be created from the Fokker-Planck equation, which takes into account both ordered and random changes.
The Fokker-Planck equation is widely used to analyze and model the behavior of time series when describing processes in complex systems [11,12,13,14].
It should be noted that in addition to the Fokker-Planck equation, other approaches are used for modeling based on differential equations, for example, the Liouville equations [14, 15], the diffusion equations [13, 16] and several others.
Social processes are simulated not only with models based on partial differential equations but also, for example, with models based on game-theoretic approaches and the methods for making managerial decisions built on them [17].
The Fokker-Planck equation is widely used to analyze and model transients observed in various complex systems and provides good agreement between predicted behavior and observed data. Therefore, as a hypothesis, we assume that the Fokker-Planck equation can be used to analyze and model the appearance of comments on news and blogs. The Fokker-Planck equation has the form:
\[\frac{\partial \rho \left( {x,t} \right)}{\partial t} = - \frac{\partial }{\partial x}\left[ {\mu \left( x \right)\rho \left( {x,t} \right)} \right] + \frac{1}{2}\frac{\partial^2 }{\partial x^2 }\left[ {D\left( x \right)\rho \left( {x,t} \right)} \right], \quad (1)\]
where \(\rho \left( {x,t} \right)\) is the probability density, at time t, of the distribution over states x (in our case, the state x is the number of comments observed at time \(t\)), \(D\left( x \right)\) is a state-dependent coefficient that determines random changes of the state x, and \(\mu \left( x \right)\) is a state-dependent coefficient that determines directed changes of the state \(x\).
In our model, \(D\left( x \right)\) can be interpreted as describing user actions caused by a spontaneous impulse that arises when reading the news or other users’ comments on it: the event described in the news item or blog is not particularly important to the user, but the user is ready to spend time commenting or responding to another commentator. In turn, \(\mu \left( x \right)\) can be interpreted as describing purposeful actions caused by the desire to respond to a news item or blog post that is significant for the user, or to comment on another user’s comment that touches on a topic important to this user (the user has a lasting interest in this topic).
To build the model, we need to make assumptions about the dependence of \(D\left( x \right)\) and \(\mu \left( x \right)\) on the state x, subject to two conditions. First, we take into account the dimensions of the terms in Eq. (1). Second, we assume that as the state x grows (that is, as the number of possible comments, and hence the significance of the news item or blog, increases), \(D\left( x \right)\) and \(\mu \left( x \right)\) should also grow.
All terms of Eq. (1) must have the same dimension (that of \(\rho\) per unit time). Both conditions are met if the dependencies of \(D\left( x \right)\) and \(\mu \left( x \right)\) on the state x have the form \(\mu \left( x \right) = \mu_0 \cdot x\) and \(D\left( x \right) = D_0 \cdot x^2\). This form ensures, on the one hand, that \(D\left( x \right)\) and \(\mu \left( x \right)\) grow with the state \(x\) and, on the other hand, that the dimensions are consistent.
Under the assumptions made, the solution of the stationary Fokker-Planck equation is a power law in x. This is the power-law distribution of commentators by the number of comments that is observed in practice, which suggests that the Fokker-Planck equation can be used to describe social processes.
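A short derivation sketch of such a power law is given below. It assumes the usual one-dimensional form of the Fokker-Planck equation with a factor 1/2 in front of the diffusion term, the dependencies \(\mu \left( x \right) = \mu_0 \cdot x\) and \(D\left( x \right) = D_0 \cdot x^2\), and a zero probability flux in the stationary state; these are assumptions of the sketch, and \(C\) is a normalization constant introduced here.

```latex
\begin{aligned}
0 &= -\frac{d}{dx}\left[\mu_0 x\,\rho(x)\right]
     + \frac{1}{2}\frac{d^2}{dx^2}\left[D_0 x^2 \rho(x)\right], \\
\mu_0 x\,\rho(x) &= \frac{1}{2}\frac{d}{dx}\left[D_0 x^2 \rho(x)\right]
     \qquad \text{(zero flux)}, \\
\frac{d}{dx}\ln\left[x^2 \rho(x)\right] &= \frac{2\mu_0}{D_0}\cdot\frac{1}{x}, \\
\rho(x) &= C\,x^{\,2\mu_0/D_0 - 2}.
\end{aligned}
```

In terms of \(\alpha = \frac{1}{2} - \frac{\mu_0 }{D_0 }\), the exponent is \(-\left( {1 + 2\alpha } \right)\), so the density decays as a power of x whenever \(\mu_0 < D_0\).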
To describe how the distance between the current state vector and the given base vector changes over time, consider the solution of the non-stationary Fokker-Planck equation, which may allow probabilistic sociodynamic models to be constructed for predicting the dynamics of public sentiment.
Let us formulate a boundary value problem whose solution describes how the distance between the current state vector of the comment network graph and the given base vector changes over time.
The first boundary condition:
When selecting the first boundary condition, we proceed from the following considerations: \(x = L_{min}\) (the left border of the segment of possible states) determines the state whose crossing must be avoided (the region to the left of this state is undesirable for us). The probability of detecting such a state of the system may be non-zero. However, the probability density that determines the flow in the state \(x = L_{min}\) must be taken equal to zero, since the states should not go beyond this border (here the reflection condition is implemented). Thus:
\[\rho \left( {x,t} \right)\big|_{x = L_{min} } = 0. \quad (4)\]
The second boundary condition:
We limit the region of possible states on the right to some value \(x = L_{max}\) (the metric used in the calculations cannot be greater than the magnitude of the vector whose elements have their maximum values in the space of the selected coordinates). The probability of detecting such a state over time will be non-zero. However, the probability density that determines the flow in the state \(x = L_{max}\) must be set to zero, since the distance between the current and base state vectors is limited by the maximum possible values of the coordinates in the vector space used (the condition of reflection from the boundary is realized):
\[\rho \left( {x,t} \right)\big|_{x = L_{max} } = 0. \quad (5)\]
To formulate the boundary value problem, it is also necessary to specify the initial condition. Since at time \(t = 0\) the system state (the distance between the base vector and the current state vector) can be equal to some value \(x_0\), the initial condition can be set as:
\[\rho \left( {x,0} \right) = \delta \left( {x - x_0 } \right). \quad (6)\]
Because of the delta function in the initial condition, the solution of Eq. (1) under the given boundary conditions and the assumptions made about \(D\left( x \right)\) and \(\mu \left( x \right)\), that is, the time-dependent probability density of finding the system state at a given value x, is written separately on two intervals:
For \({\text{L}}_{min} \le x \le x_0\):
For \(x_0 \le x \le L_{max}\):
where \(\alpha = \frac{1}{2} - \frac{\mu_0 }{{D_0 }}\), \(\varphi \left( {x,t} \right) = 2\frac{{x_0^\alpha \cdot x^{ - \left[ {1 + \alpha } \right]} \cdot e^{ - \frac{D_0 \alpha^2 }{2}t} }}{{\ln \left( {\frac{{L_{max} }}{{L_{min} }}} \right)}}\), and \(\omega \left( {n, t} \right) = \frac{{\pi^2 n^2 D_0 t}}{{2\left[ {\ln \left( {\frac{{L_{max} }}{{L_{min} }}} \right)} \right]^2 }}\).
The probability that by time t the state of the system will remain within the segment from \(L_{min}\) to \(L_{max}\), that is, that the threshold state \((\theta )\) will not be reached, can be calculated as follows:
\[P(\theta ,t) = \int_{L_{min} }^{L_{max} } \rho \left( {x,t} \right)dx. \quad (9)\]
The probability \(Q(\theta ,t)\) that the threshold state \(\theta\) will be reached or exceeded by time \(t\) is calculated by the formula:
\[Q(\theta ,t) = 1 - P(\theta ,t). \quad (10)\]
The choice of the boundaries \(L_{min}\) and \(L_{max}\) of the segment of possible states is discussed in the analysis of the resulting model.
5 Prediction Algorithm for Reaching a Given State of the Comment Network Graph of Mass Media News Users
Predicting the dynamics of the moods of Internet media users based on the Fokker-Planck equation and changing the parameters of their commentary networks can be carried out according to the following algorithm:
- Collect the text comments and user metadata on a specific topic from online news media resources, with date and time stamps.
- Process the data using text analytics and sentiment analysis, build the graph of user comments on the topic, and calculate its characteristics (network density, average mediation coefficient, average clustering coefficient, elasticity, share of users with a particular mood).
- Set the values of the elements of the base vector, which defines the desired or undesired state \((\theta )\), and, from the processed data and the given vector, form the time series of changes in the graph of user comments on the topic.
- Set the duration of the step \(\tau\) (hour, day, week, etc.) and, from the values of the time series over several steps of length \(\tau\), determine the model parameters \(\mu_0\) and \(D_0\) numerically, using the observed data and Eqs. (9) and (10).
- Take the last average value of the distance metric between the base vector and the current state vector of the network as the initial state \(x_0\) and, using the obtained values of \(\mu_0\) and \(D_0\) together with Eqs. (9) and (10), calculate the time dependence of the probability of reaching the desired or undesired state. Then set a probability value (for example, 0.95) and estimate the time needed to reach this probability level (that is, make a forecast in time).
Analysis of the Obtained Model
For the graph shown in Fig. 2, one can determine its characteristics and the elements of the current state vector at a given time t (taken as \(t = 0\)): \(X\left( t \right) = \left( {0.44;0.11;0.15} \right)\). The distance between the base vector of the desired state \(\theta = \left( {0.12; \, 0.05; \, 0.05} \right)\) and the current state vector \(X\left( t \right)\) at time \(t = 0\) is \(x_0 = 0.34\). By analyzing the dynamics of the time series of changes in the state of the network over the previous few days and using the equations of the model, one can solve the inverse problem and determine the values of the model parameters \(\mu_0\) and \(D_0\). In our case, \(\mu_0 = 0.0003\) and \(D_0 = 0.007\).
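The inverse problem itself is not detailed in the text. As a rough illustration only, the sketch below swaps in a simpler technique, a moment estimator, and is not the authors' procedure. It assumes that the Fokker-Planck equation with \(\mu \left( x \right) = \mu_0 \cdot x\) and \(D\left( x \right) = D_0 \cdot x^2\) is read in the Itô sense with a factor 1/2 on the diffusion term, so that the increments of \(\ln x\) over a step \(\tau\) have mean \(\left( {\mu_0 - D_0 /2} \right)\tau\) and variance \(D_0 \tau\). The function name and the sample series are hypothetical.

```python
import math
import statistics


def estimate_params(series, tau=1.0):
    """Moment estimates of (mu0, d0) from a positive series sampled every tau."""
    log_steps = [math.log(b / a) for a, b in zip(series, series[1:])]
    mean = statistics.fmean(log_steps)
    var = statistics.variance(log_steps)  # sample variance of log-increments
    d0 = var / tau                        # variance of a log-step is d0 * tau
    mu0 = mean / tau + 0.5 * d0           # mean of a log-step is (mu0 - d0/2) * tau
    return mu0, d0


# Hypothetical daily distances between the base and current state vectors.
distances = [0.30, 0.32, 0.31, 0.33, 0.35, 0.34]
mu0, d0 = estimate_params(distances, tau=1.0)
print(f"mu0 = {mu0:.4f}, d0 = {d0:.4f}")
```

With only a few observations such estimates are very noisy, which is one reason a fit to the model's own probability curves may be preferable in practice.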
The right boundary \(L_{max}\) of the segment of possible states can be specified as the distance between the base vector \((\theta )\) and the vector of maximum possible values of the network parameters, \(X = \left( {1; 1; 1} \right)\); in the case under consideration, \(L_{max} = 1.61\). The left boundary can be defined with a safety margin, for example, as half the length of the base vector (in this case \(|\theta | = 0.14\)); thus \(L_{min}\) is equal to 0.07.
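The values of \(x_0\), \(L_{max}\) and \(L_{min}\) quoted above are reproduced by the ordinary Euclidean distance, which is therefore assumed here to be the metric in use. A minimal check (variable names are illustrative):

```python
import math

theta = (0.12, 0.05, 0.05)  # base vector from the text
x_now = (0.44, 0.11, 0.15)  # current state vector X(t) at t = 0
x_top = (1.0, 1.0, 1.0)     # maximum possible values of the parameters

x0 = math.dist(theta, x_now)      # distance between base and current vectors
l_max = math.dist(theta, x_top)   # right boundary of the segment
l_min = 0.5 * math.hypot(*theta)  # left boundary: half the length of theta

print(round(x0, 2), round(l_max, 2), round(l_min, 2))  # prints: 0.34 1.61 0.07
```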
The results show that, if the network is not influenced, then under the current conditions the required state can be reached with a probability of 0.8 within 375 days and with a probability of 0.9 within 525 days. The result obtained is plausible, but assessing its accuracy remains an open question that requires additional research.
Figure 3 shows the simulation results: the probability of reaching the given threshold network state as a function of time.
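A curve of this kind can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the standard sine-series (eigenfunction) solution for zero probability density at both \(L_{min}\) and \(L_{max}\), written in the variable \(\ln x\). This series has the same prefactor \(\varphi\) and the same exponents \(\omega \left( {n,t} \right)\) as defined earlier, but its exact form is an assumption, and the function names are hypothetical. Reaching either boundary counts as leaving the segment. Since the published parameters are rounded, the output need not coincide with the reported 375 and 525 days.

```python
import math


def survival(t, x0, l_min, l_max, mu0, d0, n_terms=400):
    """Probability that neither boundary has been reached by time t (t > 0)."""
    alpha = 0.5 - mu0 / d0
    span = math.log(l_max / l_min)
    z0 = math.log(x0 / l_min)
    total = 0.0
    for n in range(1, n_terms + 1):
        k = n * math.pi / span
        omega = 0.5 * d0 * k * k * t  # omega(n, t) as defined in the text
        # Closed form of the integral of exp(-alpha*z)*sin(k*z) over [0, span].
        integral = k * (1.0 - (-1) ** n * math.exp(-alpha * span)) / (alpha**2 + k**2)
        total += math.exp(-omega) * math.sin(k * z0) * integral
    prefactor = 2.0 / span * math.exp(alpha * z0 - 0.5 * d0 * alpha**2 * t)
    return prefactor * total


def time_to_reach(q_level, x0, l_min, l_max, mu0, d0, t_lo=1.0, t_hi=1.0e5):
    """Bisection for the time at which 1 - survival(t) reaches q_level."""
    for _ in range(60):
        t_mid = 0.5 * (t_lo + t_hi)
        if 1.0 - survival(t_mid, x0, l_min, l_max, mu0, d0) < q_level:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return t_hi


# Rounded parameter values quoted in the text.
params = dict(x0=0.34, l_min=0.07, l_max=1.61, mu0=0.0003, d0=0.007)
for level in (0.8, 0.9):
    print(level, round(time_to_reach(level, **params)), "days")
```

The series converges quickly for t of the order of one step or more; for very small t more terms are needed, which is why the bisection starts from `t_lo = 1.0`.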
6 Conclusion
In conclusion, we note that the complex dynamics of processes in complex social systems can be described not only by models based on the Fokker-Planck equation. For example, in [18,19,20], models are presented for describing the stochastic dynamics of changes in the state of complex social systems, taking into account the processes of self-organization and the presence of memory.
These models take memory into account and describe not only Markov but also non-Markov processes. In these studies, a non-linear second-order differential equation was derived that makes it possible to set and solve boundary value problems for the probability density function of the amplitude of deviations of the parameters describing the observed non-stationary time series. This function depends on the length of the time interval over which it is determined and on the depth of memory, which significantly distinguishes the equation from the Fokker-Planck equation.
References
1. Pulipati, S., Somula, R., Parvathala, B.R.: Nature inspired link prediction and community detection algorithms for social networks: a survey. Int. J. Syst. Assur. Eng. Manag. 1–18 (2021). https://doi.org/10.1007/s13198-021-01125-8
2. Tran, C., Shin, W.-Y., Spitz, A.: Community detection in partially observable social networks. ACM Trans. Knowl. Discov. Data 16(2), 1–24 (2022). https://doi.org/10.1145/3461339
3. Chen, Z., Li, L., Bruna, J.: Supervised community detection with line graph neural networks. In: Proceedings of the 7th International Conference on Learning Representations, New Orleans, United States (2019)
4. Hoffmann, T., Peel, L., Lambiotte, R., Jones, N.S.: Community detection in networks without observing edges. Sci. Adv. 6(4), 1–11 (2020)
5. Jain, S., Sinha, A.: Discovering influential users in social network using weighted cumulative centrality. Concurrency and Computation: Practice and Experience 34(1), e6521 (2022). https://doi.org/10.1002/cpe.6521
6. Martins, P., Martins, F.A.: Launcher nodes for detecting efficient influencers in social networks. Online Social Networks and Media 25, 100157 (2021). https://doi.org/10.1016/j.osnem.2021.100157
7. Xue, L., Zhang, P., Zeng, A.: Maximizing spreading in complex networks with risk in node activation. Inf. Sci. 586, 1–23 (2022). https://doi.org/10.1016/j.ins.2021.11.064
8. Arafeh, M., Ceravolo, P., Mourad, A., Damiani, E., Bellini, E.: Ontology based recommender system using social network data. Futur. Gener. Comput. Syst. 115, 769–779 (2021). https://doi.org/10.1016/j.future.2020.09.030
9. De Bacco, C., Larremore, D.B., Moore, C.: A physical model for efficient ranking in networks. Sci. Adv. 4(7), eaar8260 (2018). https://doi.org/10.1126/sciadv.aar8260
10. Gardiner, C.: Stochastic Methods: A Handbook for the Natural and Social Sciences. Springer-Verlag (2009)
11. Lux, T.: Inference for systems of stochastic differential equations from discretely sampled data: a numerical maximum likelihood approach. Ann. Finance 9(2), 217–248 (2012). https://doi.org/10.1007/s10436-012-0219-9
12. Hurn, A., Jeisman, J., Lindsay, K.: Teaching an old dog new tricks: improved estimation of the parameters of stochastic differential equations by numerical solution of the Fokker-Planck equation. In: Gregoriou, G., Pascalau, R. (eds.) Financial Econometrics Handbook. Palgrave, London (2010)
13. Elliott, R.J., Siu, T.K., Chan, L.: A PDE approach for risk measures for derivatives with regime switching. Ann. Finance 4(1), 55–74 (2007). https://doi.org/10.1007/s10436-006-0068-5
14. Orlov, Y., Fedorov, S.L.: Generation of non-stationary trajectories of a time series based on Fokker-Planck equation. MFTI Proceedings 8(2), 126–133 (2016)
15. Chen, Y., Cosimano, T.F., Himonas, A.A., Kelly, P.: An analytic approach for stochastic differential utility for endowment and production economies. Comput. Econ. 44(4), 397–443 (2013). https://doi.org/10.1007/s10614-013-9397-4
16. Savku, E., Weber, G.-W.: Stochastic differential games for optimal investment problems in a Markov regime-switching jump-diffusion market. Ann. Oper. Res. 312, 1171–1196 (2020). https://doi.org/10.1007/s10479-020-03768-5
17. Krasnikov, K.E.: Mathematical modeling of some social processes using game-theoretic approaches and making managerial decisions based on them. Russian Technol. J. 9(5), 67–83 (2021). https://doi.org/10.32362/2500-316X-2021-9-5-67-83
18. Zhukov, D., Khvatova, T., Millar, C., Zaltcman, A.: Modelling the stochastic dynamics of transitions between states in social systems incorporating self-organization and memory. Technol. Forecast. Soc. Chang. 158, 120134 (2020). https://doi.org/10.1016/j.techfore.2020.120134
19. Zhukov, D.O., Zaltcman, A.D., Khvatova, T.Y.: Forecasting changes in states in social networks and sentiment security using the principles of percolation theory and stochastic dynamics. In: Proceedings of the 2019 IEEE International Conference “Quality Management, Transport and Information Security, Information Technologies”, IT and QM and IS 2019, 8928295, pp. 149–153 (2019)
20. Zhukov, D.O., Lesko, S.A.: Stochastic self-organization of poorly structured data and memory realization in an information domain when designing news events forecasting models. In: The 2nd IEEE International Conference on Big Data Intelligence and Computing, Auckland, New Zealand (2016). https://doi.org/10.1109/DASC-PICom-DataCom-CyberSciTec.2016.153
Acknowledgment
The study was carried out with the support of the Russian Science Foundation (RSF), grant № 22-21-00109 “Development of models for predicting the dynamics of social sentiment based on the analysis of time series of text content of social networks using Fokker-Planck equations and nonlinear diffusion”.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Perova, J.P., Zhukov, D.O., Kalinin, V.N. (2023). Modeling the Dynamics of User’s Mood Based on the Fokker-Planck Equation and Changes in the Parameters of Network Graphs of Their Comments. In: Radionov, A.A., Gasiyarov, V.R. (eds) Advances in Automation IV. RusAutoCon 2022. Lecture Notes in Electrical Engineering, vol 986. Springer, Cham. https://doi.org/10.1007/978-3-031-22311-2_4
DOI: https://doi.org/10.1007/978-3-031-22311-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22310-5
Online ISBN: 978-3-031-22311-2
eBook Packages: Engineering, Engineering (R0)