Introduction

Social media platforms have become indispensable tools for people to communicate with one another in the modern era. Individuals write about almost every aspect of their lives, including their interests, experiences, accomplishments, and failures. These activities enable the analysis of public sentiment toward a variety of subjects, from products to politicians. Sentiment analysis [1] is a rapidly growing field of analysis that elicits people’s emotional responses to various topics [2]. Additionally, the sentiment analysis research community has recently shifted its focus from online reviews toward social media [3].

The primary focus of social media sentiment analysis research has been on the document’s textual content [4], implicitly assuming that the posts are independent. Social media posts, however, are not actually independent. Additionally, performing sentiment analysis solely on textual data presents numerous difficulties in practice. For example, some social media platforms, such as Twitter, limit the length of posts (280 characters); as a result, fewer textual features are available for analysis [5,6,7]. Other issues arise from the way that people use social media, such as their use of informal language [8].

While social media has its own set of constraints, it also provides additional information, such as various types of links between posts. According to the theory of sentiment consistency [9], a particular person’s tweets are more likely to share similar sentiments than random tweets [5]. This relationship can be used in conjunction with contextual analysis. Furthermore, other theories such as homophily or assortative mixing [10, 11], and emotional contagion [12] demonstrate that related individuals share similar viewpoints [5]. According to the first theory, similarity fosters the formation of new relationships. As a result, individuals within the same networks exhibit a high degree of homogeneity. Moreover, research indicates that happy/sad people tend to interact with other happy/sad people in online social networks [10]. According to the second theory, an individual’s emotional state affects those around him or her. Additionally, experimental evidence indicates that social networks are capable of transmitting emotion [13].

Several other clues also reveal connections between posts. For instance, hashtags are used to specify the subject of a post. Comments about specific products demonstrate varying degrees of sentiment consistency. For example, comments about the MacBook are rated higher than those about other laptops. As a result, tying together posts that share a common hashtag may aid in sentiment analysis. Furthermore, a relationship between posts can be established using text similarity based on various text representations, including TF-IDF, Word2Vec [14], GloVe [15], and Universal Sentence Encoder (USE) [16].

Deep learning methods have recently achieved outstanding results in various natural language processing (NLP) tasks, including sentiment analysis [7]. Moreover, the machine learning community has developed deep learning-based methods for graph analysis, including graph-specific embeddings [17], node classification [18, 19], and link prediction [20, 21]. The central tenet of this field of research is that models should use the information contained in the relationships between entities. In other words, these models generate representations that can encode both local graph structures and node properties [19]. These representations can then be used within the network or for subsequent tasks such as node classification.

As an illustration, the following is a tweet that was correctly labeled by the ChebNet (User) but incorrectly labeled by the feed-forward text-based model (the models are described in the materials and methods section in more detail):

“#HCR Our greatest threat to society is when we stop caring about one another! It's a damm shame #TCOT #TEAPARTY Cant understand that”

While this is a positive tweet about healthcare reform, the text-based method underestimates its sentiment due to several negative features such as threat, stop, and shame. The ChebNet model, on the other hand, correctly predicts the sentiment because the tweep (the user who posted the tweet) has other positive tweets about HCR, including the following tweets:

“Call #Congress today. Tell them YES on #hcr! (202) 224-3121 #PUBLICOPTION”

“RT @TerresaS: Henry Aaron, David Cutler, Alice Rivlin, et al.: #hcr bill is crucial to reducing the deficit (PDF): http://bit.ly/cAf35l”

Based on the findings above, this paper models social media sentiment analysis as a node classification problem and solves it using graph neural networks (GNNs). This paper makes the following contributions:

  • We use a graph to model social media posts and their various types of relationships, such as user, friendship, common friends, common hashtag, text-based similarity, and sentimental similarity.

  • We utilize deep learning models, including feed-forward neural network and graph convolutional network (GCN) in the context-aware social media sentiment analysis problem. To our knowledge, this is the first time that the GCN has been used for multi-thread context-aware sentiment analysis on social media.

  • To enable the use of existing knowledge across multiple graphs, we provide a stacking model that is more accurate than any of the base models.

  • We conduct extensive experimental studies on social media sentiment analysis using a variety of different edge sets.

The remainder of the paper is organized as follows. The following section, titled “Related Research”, discusses notable papers on deep learning and sentiment analysis in social media. The “Materials and Methods” section discusses the architecture of the models used. The “Results” section contains experimental results from a real-world sentiment analysis dataset, and the “Conclusion” section summarizes the study’s findings.

Related Research

This section discusses the papers that are related to the current study. The section is divided into two subsections. We begin by reviewing the research on social media sentiment analysis and then examine deep learning methods in this field.

Social Media Sentiment Analysis

Social media is a valuable source of data for various data mining applications, including sentiment analysis. The majority of research on social media sentiment analysis has relied entirely on the text of the posts [4, 22]. Traditional machine learning models such as support vector machines, logistic regression, decision tree [23, 24], and their stacking [25, 26] were trained on various textual features such as n-grams and parts of speech. Other researchers [27] consulted sentimental dictionaries and knowledge bases, such as General Inquirer [28], SentiWordNet [29], WordNet-Affect [30], SenticNet [31], SentiStrength [32], and Bing Liu dictionary [33], which contain information about the polarity orientation of words and phrases. They primarily used a combining function to infer the sentiment of a post by analyzing the polarity of phrases. Furthermore, negations and intensifications were frequently taken into account [34]. A separate line of research combined lexicon-based and machine learning approaches [26, 32, 35]. There are reports of leveraging the interdependence of social media posts to improve sentiment prediction accuracy. For example, several papers included a regularization term in the supervised loss function [5, 6].

Deep Learning Methods in Social Media Sentiment Analysis

Deep learning has improved the accuracy of a variety of NLP tasks [22], including sentiment analysis [36]. Researchers have used popular deep learning models for this purpose, including feed-forward neural networks [37, 38], recurrent neural networks (RNN) [22, 39], convolutional neural networks (CNN) [40], and mixtures, ensembles, or stacks of them [41,42,43]. The attention mechanism [44] has also been used in conjunction with long short-term memory (LSTM) [8, 39, 45, 46] to improve token combination and to determine the relevance of sentiment words in relation to various aspects of the sentence [47]. Word and document embeddings have been a significant achievement of NLP. However, pre-trained embeddings were not originally tailored for sentiment analysis, and some semantically related but sentimentally distinct words were projected to nearby points in the embedding space. This motivated researchers to propose specific embeddings that include the sentimental clues of words [48,49,50,51].

Apart from classical deep learning techniques, various other techniques have been used to improve sentiment prediction accuracy. For example, social media posts may contain video, image, or audio; consequently, multimodal sentiment analysis [52] has been another topic in social media sentiment analysis, combining audio and visual cues into deep structures [53, 54]. Due to the complexity of natural language, recent efforts have been made to combine symbolic and subsymbolic artificial intelligence [31] and to leverage the power of neural tensor networks for modeling relational data in conversational sentiment analysis [55]. Moreover, because deep models contain a large number of parameters and require a large amount of data, some researchers attempted to pretrain models using distantly supervised tweets [56, 57]. Multi-task learning has also been applied to sentiment analysis and related tasks such as sarcasm and personality detection [58,59,60]. Additionally, similar to traditional methods, several efforts have been made to leverage sentiment lexicons in conjunction with deep learning models [31, 40, 61].

As a result of GNNs’ recent success in several fields, sentiment analysis research has also tapped into their potential. Examples include the use of GCNs to model the relationships between sentence aspects [62], the use of graph network embedding enabled by a variational auto-encoder [63], the use of syntax and knowledge graphs to augment the sentence representation for a given aspect [64], the construction of a word-document graph [65], and the multi-level text representation via message-passing [66]. Finally, some deep learning researchers have improved sentiment analysis by incorporating a variety of contexts. Some researchers modeled the problem as a sequence classification problem. They classified sentiment using hierarchical LSTM architectures in several contexts, including reply/retweet, hashtag, and user [8, 22]. Recently, Zhang et al. [67] used Bidirectional Encoder Representation from Transformers (BERT) with Bidirectional LSTM (BiLSTM) and Conditional Random Field (CRF) layers on top of BERT to model the sequence of tweets.

Materials and Methods

Lasso

The least absolute shrinkage and selection operator (Lasso) is a type of regression analysis that employs L1 regularization. Its loss is formally calculated as follows:

$$L =\frac{1}{n}\,supervised\_loss+\beta \, regularization,$$
(1)

where β denotes the weight of regularization, and the \(supervised\_loss\) is defined as follows:

$$supervised\_loss =\frac{1}{2} {\Vert X\theta -Y\Vert }_{F}^{2},$$
(2)

where \(X, \theta\) and \(Y\) represent feature, weight, and label arrays, respectively. The regularization term is defined as:

$$regularization={\Vert \theta \Vert }_{1},$$
(3)

This regularization term helps prevent the model from overfitting. Additionally, by driving the weights of unnecessary features to zero, it acts as an automatic feature selector.
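As a minimal sketch of Eqs. (1)–(3), the Lasso loss can be evaluated with NumPy as shown below. The function name `lasso_loss` and the toy arrays are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def lasso_loss(X, theta, Y, beta):
    """Eq. (1): (1/n) * supervised_loss + beta * ||theta||_1."""
    n = X.shape[0]
    residual = X @ theta - Y
    supervised_loss = 0.5 * np.sum(residual ** 2)  # Eq. (2): half squared Frobenius norm
    regularization = np.sum(np.abs(theta))         # Eq. (3): entrywise L1 norm
    return supervised_loss / n + beta * regularization

# Hypothetical toy data: 3 posts, 2 features, 2 label columns.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta = np.eye(2)  # fits Y exactly, so only the L1 penalty remains
print(lasso_loss(X, theta, Y, beta=0.1))
```

With a perfect fit, the supervised term vanishes and the loss reduces to β times the L1 norm of the weights, which makes the role of β as a sparsity knob easy to see.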

Graph Regularization

Graph regularization is a technique for incorporating graph information into supervised learning techniques. The idea is to add a regularization term to the regular loss to reduce the difference between connected node predictions. It accomplishes this by minimizing the following loss:

$$L = supervised\_loss+\alpha \, graph\_regularization+ \beta \, regularization,$$
(4)

where α denotes the weight of graph regularization. The only difference with Lasso is the addition of the second term, which is defined as:

$$graph\_regularization=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}{A}_{ij}{||{X}_{i}\theta -{X}_{j}\theta ||}^{2},$$
(5)

where \({A}_{ij}\) denotes the weight of the edge between nodes i and j. \({X}_{i}\theta\) is the prediction for \({X}_{i}.\)
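The graph regularization term in Eq. (5) can be sketched as follows. The function name and the toy graph are assumptions for illustration; the sketch also assumes that \(\theta\) is a matrix with one column per class, so that each prediction \({X}_{i}\theta\) is a row vector.

```python
import numpy as np

def graph_regularization(A, X, theta):
    """Eq. (5): 0.5 * sum_ij A_ij * ||X_i theta - X_j theta||^2."""
    P = X @ theta                           # predictions, one row per node
    diff = P[:, None, :] - P[None, :, :]    # all pairwise prediction differences
    sq_dist = np.sum(diff ** 2, axis=-1)    # squared Euclidean distances
    return 0.5 * np.sum(A * sq_dist)

# Hypothetical toy graph: nodes 0 and 1 are linked, node 2 is isolated.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
X = np.eye(3)
theta = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(graph_regularization(A, X, theta))
```

For a symmetric \(A\), the same quantity equals the trace of \({P}^{T}(D-A)P\), where \(P=X\theta\) and \(D\) is the degree matrix, which avoids forming the pairwise difference tensor on larger graphs.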

Two baseline methods for graph regularization are presented: a sociological approach to handling noisy and short texts (SANT), which makes use of user and friend graphs [6], and Sentiment Analysis using Structure Similarity (SASS), which makes use of structural similarity in addition to the user graph [5]. If tweets \(i\) and \(j\) are connected by a friendship link, their structural similarity is computed as:

$${S}_{ij}= \frac{\left|{N}_{{u}_{i }}\cap {N}_{{u}_{j }}\right|}{\left|{N}_{{u}_{i }}\cup {N}_{{u}_{j }}\right|}+1,$$
(6)

Otherwise, their structural similarity is computed as:

$${S}_{ij}= \frac{\left|{N}_{{u}_{i }}\cap {N}_{{u}_{j }}\right|}{\left|{N}_{{u}_{i }}\cup {N}_{{u}_{j }}\right|},$$
(7)

where \({N}_{{u}_{i}}\) denotes the set of \({u}_{i}\)’s friends.
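Equations (6) and (7) amount to a Jaccard overlap of the two friend sets, increased by 1 when the tweeps are directly connected. A minimal sketch follows; the function name, the example identifiers, and the handling of two empty friend sets (returning an overlap of 0) are assumptions not specified in the text.

```python
def structural_similarity(friends_i, friends_j, are_friends):
    """Jaccard overlap of friend sets (Eq. 7), plus 1 for a friendship link (Eq. 6)."""
    union = friends_i | friends_j
    # Assumption: two empty friend sets are treated as having zero overlap.
    jaccard = len(friends_i & friends_j) / len(union) if union else 0.0
    return (jaccard + 1.0) if are_friends else jaccard

# Hypothetical friend sets for the tweeps of tweets i and j.
n_ui = {"a", "b", "c"}
n_uj = {"b", "c", "d"}
print(structural_similarity(n_ui, n_uj, are_friends=True))
print(structural_similarity(n_ui, n_uj, are_friends=False))
```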

Deep Learning

Before the advent of deep learning, supervised machine learning focused on learning weights to minimize the loss function using expert-engineered features. By contrast, deep learning fundamentally altered this paradigm by allowing the system to acquire representations that result in improved performance. Through their hidden layers, deep learning methods attempt to learn more abstract representations. The feed-forward neural network is a biologically inspired system of processing units resembling brain neurons [68]. Each layer of a feed-forward neural network performs as follows:

$${H}^{(l+1)} = f\left({H}^{(l)}\right) = g\left({b}^{\left(l+1\right)}+ {\theta }^{\left(l+1\right)}{H}^{\left(l\right)}\right),$$
(8)

where \({H}^{(l+1)}\) denotes the activation of the layer \(l+1\), \({b}^{\left(l+1\right)}\) represents the bias vector, and \({\theta }^{\left(l+1\right)}\) is the trainable parameter matrix of layer \(l+1\). Additionally, \(g\) is a non-linear function, typically sigmoid, ReLU or tanh. Note that \({H}^{(0)}\) is the input feature vector to the network. Typically, the final layer contains a softmax function for classification purposes, which generates a probability distribution for the output classes:

$${Softmax(z)}_{i}=\frac{exp({z}_{i})}{\sum_{c=1}^{C}exp({z}_{c})},$$
(9)

where \({z}_{i}\) denotes the \({i}^{th}\) output of the final layer before applying softmax, and \(C\) represents the number of classes. The cross-entropy is the standard loss function used for classification as follows:

$$\mathrm{L}=-{\sum }_{c=1}^{C}{y}_{c}\log \left({Softmax(z)}_{c}\right),$$
(10)

where \({y}_{c}\) denotes the \({c}^{th}\) element of the one-hot label vector. Further detailed information is available in [69].
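Equations (8)–(10) can be illustrated for a single input vector with the short NumPy sketch below. The function names are assumptions, and subtracting the maximum inside the softmax is a standard numerical-stability step that does not change the result of Eq. (9).

```python
import numpy as np

def dense_layer(H, theta, b, g=np.tanh):
    """Eq. (8) for one input vector H: g(b + theta @ H)."""
    return g(b + theta @ H)

def softmax(z):
    """Eq. (9); shifting by max(z) avoids overflow without changing the output."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, z):
    """Eq. (10) for a one-hot label y and pre-softmax outputs z."""
    return -np.sum(y * np.log(softmax(z)))

# Hypothetical two-class example with equal pre-softmax outputs.
z = np.array([0.0, 0.0])
y = np.array([1.0, 0.0])
print(softmax(z), cross_entropy(y, z))
```

Equal pre-softmax outputs give a uniform distribution over the two classes, so the loss equals \(\log 2\), the value expected from an uninformed binary classifier.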

Graph Convolutional Neural Network

Convolutional neural networks were inspired by early findings in the study of the human visual system. Studies based on functional magnetic resonance imaging and intracranial depth recordings demonstrated the correspondence between the hierarchy of the human visual areas and the CNN layers [70, 71]. The proposed method in this paper is based on GCNs, which generalize the convolution operation from grid data to graph data.

The independence of instances is a fundamental premise of existing machine learning algorithms. As previously stated, this is not the case with social media data. GNNs are a subclass of neural network architectures that use the graph structure to convolutionally aggregate data from neighborhoods [72, 73]. The fundamental concept is to generate a node’s representation by aggregating its features with those of its neighbors. Convolution of a signal \(x\) with a filter \({g}_{\theta } \in {R}^{n}\) is defined as:

$$x{*}_{G} {g}_{\theta }= U\left(\left({U}^{T}x\right)\odot \left({U}^{T}{g}_{\theta }\right)\right),$$
(11)

where \(U=\left[{u}_{0};{u}_{1};\dots {u}_{n-1}\right] \in {R}^{n\times n}\) denotes the eigenvectors matrix of the normalized graph Laplacian. \({U}^{T}x\) denotes the graph Fourier transform of the signal \(x\), and \(\odot\) is the element-wise multiplication operation. In this paper, we use two types of GNNs: the Chebyshev Spectral CNN (ChebNet) [74] and GCN-Kipf [19]. ChebNet [74] employs the filter of Chebyshev polynomials of the eigenvalues diagonal matrix as follows:

$${g}_{\theta }\approx \sum_{i=0}^{k}{\theta }_{i}{T}_{i}(\stackrel{\sim }{\Lambda })$$
(12)

where \(\stackrel{\sim }{\Lambda }=\frac{2\Lambda }{{\lambda }_{max}}-{I}_{n}\), \({I}_{n}\) denotes the identity matrix, \(\Lambda\) is the diagonal matrix of the eigenvalues, and \({\lambda }_{max}\) represents the largest eigenvalue. \({T}_{i}\left(x\right)=2x{T}_{i-1}\left(x\right)- {T}_{i-2}\left(x\right)\), \({T}_{0}\left(x\right)=1\), and \({T}_{1}\left(x\right)=x\). Thus, filtering the signal \(x\) by \({g}_{\theta }\) can be written as:

$$x{*}_{G} {g}_{\theta }\approx \sum_{i=0}^{k}{\theta }_{i}{T}_{i}\left(\tilde{L }\right)x,$$
(13)

where \(\stackrel{\sim }{\mathrm{L}}=\frac{2L}{{\lambda }_{max}}-{I}_{n}\), and \(L\) denotes the Laplacian matrix.
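The practical appeal of Eq. (13) is that the Chebyshev recurrence can be applied directly to the vector \(x\), so no eigendecomposition is needed beyond an estimate of \({\lambda }_{max}\). The sketch below illustrates this under the assumption of a symmetric Laplacian; the function name and the three-node path graph are hypothetical.

```python
import numpy as np

def chebyshev_filter(L, x, theta, lambda_max=None):
    """Eq. (13): sum_i theta_i * T_i(L_tilde) x, using the Chebyshev recurrence."""
    n = L.shape[0]
    if lambda_max is None:
        lambda_max = np.linalg.eigvalsh(L).max()  # assumes L is symmetric
    L_tilde = 2.0 * L / lambda_max - np.eye(n)
    t_prev, t_curr = x, L_tilde @ x               # T_0(L_tilde) x and T_1(L_tilde) x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for i in range(2, len(theta)):
        # T_i = 2 * L_tilde * T_{i-1} - T_{i-2}, applied to x
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[i] * t_curr
    return out

# Hypothetical 3-node path graph with its normalized Laplacian.
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
d = A.sum(axis=1)
L = np.eye(3) - A / np.sqrt(np.outer(d, d))       # I - D^{-1/2} A D^{-1/2}
x = np.array([1.0, -2.0, 0.5])
print(chebyshev_filter(L, x, theta=[0.5, -1.0, 0.25]))
```

Each additional polynomial order costs one more multiplication by \(\tilde{L}\), so a filter of order \(k\) aggregates information from neighbors up to \(k\) hops away.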

GCN-Kipf [19] assumes \(k =1\) (in the Chebyshev filter), \({\lambda }_{max}=2\), and θ = \({\uptheta }_{0}\) =  − \({\uptheta }_{1}\). According to this formulation, each layer of the GCN-Kipf can be written as:

$${H}^{(l+1)}= f\left({H}^{(l)}, A\right) =g\left({\tilde{D }}^{-\frac{1}{2}}\tilde{A }{\tilde{D }}^{-\frac{1}{2}} {H}^{(l)}{\theta }^{(l+1)}\right),$$
(14)

where \({H}^{(l)}\) denotes the activation of the layer \(l\), \(g\) is the ReLU function, \(\tilde{A }\) is the graph’s adjacency matrix combined with self-neighborhood (\(\tilde{A }=A+ {I}_{n}\)), \(\tilde{D }\) represents the diagonal degree matrix where \({\tilde{D }}_{ii}\) = \(\sum_{j }{\tilde{A }}_{ij}\) and \({\theta }^{(l+1)}\) is a trainable weight matrix. The argument of \(g\) is referred to as the propagation rule, and it is the primary difference between different GCNs. Equation (14) employs a propagation rule representing each node as the normalized sum of its normalized neighbors’ representations. Our experiments make use of GCNs with two hidden layers. Similar to feed-forward neural network layers, GCN layers can be stacked to extract high-level node representations. We use the TF-IDF representation of tweets’ text as input for all machine learning models. We implement all models in TensorFlow except for GNNs, which are implemented in Python using the GCN package (https://github.com/tkipf/gcn).
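The propagation rule in Eq. (14) can be sketched for a small dense graph as follows. This is an illustrative NumPy version with a hypothetical function name, not the code of the GCN package, which operates on sparse matrices.

```python
import numpy as np

def gcn_layer(H, A, theta):
    """Eq. (14): ReLU(D_tilde^{-1/2} A_tilde D_tilde^{-1/2} H theta)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-neighborhood
    d = A_tilde.sum(axis=1)                   # degrees of A_tilde (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    return np.maximum(A_hat @ H @ theta, 0.0)  # ReLU

# Hypothetical example: two connected nodes with one-hot features.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H0 = np.eye(2)
theta1 = np.eye(2)
print(gcn_layer(H0, A, theta1))
```

In this example each node ends up with the average of its own features and its neighbor's features, which is the smoothing effect that makes connected tweets receive similar representations.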

Stacking Model

Different graphs provide a node with different neighbors, resulting in different node representations in the model. As a result, multiple predictions are generated for a single test node. A stacking model based on multiple GCNs is proposed to leverage the knowledge embedded in different graphs. A feed-forward neural network is trained on the outputs of the two best classifiers on the validation set, namely ChebNet (User) and GCN-Kipf (Social). These classifiers have an aggregation layer with 17 and 32 hidden nodes and are trained with a learning rate of 0.0007 and 0.01, respectively. The feed-forward network is composed of an input layer with four nodes (two for each base classifier, i.e., one for each of the positive and negative classes), a hidden layer with ten nodes (using the ReLU activation function), and an output layer with two nodes (one for each class) and a softmax activation. Adam is used for optimization. Figure 1 depicts the architecture of the proposed stacking model.
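The forward pass of the stacking network described above (four inputs, ten ReLU hidden units, two softmax outputs) can be sketched as follows. The weights here are random placeholders and the function name is hypothetical; in the actual model the weights would be learned with Adam, a step this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder (untrained) parameters: 4 inputs -> 10 hidden units -> 2 classes.
W1, b1 = rng.normal(scale=0.1, size=(4, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.1, size=(10, 2)), np.zeros(2)

def stack_predict(p_cheb_user, p_gcn_social):
    """Combine the class probabilities of the two base classifiers."""
    x = np.concatenate([p_cheb_user, p_gcn_social], axis=1)  # shape (batch, 4)
    h = np.maximum(x @ W1 + b1, 0.0)                         # ReLU hidden layer
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=1, keepdims=True))             # row-wise softmax
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical base-classifier outputs for three tweets.
p_user = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
p_social = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
print(stack_predict(p_user, p_social))
```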

Fig. 1 The proposed stacking model

Results

Setup

Google Colaboratory, a free research tool equipped with a Tesla K80 GPU and 12 GB of RAM, is used as the implementation environment. The experiments are conducted using the HCR dataset [74], which includes tweets about healthcare reform. This paper considers only tweets from users who have at least one friend in the dataset. Sixty percent of the total data is used for training, 20% for validation, and 20% for testing. There are 988 negative tweets (73.62%) and 354 positive tweets (26.38%). Our experiments employ the following graphs:

  • User: This graph connects all of the tweets from the same tweep. In other words, \({U}_{ij}=1\) if tweets \(i\) and \(j\) were both posted by the same tweep. There are 743 unique tweeps, with an average of ~ 1.80 tweets per tweep (Fig. 2).

  • Social: The friendship graph is based on data crawled by Kwak et al. [75], representing a snapshot of Twitter in 2009 (http://an.kaist.ac.kr/traces/WWW2010.html). Each tweet is connected to the tweets posted by the tweep’s followers or followings. In other words, \({S}_{ij}=1\) if tweet \(i\)’s tweep is a follower/following of tweet \(j\)’s tweep (Fig. 2).

  • Common friends: Similarity detection via common friends is a well-established technique in social network analysis [5]. We construct a graph in which two tweets are connected if their respective tweeps share friends. In other words, \({CF}_{ij}=1\) if tweets \(i\) and \(j\) were created by two tweeps who share at least one friend.

  • Topic: This is a graph that connects tweets that use the same hashtag. In other words, \({T}_{ij}=1\) if tweets \(i\) and \(j\) share a hashtag.

  • Sentimental Similarity: This graph connects tweets based on their sentiments to the most comparable tweets. SentiStrength [32] is used to determine the sentiment of tweets. It is a tool that was developed using a machine-learning-enhanced human-annotated sentiment lexicon. SentiStrength assigns each tweet a positive and negative sentiment score. The Cosine similarity between the sentiment vectors of tweets is calculated, and then the tweets with the highest similarity are connected. In other words, if tweet \(j\) is one of the n most similar tweets to tweet \(i\), or vice versa, then \({SS}_{ij}=1\).

  • Text Similarity: As with the topic graph, text similarity can be used to connect tweets that are semantically related. We represent tweets using universal sentence encoder (USE), which embeds each tweet in 512-dimensional vector space [16]. The graph is constructed by repeatedly selecting the most similar tweets based on their cosine similarity. In other words, \({USE}_{ij}=1\) if tweet j is one of the n most closely related tweets to tweet \(i\).
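Two of the constructions listed above can be sketched with NumPy: the User graph and a nearest-neighbor graph based on cosine similarity, as used for the Sentimental Similarity and Text Similarity graphs. The function names, the exclusion of self-loops, and the `symmetric` switch (which covers the "or vice versa" case) are assumptions made for illustration.

```python
import numpy as np

def user_graph(tweeps):
    """U_ij = 1 if tweets i and j were posted by the same tweep."""
    t = np.asarray(tweeps)
    U = (t[:, None] == t[None, :]).astype(int)
    np.fill_diagonal(U, 0)  # assumption: self-loops are left to the GCN layer
    return U

def knn_cosine_graph(vectors, n, symmetric=True):
    """Connect each tweet to its n most cosine-similar tweets."""
    V = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    V = V / np.where(norms == 0, 1.0, norms)   # guard against zero vectors
    sim = V @ V.T
    np.fill_diagonal(sim, -np.inf)             # a tweet is not its own neighbor
    top = np.argsort(-sim, axis=1)[:, :n]      # indices of the n nearest tweets
    G = np.zeros(sim.shape, dtype=int)
    G[np.arange(len(V))[:, None], top] = 1
    return np.maximum(G, G.T) if symmetric else G

# Hypothetical data: three tweets, two tweeps, 2-dimensional tweet vectors.
print(user_graph(["alice", "bob", "alice"]))
print(knn_cosine_graph([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], n=1))
```

The same `knn_cosine_graph` sketch applies to the two-dimensional SentiStrength score vectors and to 512-dimensional USE embeddings, since only the input vectors differ.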

Fig. 2 Illustration of user and social graphs

Table 1 presents the statistics for the graphs used in the experiments. The User graph is the most sparsely connected, followed by the Sentimental Similarity and USE graphs. The Topic graph is the densest.

Table 1 Statistics of the used graphs

Experiments

This section summarizes the experimental findings.

Table 2 shows the smoothness of the sentiment signal for each graph. The smoothness of a graph signal, in particular, indicates the degree to which a signal’s node values are related to their neighbors’ corresponding values. As a result, we use this concept to investigate the correlation between the structure of these graphs and the sentiment labels. The following defines the smoothness of a signal on a weighted undirected graph [75]:

$$Smoothness= {S}^{T}LS= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}{A}_{ij}{({S}_{i}-{S}_{j})}^{2},$$
(15)

where S denotes the signal, \(L\) is the Laplacian of the underlying graph, and \({A}_{ij}\) is the weight between node \(i\) and \(j\). Lower smoothness values indicate that the underlying graph is more closely related to the signal. We provide the smoothness of a graph with the same number of nodes and edges but randomly chosen edges in the third column. The fourth column contains the normalized difference between the random signal smoothness and the graph smoothness, which is calculated as follows:

$$ND=\frac{RS-GS}{RS},$$
(16)

where \(RS\) denotes the random signal smoothness, and \(GS\) is the graph smoothness. Higher ND values imply a stronger correlation between the sentiment signal and the graph on which it is defined. Table 2 demonstrates that the sentiment signal is smoother on every graph than on its random counterpart, except for the common-friends graph. Additionally, the fourth column demonstrates that, relative to the random baseline, the sentiment signal is smoothest on the user graph, followed by the hybrid User + Social graph and the social graph, which is consistent with the sentiment consistency and homophily theories.
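Equations (15) and (16) can be sketched as follows, assuming the unnormalized Laplacian \(L=D-A\), for which the two forms in Eq. (15) coincide. The function names, the ±1 label encoding, and the way the random graph is sampled (uniformly over node pairs, for an unweighted graph) are illustrative assumptions.

```python
import numpy as np

def smoothness(A, S):
    """Eq. (15): S^T L S with the unnormalized Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return float(S @ L @ S)

def random_graph_like(A, rng):
    """Unweighted random graph with the same number of nodes and edges as A."""
    n = A.shape[0]
    m = int(np.count_nonzero(np.triu(A, k=1)))       # number of undirected edges
    rows, cols = np.triu_indices(n, k=1)
    pick = rng.choice(len(rows), size=m, replace=False)
    R = np.zeros((n, n))
    R[rows[pick], cols[pick]] = 1.0
    return R + R.T

def normalized_difference(rs, gs):
    """Eq. (16): (RS - GS) / RS."""
    return (rs - gs) / rs

# Hypothetical 3-node path graph with sentiment labels encoded as +1 / -1.
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
S = np.array([1.0, 1.0, -1.0])
gs = smoothness(A, S)
rs = smoothness(random_graph_like(A, np.random.default_rng(0)), S)
print(gs, rs)
```

In practice the random baseline would typically be averaged over several sampled graphs before Eq. (16) is applied, since a single draw can be noisy; whether the original experiments did so is not stated.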

Table 2 The smoothness of the sentiment signal with respect to different graphs

Table 3 shows all the hyperparameters used in different models. The values have been tuned on the validation set.

Table 3 Hyperparameters of the models

The results of the baseline models are summarized in Table 4. Except for SANT and SASS, which also use graph regularization, these methods rely entirely on textual data. In general, sentiment analysis employs three approaches: lexicon-based [76], machine learning-based [22], and hybrid [77]. The first baseline is SentiStrength [32], which assigns each text a positivity score from 1 to 5 and a negativity score from −1 to −5. In each case, a greater absolute value indicates a more sentimental text in that direction. The sum of these scores determines a tweet’s classification. Because the dataset contains only positive and negative tweets, when the sum is 0, the tweet is classified randomly (in proportion to the class ratio). As can be seen, this method achieves an accuracy of ~ 65%. It is worth noting that this method is unaware of the data distribution. Lasso improves on this baseline, reaching 73.60%. Logistic regression outperforms other techniques. A slightly improved result is obtained by adding a hidden layer (16 units) to the logistic regression, thereby constructing a feed-forward neural network. The results achieved using graph regularization methods are comparable to those of the other baselines (except for SentiStrength): SANT [6] achieves ~ 75% accuracy, and SASS [5] achieves ~ 78% accuracy.
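The decision rule used for the SentiStrength baseline can be sketched as below. The function name and arguments are hypothetical, and the scores are assumed to have been produced by SentiStrength beforehand; only the class ratio is taken from the dataset statistics reported above.

```python
import random

def sentistrength_label(pos, neg, p_positive, rng=random):
    """Classify by the sign of pos + neg; break ties randomly by class ratio.

    pos is in 1..5, neg is in -5..-1, and p_positive is the share of
    positive tweets used for the random tie-break.
    """
    total = pos + neg
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "positive" if rng.random() < p_positive else "negative"

# Class ratio taken from the dataset statistics: 354 positive of 1342 tweets.
p_pos = 354 / 1342
print(sentistrength_label(4, -2, p_pos))  # sum is +2
print(sentistrength_label(1, -3, p_pos))  # sum is -2
```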

Table 4 Experimental results of the baseline models

Table 5 presents the experimental results for the graph convolutional models. Graph convolutional models outperform both the text-based models and the graph regularization methods. The findings suggest that leveraging tweet relationships via GCNs (rather than the graph regularization method) is more advantageous. Additionally, ChebNet outperforms GCN-Kipf in most cases, which is likely because GCN-Kipf is a special case of ChebNet. Owing to sentiment contagion and the large number of social edges, the social graph achieves the best result.

Table 5 Experimental results of GCN-Kipf and ChebNet using different graphs and the stacking model

Furthermore, the User and Sentimental Similarity graphs perform well (in the ChebNet model) owing to sentiment consistency and to the knowledge injected into the model by the lexicon, respectively. The primary reason the social graph outperforms the user graph is the former’s larger number of edges. Previous research with graph regularization has produced comparable results [5].

In comparison to other graphs, the topic, USE, and common-friends graphs all produce unsatisfactory results. Despite its dense nature, the topic graph is not particularly useful. Perhaps this is because individuals can experience a range of emotions when confronted with a particular subject. For instance, some people regard #hcr positively, while others do not. This situation is exacerbated in the USE graph, where the results indicate that text similarity alone may not be sufficient to distinguish sentiment classes, as two nearly identical texts, one of which contains a single negative word, can have opposite sentiments. The common-friends graph is not as precise as the direct friendship graph. This finding could imply that individuals share similar views with their friends but are less likely to share them with their friends’ friends. As illustrated in Table 5, the proposed stacking model outperformed all other models in the experiments owing to its use of two concurrent graphs, which reduces the model’s error rate.

Finally, to better understand the GCN models’ behavior, we calculate the information-to-noise ratio, which is essentially the ratio of neighbors with the same label as the target node to all neighbors. It has been demonstrated that the success of GNNs stems from the information received from neighbors outweighing the noise received [78]. As a result, we examine the relationship between prediction accuracy and the information-to-noise ratio of nodes in the user and social graphs. Figure 3 depicts the obtained results for the ChebNet (User) model. The chart on the left illustrates the histogram of the information-to-noise ratio for nodes in which the feed-forward (context-free) model is successful, but the ChebNet (User) (contextual) model fails. The chart on the right depicts the opposite situation, in which the ChebNet (User) model succeeds, but the feed-forward model fails. This figure shows no clear correlation between the information-to-noise ratio and feed-forward model performance, whereas ChebNet (User) performs significantly better in nodes with a higher information-to-noise ratio. This is due to the GCN model’s nature. When neighbors’ features are more similar to those of the target node, GCN provides more useful information for the target node’s contextual representation, resulting in improved performance on the target node.
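The per-node information-to-noise ratio described above can be sketched as follows. The function name is hypothetical, the graph is assumed to have no self-loops, and returning NaN for nodes without neighbors is an assumption, since the text does not say how isolated nodes are treated.

```python
import numpy as np

def information_to_noise(A, labels):
    """Share of each node's neighbors that carry the same label as the node."""
    y = np.asarray(labels)
    same = y[:, None] == y[None, :]          # same[i, j]: labels of i and j agree
    neigh = np.asarray(A) > 0                # neigh[i, j]: j is a neighbor of i
    deg = neigh.sum(axis=1)
    agree = (neigh & same).sum(axis=1)
    # Assumption: isolated nodes get NaN instead of a ratio.
    return np.where(deg > 0, agree / np.maximum(deg, 1), np.nan)

# Hypothetical 3-node path graph with labels 1, 1, 0.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(information_to_noise(A, [1, 1, 0]))
```

Histograms such as those in the figures can then be built by selecting the ratios of the nodes on which one model succeeds and the other fails.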

Fig. 3 Performance evaluation of the average information-to-noise ratio: (a) nodes where the feed-forward model is successful, but ChebNet (User) fails, and (b) nodes where ChebNet (User) succeeds but feed-forward fails

Similarly, Fig. 4 depicts the information-to-noise ratio for the Social GCN model. The chart on the left depicts the histogram of the information-to-noise ratio for nodes where the feed-forward model succeeds but the Social GCN model fails, while the chart on the right depicts the opposite. Once again, the GCN model performs better with smoother context tweets. Figure 5 depicts the average information-to-noise ratio of the user and social graphs for nodes where the feed-forward model succeeds but the stacking model fails (left) and for nodes where the stacking model succeeds but the feed-forward model fails (right). The figure demonstrates that the stacking model performs better on samples with a higher information-to-noise ratio, in contrast to the context-free model. A GCN may receive either information or noise from other nodes. Aggregating nodes that belong to the same class provides information by bringing their representations closer together and increasing the likelihood that they are assigned to the same class. Conversely, noise is introduced by the aggregation of nodes belonging to other classes.

Fig. 4 Performance evaluation of the average information-to-noise ratio: (a) nodes in which the feed-forward model succeeds but GCN-Kipf (Social) fails, and (b) nodes where GCN-Kipf (Social) is successful but feed-forward fails

Fig. 5 Performance evaluation of the mean of the average information-to-noise ratio: (a) nodes where the feed-forward model succeeds but the stacking model fails, and (b) nodes where the stacking model is successful but feed-forward fails

Conclusion

The purpose of this study was to perform sentiment analysis utilizing tweet contexts. Previously, this procedure was carried out using graph regularization [6]. GCNs were chosen for sentiment analysis on Twitter due to recent advances in graph signal processing [79] which enable a more accurate application of deep learning on graphs. We analyzed contextualized Twitter sentiment using GCNs. The stacking of contextual graphs, including user and friendship, was investigated using two distinct types of GNNs, ChebNet [74] and GCN-Kipf [19]. We obtained promising experimental results on a real-world Twitter sentiment analysis dataset, outperforming both text- and graph-based models. Additional research is required to examine other GNN models including models that utilize multiple graphs end-to-end. Additionally, as the paper’s final section implies, GCNs rely on a high information-to-noise ratio, which indicates that their performance on low-information-to-noise nodes may be suboptimal. Sentiment classifiers that make better use of context are promising candidate models for future research.