1 Introduction

Political inclination refers to the political stance of an individual. Polling and surveying to understand the political leanings of people within a particular community, geopolitical region, or context is a common approach. However, the manual polling mechanisms used today are hard to scale. There is also a significant chance of biased sampling, as the samples are often small and localized. If, on the other hand, a survey or poll is conducted on an online social platform, it is impossible to control the distribution of voters so that it resembles a random sample of opinions. The voters in such polls are typically limited to the active audience of the pollster and often share the pollster's political inclination; thus, the result of the same poll can be completely different when it is posted by a different pollster. Therefore, algorithmic labeling of people drawn from a controllable distribution is preferable to bias-prone active participation solicited by influencers of a particular political inclination, or to cost-inefficient manual polling.

Most existing approaches to political inclination detection (PID) on social networks focus on probabilistic models [8, 9, 13, 14], which in turn rely on the texts generated by users. Researchers have also tried to exploit the network structure by using Graph Convolutional Networks (GCNs) [24]. This method uses all second-degree features (neighbors of neighbors of the node/user to be classified in the graph) for a rich representation, which makes classification more accurate at the expense of speed. The data collection process involves collecting features of the followers of the followers of the user whose political inclination needs to be detected. Since collecting followers and all their tweets is itself a slow process rate-limited by Twitter (see footnote 1), the time required to collect the features of the second-degree neighbours grows quadratically in the average number of unique neighbors per node.

Also, the GCN-based models need to store the huge Twitter subnetwork involving political figures and their followers. This severely violates the users' right to erasure under Article 17 of the GDPR (see footnote 2), which reads as follows:

[Article 17 of the GDPR, "Right to erasure", reproduced as an image in the original.]

Further, these models are trained on a huge number of annotated examples. This makes the approaches hard to scale to newer settings and countries. In contrast, we show that certain easy-to-collect features plugged into a novel self-attentive framework can predict political inclination very accurately, even when trained on a handful of annotated examples.

Our main contributions are as follows:

  • (1). Graph-based methods used previously raise many ethical questions [15, 17, 21, 23]. Users of social media platforms have the right (see footnote 2) to deletion of their data from other storage systems that depend on social media as a data source, whenever their public profile on the platform is deleted. Graph-based methods violate this by storing information such as retweets, mentions, likes, and the follower-followee network. Building and updating such networks is also very time consuming, as it requires daily monitoring of (i) the continued existence of each connection and (ii) the arrival of new connections; so, the only way to use these features at inference time is to store them permanently in memory. We eliminate the need to store such large relational graphs built from the past social media data of a huge number of users. We achieve this by using richer first-degree features, collected directly at inference time, that also capture information about second-degree neighbors available directly from the tweets of the user/person to be classified (e.g., the hashtags used by a retweeted user are readily available with the retweet, and the same holds for replies). Using smart augmentation of these features, we beat the performance of the graph-based approaches [24] at a reduced inference time.

  • (2). We propose a novel Fast Self-attentive Semi-supervised Political Inclination Predictor, FSSPIP (Fig. 1). The experimental results show that even without any gold annotation, we can achieve a high accuracy of \(\sim \) 94% using weak supervision. The model is highly scalable and free from manual intervention, unlike Darwish et al. (2020) [10], which needs human supervision or cluster inspection.

  • (3). We bring on board multiple additional datasets to show that our model can be used in many other similar settings for political inclination detection with a handful of labeled examples (or even none). Specifically, we present several case studies on media bias and political polarization using our classifier in zero-shot settings.

2 Related Work

Stefanov et al. (2020) [20] and Baly et al. (2020) [2] used Wikipedia, Twitter, YouTube, and other channels of information to detect the political leanings of media houses. This approach is not scalable in the context of individual persons. Conover et al. (2011) [9] used a corpus of 1,000 annotated data points to test supervised approaches based on bag of words. Iyyer et al. (2014) [13] used advanced neural techniques like RNNs on a labeled corpus of sentences taken from speeches of Democratic and Republican parliamentarians. Chen et al. (2017) [8] used graph-based approaches to show the efficacy of an opinion-aware knowledge graph. However, these techniques fail to take the richer network features into account. They also rely completely on annotated data, failing to take advantage of domain knowledge of the task at hand.

Aldayel et al. (2019) [1] analyzed the features responsible for higher accuracy in stance detection setups using network features, tweet texts, and text-derived features. Darwish et al. (2020) [10], on the other hand, used a clustering-based unsupervised setup to detect the stance of users, relying mainly on three channels of features: retweeted tweets, retweeted accounts, and hashtags. Xiao et al. (2020) [24] approached the same task by manually annotating and collecting a large Twitter dataset of politicians and non-politician social media users. They relied on variants of relational GNNs coupled with multi-task learning. However, given the need to explicitly store information in graph structures even after the training phase, graph-based algorithms often violate the privacy rights of a large section of users.

Therefore, in this paper, we attempt to solve political inclination detection in a resource-constrained setup with no storage of user data after model training. We use several task-dependent augmentation techniques and unsupervised learning methods that have not been used in this context earlier, thus making our model robust, easily adaptable, and scalable without any human help/supervision. We only use public data available at the time of inference.

3 Model Architecture

The Base Architecture: Like previous state-of-the-art approaches [24], we use a GCN-like framework. In contrast to them, however, we neither store the user/feature graphs, nor need a list of politicians in the country of the users to be classified, nor require a huge set of labelled examples. We use a long list of feature types derived from follows, mentions, replies, retweets, tweets, and likes. We hypothesize that many important but easy-to-collect features, which can be retrieved from the web directly at inference time with no need for storage, suffice for a good representation of political inclination. We describe these features in detail below:

Fig. 1. The FSSPIP base architecture: the Twitter profile of a user is taken as input, from which 22 different feature types are extracted and processed to predict political inclination.

Base Features

User Descriptions: We collected the descriptions (bios) of the users retweeted and quoted, forming two separate documents. These user descriptions often contain key information such as the user's occupation, gender, religion, etc.

Hashtags: Hashtags are important as similar hashtags are used to express opinions for/against a polarizing topic by users of different leanings.

Mentions: IDs mentioned in tweets are used as features.

Media Domains: It is no secret that users of different political leanings share different sets of news items that fit their ideological perspective. Considering their importance in our task, we collect domain names and domain + co-domain names from users’ tweets. We use them as separate features.

Textual Content: We use pre-trained models like BERTweet [18], and Google’s Universal Sentence Encoder [4] to convert the content of tweets of a user into embeddings. In our experiments, we found that BERTweet performs better (possibly because BERTweet is trained on text with vocabulary more similar to ours). Thus we report BERTweet numbers only.
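As an illustration of this step, below is a minimal sketch of obtaining a single BERTweet embedding for a user's tweets using the Hugging Face checkpoint; the mean-pooling over tokens and over tweets is our assumption, since the paper does not specify the pooling strategy.

```python
# Hedged sketch: turning a user's tweets into one BERTweet embedding.
# Pooling (mean over tokens, then mean over tweets) is our assumption;
# the paper only states that BERTweet embeddings are used.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
encoder = AutoModel.from_pretrained("vinai/bertweet-base")
encoder.eval()

def embed_tweets(tweets, max_length=128):
    """Return one d_em-dimensional vector summarizing a list of tweets."""
    batch = tokenizer(tweets, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (n_tweets, seq, 768)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    per_tweet = (hidden * mask).sum(1) / mask.sum(1)     # mean over tokens
    return per_tweet.mean(0)                             # mean over tweets -> (768,)

# Example: user_vec = embed_tweets(["I voted today!", "Great rally in Ohio"])
```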

Features Connected to Neighborhood:

We have described a total of 6 features so far. We compute the same features separately for retweets and for replies. So, the total number of features becomes 6+6+6=18.

In addition to all these features, we also use friend ids, follower ids, mention ids, ids replied to, and ids retweeted as features collected at test time.

So, in total we use 18+4=22 features.

Attending to Different Modalities

We use \(|R|=22\) feature types in our architecture. For each feature type r and user i, we obtain an embedding \(e_{ir}\) of size \(d=8\) as follows.

$$\begin{aligned} e_{ir}&= W_r \times \textrm{BERTweet}(T_{ir}), \ \text {if} \ r \in T \\&= W_r A_{ir} H_r, \ \text {if} \ r \in T' \end{aligned}$$
(1)

where \(A_{ir} \in \{0,1\}^{1 \times Vlen_r}\) is the feature presence-absence vector for the \(r^{\textrm{th}}\) feature type and \(H_r \in \mathbb {R}^{Vlen_r \times d}\) is the embedding matrix containing the embeddings of all features of feature type r. During pre-processing, we keep only those features in the vocabulary that appear in at least five instances of the training data, to ensure enough training instances per feature. The vocabulary length of the \(r^{\textrm{th}}\) feature type is denoted \(Vlen_r\). \(T'\) and T are the sets of non-textual and textual feature types, respectively. For each textual feature type r, \(T_{ir}\) denotes the textual content of that feature for user i; \(W_r \in \mathbb {R}^{d \times d_{em}}\), where \(d_{em}\) is the embedding dimension of the output of BERTweet [18]. We then calculate \(h_i\), the final embedding for the \(i^\text {th}\) user, as follows.

$$\begin{aligned} \small h_i=\sum _r \alpha _{ir} \times \frac{e_{ir}}{|e_{ir}|} \end{aligned}$$
(2)

FSSPIP uses a dynamic dot-product self-attention mechanism to calculate the weights for each of the feature types and finally obtain a weighted sum of the normalized embeddings of all feature types. We use learnable parameters \(p,q,k \in [0,1]\) to allow some flexibility in the attention calculation. Learnable parameters \(q_r\) and \(k_r \in \mathbb {R}^d\) are the query and key, respectively, for each feature type r (here, a feature type is a specific social media attribute, so the collection of hashtags coming from tweets is a feature type different from the collection of hashtags coming from retweets/replies; please refer to the list of features mentioned at the start of the section for a broader understanding). So,

$$\begin{aligned} \alpha _{ir} =p*\frac{e^{q_{ir} \times k_{ir}}}{\sum _r e^{q_{ir} \times k_{ir}}}+(1-p)*|e_{ir}| \end{aligned}$$
(3)
$$\begin{aligned} q_{ir} =q \times e_{ir}+(1-q)\times q_r \end{aligned}$$
(4)
$$\begin{aligned} k_{ir} =k \times e_{ir}+(1-k) \times k_r \end{aligned}$$
(5)

An illustration of this base architecture is presented in Fig. 1. It shows how the input from each feature type goes through a different transformation function (BERTweet in the case of textual data, trainable embeddings in the case of follower ids, etc.) to become an embedding, and how these embeddings are then weighted by attention values calculated through the attention schemes described in this paper. The weighted sum of the embeddings (a vector of size 768) denotes the representation of the node/person to be classified. This embedding is further multiplied with a vector of size 768\(\,\times \,\)1 and passed through a sigmoid function to obtain the probability of the person being a Republican in a binary classification setup. We use binary cross entropy as the loss function for supervision.
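To make the fusion step concrete, the following is a minimal PyTorch sketch of Eqs. 2–5, assuming the per-feature-type embeddings \(e_{ir}\) have already been computed via Eq. 1; the clamping of p, q, k to [0, 1], the initialization, and the final linear head are our assumptions, not the authors' released code.

```python
# Minimal sketch of the FSSPIP fusion layer (Eqs. 2-5); dims and head are assumptions.
import torch
import torch.nn as nn

class FSSPIPFusion(nn.Module):
    def __init__(self, n_types, d=8):
        super().__init__()
        self.q_r = nn.Parameter(torch.randn(n_types, d))   # per-type query (Eq. 4)
        self.k_r = nn.Parameter(torch.randn(n_types, d))   # per-type key   (Eq. 5)
        self.p = nn.Parameter(torch.tensor(0.5))            # mixing scalars in [0,1]
        self.q = nn.Parameter(torch.tensor(0.5))
        self.k = nn.Parameter(torch.tensor(0.5))
        self.clf = nn.Linear(d, 1)                           # sigmoid classification head

    def forward(self, e):                    # e: (batch, n_types, d), one e_{ir} per type
        norm = e.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        e_hat = e / norm                                     # normalized embeddings
        p, q, k = (x.clamp(0, 1) for x in (self.p, self.q, self.k))
        q_ir = q * e + (1 - q) * self.q_r                    # Eq. 4
        k_ir = k * e + (1 - k) * self.k_r                    # Eq. 5
        scores = (q_ir * k_ir).sum(-1)                       # dot product per feature type
        alpha = p * scores.softmax(dim=-1) + (1 - p) * norm.squeeze(-1)   # Eq. 3
        h = (alpha.unsqueeze(-1) * e_hat).sum(dim=1)         # Eq. 2
        return torch.sigmoid(self.clf(h)).squeeze(-1)        # P(Republican)
```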

4 Augmented Semi-supervision for Superior Representation Learning

To make our architecture ready for few-shot learning, we make the model robust using regularization and multi-task learning. We also use weak supervision, which produces high accuracy without any labelled examples. Specifically, we use three categories of techniques, described below.

Dynamic Augmentation

Mixup: Mixup [25] is a technique that enforces a linear change in output given a linear change in input by training a neural network on convex combinations of pairs of examples and their annotated labels for a particular task. We adapt the method to our network data by mixing two random users for each channel (e.g., the hashtags, domains, and retweetees of both users are present in the augmented user), increasing the diversity of data points and regularizing the model for unseen data points.

Sampling: Twitter users can be imagined as generative agents who tweet on selected issues and follow/reply to/mention/interact with other users according to some implicit probability distribution. If some of the points drawn from this distribution are removed uniformly at random, the distribution itself does not change. We therefore uniformly sample out features from labeled examples for augmentation (the masking rate is chosen from a uniform distribution for each feature type), masking out 0–15% of the features at random during training.

Feature Channel Dropout: While some feature types may influence the result more than others, it is important to learn to predict from the available cues when an influential feature type (e.g., hashtags, followers, retweets, etc.) is absent. So, we randomly drop whole feature types during training for better performance through adversarial training.
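The following is a minimal sketch of the three dynamic augmentations above, assuming a user is represented as a dictionary mapping feature channels to lists of tokens; the exact masking rates per channel and the handling of labels under mixup are assumptions based on the text.

```python
# Hedged sketch of mixup, feature sampling, and feature channel dropout on a user
# represented as {channel name -> list of tokens such as hashtags or ids}.
import random

def mixup_users(user_a, user_b):
    """Merge two users' channels into one augmented user (label mixing is handled
    separately in the loss, as in standard mixup)."""
    return {ch: user_a.get(ch, []) + user_b.get(ch, [])
            for ch in set(user_a) | set(user_b)}

def sample_features(user, max_mask=0.15):
    """Uniformly mask out 0-15% of the tokens in every channel."""
    out = {}
    for ch, toks in user.items():
        rate = random.uniform(0.0, max_mask)        # per-channel masking rate
        out[ch] = [t for t in toks if random.random() >= rate]
    return out

def channel_dropout(user, drop_prob=0.1):
    """Randomly drop whole feature channels (e.g. all hashtags)."""
    return {ch: toks for ch, toks in user.items() if random.random() >= drop_prob}

# Example training-time pipeline:
# augmented = channel_dropout(sample_features(mixup_users(u1, u2)))
```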

Weak Supervision: We hypothesize that the followers/retweeters of a particular political party often share that party's political inclination. That is, they are statistically more likely to lean toward the party they follow on social media than toward any other. This provides silver labels in the Twitter space for weak supervision. We crawled the Twitter handles of each political party (i.e., the official Twitter handles of the Democratic & Republican parties in the case of the US, and of AAP, Congress & BJP in the case of India) to collect the last 75,000 followers (set heuristically to contain enough examples) and the last 75,000 retweeters for each party. We randomly selected 2,500 from each pool to obtain a sample representative of the timeline (since the most recent followers appear first, collecting a big pool and resampling helps) and collected their relevant data for training. Users following both parties were removed.
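A rough sketch of the silver-label construction is given below; `fetch_recent_followers` is a hypothetical placeholder for the Twitter API calls and the handle names are illustrative assumptions, while the 75,000/2,500 pool and sample sizes follow the text.

```python
# Hedged sketch of silver-label construction from party followers.
import random

PARTY_HANDLES = {"democrat": "TheDemocrats", "republican": "GOP"}   # assumed handles
POOL_SIZE, SAMPLE_SIZE = 75_000, 2_500

def fetch_recent_followers(handle, limit):
    """Hypothetical helper: ids of the `limit` most recent followers of `handle`."""
    raise NotImplementedError

def build_silver_labels():
    pools = {party: set(fetch_recent_followers(h, POOL_SIZE))
             for party, h in PARTY_HANDLES.items()}
    # Users following both parties carry ambiguous signal and are removed.
    overlap = set.intersection(*pools.values())
    silver = {}
    for party, ids in pools.items():
        usable = list(ids - overlap)
        for uid in random.sample(usable, SAMPLE_SIZE):   # resample to cover the timeline
            silver[uid] = party
    return silver
```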

Self-supervision: Self-supervision is a semi-supervised learning technique that trains the model on a dummy task of predicting part of the input data from the rest [18]. While masked language modelling and next sentence prediction [11, 18] are the most frequently used pre-training techniques for textual data, graph neural nets use the prediction of masked edges between nodes as the pre-training task. Following these methods, we pretrain our model on the task of predicting the non-textual features that are masked during the sampling step of the dynamic augmentation phase. We use self-supervision as a pre-training method when performing few-shot learning and later fine-tune on the annotated data points. Hyperparameter details and the loss function of the pretraining phase are given in the Appendix (available at https://tinyurl.com/icadlappendix).
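The pretraining objective can be sketched as a multi-label reconstruction of the masked non-textual features from the fused user embedding; the per-channel decoder heads and the binary cross-entropy form below are our assumptions, since the exact loss is specified only in the Appendix.

```python
# Hedged sketch of the masked-feature pretraining objective.
import torch
import torch.nn as nn

class MaskedFeatureHead(nn.Module):
    def __init__(self, d, vocab_sizes):          # vocab_sizes: {channel: Vlen_r}
        super().__init__()
        self.decoders = nn.ModuleDict(
            {ch: nn.Linear(d, v) for ch, v in vocab_sizes.items()})
        self.loss = nn.BCEWithLogitsLoss()

    def forward(self, h, masked_targets):
        """h: fused user embedding (batch, d);
        masked_targets: {channel: multi-hot (batch, Vlen_r) of masked tokens}."""
        total = 0.0
        for ch, target in masked_targets.items():
            total = total + self.loss(self.decoders[ch](h), target)
        return total
```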

5 Data Preparation

Dataset for the Main Task: As provided by Xiao et al. (2020) [24], we have 2,976 labeled data points (labelled Republican or Democratic) along with data for 583 politicians in the US setting. For a nuanced analysis, we retain the partition of the data points used in the dataset – PureP (see footnote 3), P50 (see footnote 4), P20\(\sim \)50 (see footnote 5), and P+all (see footnote 6).

Table 1. Descriptive statistics of the labeled dataset.

We collected the Twitter ids and labels provided by Xiao et al. (2020) [24]. We crawled the last 3,200 tweets (some tweets had been deleted; some were retweets, quotes, and replies), the follower ids, and the friend ids of each labeled id in November 2020 using the Twitter API (see footnote 7). We also collected the user objects (containing bios) for each id. So, after pre-processing, we have data for each feature type described in the previous section. We extracted the domain and co-domain names from the shared URLs using the tldextract (see footnote 8) library. Out of 2,976 labeled users, 2,665 were still available on Twitter at the time of crawling (November 2020). We report our results on this dataset. A major point to note here is that we do not store this data once training is over, nor do we need to collect neighborhood data at inference time, making inference faster and memory efficient. Detailed statistics of this dataset, with the count of unique features for some feature types, are provided in Table 1.
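For illustration, below is a minimal example of extracting the domain and domain + co-domain features with tldextract; the exact string format of the stored features is our assumption.

```python
# Small example of the domain / domain+co-domain features using tldextract.
import tldextract

def url_features(url):
    parts = tldextract.extract(url)                      # subdomain, domain, suffix
    domain = parts.domain                                # e.g. "nytimes"
    domain_codomain = f"{parts.domain}.{parts.suffix}"   # e.g. "nytimes.com"
    return domain, domain_codomain

# url_features("https://www.nytimes.com/2020/11/03/us/elections")
#   -> ("nytimes", "nytimes.com")
```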

Additional Datasets for Lateral Verification 

We collect several other datasets to demonstrate the usefulness of FSSPIP in zero/few-shot setting. The statistics of these datasets are detailed in Table 2.

Table 2. Descriptive statistics of the collected datasets. MB: MediaBias; C: Community; MP: Multiparty; S: Statewise; TPC: Topicwise; HTU: HashTagUsers (4-hashtag subset as mentioned in Fig. 3b, details in Appendix).

The Media Bias Dataset: Following Stefanov et al. (2020) [20], we use the crowdsourced labels (see footnote 9) for media bias prediction. There are 806 labeled instances in the dataset with the labels left, center-left, least biased, center-right, and right. In order to binarize the label space (to fit our classification model, which is a binary classifier), we first discard the instances labeled least biased; next, we merge left and center-left into a single label left, and center-right and right into a single label right. We collect the friend ids, follower ids, and the last 3,200 tweets of these media houses to employ the FSSPIP classifier for prediction.
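A minimal sketch of this label binarization is given below; the label strings follow the crowdsourced scheme described above, while the dictionary-based mapping is simply one convenient way to express it.

```python
# Hedged sketch of the media-bias label binarization.
BIAS_TO_BINARY = {
    "left": "left", "center-left": "left",
    "center-right": "right", "right": "right",
    # "least biased" instances are discarded
}

def binarize(label):
    return BIAS_TO_BINARY.get(label)    # None -> drop the instance

labeled = [("outletA", "center-left"), ("outletB", "least biased")]
binary = [(name, binarize(l)) for name, l in labeled if binarize(l) is not None]
# -> [("outletA", "left")]
```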

The Ethnic Community Dataset: Many post-poll surveys establish that different communities vote differently. We try to use our model to identify such divisions. We first sample recent tweets mentioning the name of any of the communities using the Twitter API (see footnote 7). Among the users tweeting, we select only those who mention one (or more) of the communities/ethnicities being probed ('black', 'white', 'hispanic/latino', 'asian') in their bio, and we assign a user to a particular community if that community is mentioned in their bio.

Multi-party Leaning Dataset: In order to collect a set of users residing in a multi-party democratic system, we filter the latest 10,000 tweets (and tweeters) containing the term 'Delhi Election' using the Twitter API on 17th March, 2021. We annotate a random sample of 1,000 Twitter users from this list as followers of three political parties: AAP: Aam Aadmi Party, BJP: Bharatiya Janata Party, and Congress/INC: Indian National Congress (AAP: 203 users, INC: 435 users, BJP: 362 users), to form the multi-party inclination dataset. While annotating each user, we check the residency of the user and confirm it to be India through the self-declared location tag on Twitter.

Statewise Inclination Dataset: Here, we use the Twitter API to collect tweets matching the politically neutral query 'election' (all data points are collected before 17.05.2021). If a user has a state's name mentioned in the Twitter location tag, we assign that user to that particular state. We collected 100 users for each state of India to obtain a representative sample.

Hashtags User Data: In order to find out the inclination distribution behind each hashtag, we collect tweets containing several trending hashtags (on or before 17.05.2021). For each trending hashtag, we collect 1,000 tweets (\(\times \) 30 hashtags) using the Twitter API, excluding retweets and replies. For manual verification, we annotate the 30 hashtags with the tags Congress, BJP, and Neutral (on the date of collection, we could not find hashtags that could be attributed as inclined towards AAP; moreover, politically unmotivated hashtags are termed 'neutral'). This annotation was done by a PhD student with expertise in Indian politics, by reading the tweets carrying each hashtag.

6 Main Task: Experiments and Analysis

Baselines: We use the best performing models provided by Aldayel et al. (2019) [1] and Darwish et al. (2020) [10] (UMAP+DBSCAN; results with tweets containing chosen hashtags are included in the Appendix). NTF [1] uses network/graph and textual features together, much like our model but without attention. UUS [10], on the other hand, uses weak supervision (a method quite different from ours) through dimensionality reduction and clustering, manual inspection (which also makes the algorithm less scalable), and labelling of the clusters, with only three features (retweeted tweets, retweeted accounts, and hashtags). We also add a modified version of the UUS algorithm for a fair comparison with our fine-tuned model, since the UUS algorithm is completely unsupervised and incapable of using any supervisory signal for few-shot learning: we took the unsupervised UUS model and fine-tuned it on annotated data points, terming it UUS+.

We also add non-neural baselines like SVM, Logistic Regression (LR), and Random Forest (RF), as we are interested in showing how simple algorithms with smaller inference times compare with our method. Here, we use the concatenation of each user's tweets as input. We used TIMME-hierarchical [24] and its two other variants as further baselines; these use self-supervision on graphs and have a higher inference time due to second-order data collection on a large graph. However, we only report the TIMME-hier results as it was the best-performing variant (hyperparameter statistics and details on the other TIMME variants are in the Appendix). A qualitative comparison of the baselines is given in Table 3.

Table 3. A pointwise comparison of the models used as baselines. {NNeur : Non Neural baselines}.
Table 4. Results of few-shot learning {NTF: Model proposed by [1]; UUS: Model proposed by [10]; UUS+: Model proposed by [10] fine-tuned on annotated data points; TIMME: TIMME-hier (other TIMME variants’ result in Appendix); FSSPIP: FSSPIP base architecture with the few-shot learning framework; #T: Number of training datapoints; TTI: Time Taken for Inference per datapoint with Twitter ids as inputs. For each framework, it also includes the time taken to collect the data}.
Table 5. Ablation study of different model variants {F1: FSSPIP-fixedattn; F2: FSSPIP-auto; FSSPIP- - -: FSSPIP base architecture; FSSPIP- -: FSSPIP without weak supervision and self supervision; FSSPIP-: FSSPIP without self supervision.}

Results: In Table 4, we show that our best performing model FSSPIP (see footnote 10) consistently beats the other baselines on all datasets. We gain the most over other models when very few training data points (50) are present (see footnote 11). In the case of the non-politician datasets, i.e., P50, P20–50 and P+all, the performance obtained by our model is significantly higher than that of the other baselines, even with only 50 training data points. This may be because the non-politician datasets, unlike the PureP dataset, do not contain purely political features, which makes the feature learning task less straightforward and demands finer features such as the domain names a user is interested in or the tweets of the users they retweet.

Our model also performs better than the other models in terms of the time required to predict for a single user. Compared to the networks using second-order relational data (TIMME), we are at least \(\sim \) 10x faster, as shown in Table 4.

Also, our model outperforms NTF [1] and UUS [10] by a significant margin by using weak supervision (see footnote 12) with better augmentation, while utilizing carefully extracted network features similar to the NTF inputs [1].

Ablation Study

Model Variants - In order to ablate our attention mechanism, we employ two other varieties of attention in place of ours in the FSSPIP base architecture (recall Eq. 2). FSSPIP-fixedattn (F1) uses fixed learnable attention to calculate the weighted sum of the embeddings of each feature type; here the \(\alpha _{r}\) values in Eq. 2 are learnable parameters and \(\alpha _{ir}=\alpha _{r}\), \(\forall i\). FSSPIP-auto (F2) simply sums up the normalized embeddings of each feature type, assuming equal attention to all feature types while computing the final embedding vector; here \(\alpha _{ir}=1\), \(\forall i,r\).

Table 6. Important features and feature types for the predictions.

To test the few-shot learning framework, we use incrementally more powerful models in Table 5, where FSSPIP- - - is the base architecture without the few-shot learning framework; each component of the framework is then added sequentially to the base model (terming the intermediate models FSSPIP- -, FSSPIP-, and finally FSSPIP).

We find that the dynamic attention mechanism produces significantly (see footnote 13) higher gains compared to the other two attention variants. The gains are higher when fewer data points are used, and weak supervision has a higher impact than additionally adding dynamic augmentation to the weakly supervised model. This can be explained by the fact that weak supervision already trains the model with a large number of real data points, which regularizes the model enough. Dynamic augmentation nevertheless helps in regularizing the model, especially in few-shot settings, to avoid over-fitting. Similarly, self-supervision also seems more useful when there are fewer training data points. Moreover, we can see that the attention variants of the model perform very close to the original model but fall short when the number of data points is low.

Most Important Feature Types - To determine the most important feature types, we drop each feature channel and measure the information loss by calculating the deviation in performance of the classifier (FSSPIP) trained on the combined dataset (train:test:validation datapoints = 80:10:10). The results are reported in Table 6. The highest drop is witnessed when the relevant hashtags are dropped.

Fig. 2. Distribution of political inclinations in the USA by topic/racial demographics.

Zero-Shot Gain: Inspired by the significant improvement from weak supervision shown in Table 5, we trained our model FSSPIP on the weak supervision dataset only, which is collected without any manual annotation. We then used the whole annotated dataset for testing this model. We obtain a zero-shot accuracy of 93.7% (TIMME models rely on lists of politicians of each party and thus cannot be zero-shot; UUS, which is not easily scalable due to its clustering, expert purity checking and soft labelling methodology, performed best among the other baselines at 91.9%). This tells us that the social media followers of a political party are indeed, most of the time, followers of the party in real life as well. So, training a model to classify a social media user as a follower of one party over the other on social media also trains it for the similar task of classifying the user as a follower of one political party over the other in real life. We verify this conclusion again in a multi-party scenario for a diverse, non-English-speaking democracy like India in the next section.

7 Additional Task: Experiments and Analysis

We use the additionally collected datasets to show the efficacy of the zero-shot classifier. The research questions selected for this section are easy to test but important for social scientists. They have mostly been analyzed through manual surveys until now.

Media Bias Prediction: We use the trained FSSPIP classifier on the media bias dataset collected by us, taking each media house's Twitter handle as the node to be classified. We obtain an accuracy of 72.6% on the task, even though we do not explicitly train for this task (see footnote 14) and rely on the assumption that \(\{\text {democrat}\equiv \text {left}\}\) and \(\{\text {republican}\equiv \text {right}\}\).

Topical Polarization - Bone(s) of Contention: In order to poll users on specific contexts and issues, we collect some hashtags (see Appendix) supporting each issue mentioned in Fig. 2a. We then use the model to classify each user and plot the percentage of users of each leaning in the US setting, i.e., the Democrats & the Republicans.

Multi-party Inclination Prediction: The US political system is dominated by two political parties: the Democrats & the Republicans. In principle, our system can work for other countries and other kinds of political systems as well. In this section, we test the zero-shot classification capability of our model on a diverse multi-party democracy like India. We take the Twitter handles of three national parties in India, namely, the Aam Aadmi Party (AAP), the Indian National Congress (INC), and the Bharatiya Janata Party (BJP). We use the weak supervision method to train our model with the sampling, mixup & feature channel dropout strategies discussed earlier. On a random sample of 1,000 Twitter accounts (AAP: 203, INC: 435, BJP: 362), we obtain an accuracy of 81.9%. The highest confusion scores between classes (see Appendix) were between AAP & INC. This is fairly intuitive since both these parties are left-leaning and in opposition, while BJP is known to subscribe to a right-wing leaning and is currently in power.

Statewise Leaning: In Fig. 3a, we plot the relative distribution of political leanings for each state of India on a scale of 0–1, signifying the fraction of users in a state leaning toward BJP (we average the political leanings predicted by the aforementioned classifier over all users in the state's data). This correlates quite well (Pearson's correlation coefficient: 0.52, significant at \(p<0.01\)) with the vote percentage received by BJP in each state in the 2019 general election (see footnote 15).
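This correlation check can be reproduced with a few lines of SciPy; the arrays below are placeholders, not the actual per-state predictions or vote shares.

```python
# Hedged sketch of the state-level correlation check (placeholder data).
from scipy.stats import pearsonr

predicted_bjp_share = [0.62, 0.41, 0.55, 0.48]   # fraction of sampled users leaning BJP per state
actual_bjp_vote_pct = [0.58, 0.37, 0.60, 0.45]   # BJP vote share in the 2019 general election

r, p = pearsonr(predicted_bjp_share, actual_bjp_vote_pct)
print(f"Pearson r = {r:.2f}, p = {p:.3g}")        # the paper reports r = 0.52, p < 0.01
```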

A Leopard Cannot Change Its Spots: In order to check whether political inclination changes with time, we reuse the dataset described in the last paragraph with a temporal filtering strategy. We only use tweets and tweet-derived features for this experiment, which means the bio is always left blank, and the same is done for followers/retweeters. We collect the last 3,200 tweets (limit set by the Twitter API) of each user ID, directly available from Twitter. For reliable prediction, we keep only users who have tweeted at least 100 times before 2017 and at least 100 times after 2018 (see footnote 16). This leaves us with 2,893 users. We then predict the inclination of these Twitter users twice: once using the features collected from tweets before 2017, and once using the tweets after 2018. We observe that the predictions match in 91% of the cases, which tells us that political leanings are temporally (almost) invariant.

Hidden Agenda - Inclination Behind Promoted Hashtags: To find out the inclination behind each hashtag, we obtain the political leanings of the users in the collected hashtag-specific dataset using the zero-shot classifier trained on followers of Congress and BJP. We plot the percentage of users leaning toward each party for each hashtag. We correctly predicted the leaning in 25 of the 30 cases using the classifier (considering a percentage distribution of 40–60% as the neutral/apolitical zone). We plot the leanings for four different India-specific issues – #WeAreWithYouPmModiJi, #BengalBurning, #CycloneTaukte, #JusticeForAsif – in Fig. 3b. While we see that the disaster hashtag (#CycloneTaukte) is non-polarizing, the other trending hashtags are evidently promoted by people of particular ideologies. We include the list of the other hashtags in the Appendix.
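A minimal sketch of the hashtag-level decision rule with the 40–60% neutral zone follows; the function name and the exact boundary handling are our assumptions.

```python
# Hedged sketch of the hashtag-level decision rule: a hashtag is called for the
# majority party unless the split falls inside the 40-60% neutral zone.
def hashtag_leaning(bjp_fraction, neutral_band=(0.40, 0.60)):
    lo, hi = neutral_band
    if lo <= bjp_fraction <= hi:
        return "Neutral"
    return "BJP" if bjp_fraction > hi else "Congress"

# Example: hashtag_leaning(0.83) -> "BJP"; hashtag_leaning(0.52) -> "Neutral"
```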

Fig. 3. Inclination distribution in India.

8 Limitations and Future Work

Our work is limited by the availability of social media data. If a country does not have enough political participation on social media, then training a model will not be possible. Moreover, if a person's profile is kept private, the classifier will not be able to assign any label. We discuss the related ethical implications of our work separately in the Appendix.

Lastly, we only evaluated our method on a dataset of users ranging from a high degree of political connection to a very low degree of political connection. Collecting a dataset of users with no political links online but with an inclination toward a particular political party is a challenging task. In fact, Twitter data matched with voter registration records [3] also shows high partisanship evident in tweets and political connections. Research toward implicit (not explicitly tweeted/mentioned) political inclination detection (analogous to implicit hate speech detection [5, 12] or implicit aspect-specific sentiment detection [6, 16]) is an interesting future research direction.

9 Conclusions

We presented FSSPIP, an efficient, fast, and scalable few-shot learning framework for political inclination detection from Twitter profiles. We showed that our model is explainable and learns features that humans find meaningful. Moreover, unlike graph-based models, our model does not store any personal data of users, and it is also shown to be faster than graph-based methods. With the scalable representation learning framework, we achieve state-of-the-art accuracy, gaining significantly in unlabelled or few-shot learning setups on non-politician users. By enabling zero-shot political inclination detection with high fidelity, we provide a method that can easily be re-targeted to new countries and languages without any manual intervention/supervision, unlike previous methods. We believe this will make large-scale analysis of the political landscape across the globe easier and more accurate.