1 Introduction

A political party, according to ACE (2012), is an organized group of people who exercise their legal right to identify with a set of similar political aims and opinions, and who seek to influence public policy by getting their candidates elected to public office. Political parties gain control over the government by winning elections with candidates they officially sponsor or nominate for positions in government. They also coordinate political campaigns and mobilize voters. Because parties exist to win elections and thereby influence public policy, they must build coalitions across a wide range of voters who share similar preferences. Even though the presentation of candidates and the electoral campaign are the functions most visible to the electorate, political parties fulfill many other vital roles that could directly or indirectly influence registered voters.Footnote 1 The appearance of candidates in the electoral campaign is also widely accepted to influence election outcomes (Heredia et al. 2018).

Social networking, microblogging, and blogging Web sites have become a common way for people to express their thoughts and feelings, and they generate a huge quantity of data. For example, microblogging allows users to post real-time messages giving their opinions on a variety of topics, discussing current issues, complaining, and expressing positive or negative sentiments about things that influence their daily lives. People now voice opinions online like never before and routinely consult the opinions of others, for example by checking the reviews or ratings of movies or products before watching the movie in theatres or buying the products. In fact, manufacturing companies and politicians have started to use microblogs to get a sense of how people view their products, views, or personalities. Hence, this paper investigates whether sentiment analysis of political data can reveal the influence that political parties have on their candidates, or vice versa, which may lead to their winning or losing an election. Such insight helps political parties and candidates learn whether people support their program.

Social media and other information and computer technologies have changed the dynamics of politics and political participation in Nigeria since 2011. Political actors and political parties alike now reach millions of people across regions with their manifestoes and ideologies without embarking on long-distance tours. Social media has added new techniques to political campaigning, even in developing countries. A large percentage of our population has access to social media and consciously uses it for sharing ideas and news, for information, and for entertainment (Bettina 2009; Nwagbo et al. 2016). Nwagbo et al. (2016) further maintain that social media gives many people the chance to participate actively in political discourse by adding their views to the issues under discussion.

In this study, we try to understand empirically, from Twitter discussions, whether political parties or their candidates could influence the winning or losing of an election. For this purpose, we use Twitter data collected on the day of the Anambra State Gubernatorial Election, held on November 18, 2017. We apply a Natural Language Processing (NLP) method called sentiment analysis (SA) to conduct data analysis experiments on the election Twitter data. The experiments involve polarity sentiment analysis (PSA) and subjectivity sentiment analysis (SSA) on all the tweets, treating time as a useful dimension of SA.

The purpose of PSA and SSA is to find the attitudes of people toward the political actors and to evaluate whether Twitter users were tweeting facts during the election or whether most of their messages were emotional, subjective opinions at a given time. Furthermore, using word frequency and a topic modeling algorithm, we find the words most associated with the political actors, the most talked about topics, and how these relate to each other per political actor at a given time.

Thus, to analyze tweets in terms of polarity and subjectivity, we propose the following research questions:

  • Research Question 1: How does the sentiment of the tweets about a particular candidate/party behave across a given time frame, and what does it reveal about the attitudes of the public toward the political actors?

  • Research Question 2: How do the subjectivity scores for each candidate/party vary across time, and which candidate/party, when mentioned alone, has the highest frequency among the more subjective tweets?

Time has been considered in the literature as a useful dimension for sentiment analysis (Giachanou and Crestani 2016; Nwagbo et al. 2012). Tracking opinion over time is a powerful tool that can be used for sentiment prediction or to detect the possible reasons for a sentiment change. In particular, understanding topic and sentiment evolution during an election allows the government, election observers, or the public to capture sentiment changes and act promptly. For example, understanding the sentiment change toward a particular candidate during an election can reveal topic trends that show people’s attitude toward the candidate.

To evaluate the stated research questions, we performed the following experimental analysis on the Twitter data we collected (see Table 1):

  • Polarity and subjectivity analyses at a two-hourly time granularity, after which the average score of each analysis is calculated. A two-hourly granularity means tracking topic changes eight times a day from 06:00 to 23:59.

  • Find the most talked about topics. In each topic, we investigate the political actor’s name with the highest frequency of occurrence. The investigation includes:

    1. Which political actor, when mentioned alone in a tweet, has the highest frequency among the polarized or subjective tweets.

    2. How important the words most associated with a political actor are in a given topic.
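The two-hourly grouping described above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function names and window labels are hypothetical, and we assume (from the window labels used in the figures) that the last window, 20–00, runs from 20:00 to 23:59.

```python
from datetime import datetime

# Window start hours for the election day. The final window (20-00) is
# assumed to run to 23:59, which gives eight windows in total.
WINDOW_STARTS = [6, 8, 10, 12, 14, 16, 18, 20]
WINDOW_LABELS = ["6-8", "8-10", "10-12", "12-14", "14-16", "16-18", "18-20", "20-00"]


def time_window(timestamp: datetime):
    """Return the window label for a tweet time, or None before 06:00."""
    if timestamp.hour < WINDOW_STARTS[0]:
        return None
    # Pick the last window whose start hour is not after the tweet's hour.
    index = max(i for i, start in enumerate(WINDOW_STARTS) if start <= timestamp.hour)
    return WINDOW_LABELS[index]


def average_by_window(scored_tweets):
    """Average a score per window from (timestamp, score) pairs."""
    totals = {}
    for timestamp, score in scored_tweets:
        label = time_window(timestamp)
        if label is None:
            continue  # tweet falls outside the tracked period
        running_sum, count = totals.get(label, (0.0, 0))
        totals[label] = (running_sum + score, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}
```

The same averaging applies to either polarity or subjectivity scores, since both are plain numbers attached to a tweet time.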

The experiments started with preprocessing the tweets and performing initial investigations to discover the most common co-occurring words and the number of tweets per candidate and political party. Furthermore, we group the tweets based on the names of interest (top five political parties and candidates) and perform sentiment analysis using TextBlob’s Naive Bayes Classifier (NBC)Footnote 2 and SENTIWORDNET (Esuli and Sebastiani 2007) on the set of tweets in each group to determine the polarity of each tweet. To find the most talked about topics and the most frequently associated words, we use latent Dirichlet allocation (LDA) (Blei et al. 2003) and word frequency, respectively.

2 Related work

In modern politics, Twitter has been at the forefront of political discourse, with politicians choosing it as their platform for disseminating information to their constituents. This has prompted parties and their candidates to maintain an online presence, which is usually delegated to social media coordinators.

In this section, we present previous work related to sentiment analysis of Twitter discussions on politics. Sentiment analysis is a Natural Language Processing task in which a system determines the sentiment of texts, typically with a model learned from training data, which makes it a machine learning problem. Starting as a document-level classification task (Turney 2002; Pang and Lee 2004), it has been handled at the sentence level (Hu and Liu 2004; Kim and Hovy 2004) and more recently used in the analysis of political texts, especially from Twitter. Tumasjan et al. (2010) validate Twitter as a forum for political deliberation and show, in the context of the German federal election, that it validly mirrors offline political sentiment. Hoegg and Lewis (2011) explore the effectiveness of social media as a resource for both polling and predicting the election outcome. Wang et al. (2012) analyze public sentiment toward presidential candidates in the 2012 US election as expressed on Twitter. Vilares et al. (2015) rank political leaders, parties, and personalities for popularity by analyzing Spanish political tweets. Conover et al. (2011), analyzing political polarization on Twitter, demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Razzaq et al. (2014) present experimental results that validate social media content as an effective indicator for capturing the political behaviors of different parties; in other words, the positive, negative, and neutral behavior of party followers, as well as the impact of a party’s campaign, can be predicted. Dahal et al. (2019) show that social media Web sites can be used as a data source for mining public opinion on a variety of subjects, applying LDA for topic modeling to infer the different topics of discussion. Boutet et al. (2013) study the usefulness of analyzing Twitter messages to identify both the characteristics of political parties and the political leaning of users. Makazhanov et al. (2014) reveal that the political preference of users can be predicted from their Twitter behavior toward political parties. Furthermore, Pak and Paroubek (2010) have shown that Twitter, a microblogging platform, is valid for building a corpus for sentiment analysis and opinion mining. de França et al. (2018) propose a method to segment Twitter users into groups such as popular, activists, and observers to help filter out information and give a more detailed analysis of important events.

These studies demonstrate that political insight can be drawn from Twitter; hence, this paper presents a comprehensive sentiment analysis that considers the common co-occurring tweet words and the connections among polarized tweets for three groups: a political party, a candidate, and a political party cum its candidate.

3 Methodology

Figure 1 shows the steps of the methodology followed in this work. In the first step, we collect Twitter data and describe the collection process that produced the data for this work. In the second step, we clean up the collected data; this process is discussed in detail in Sect. 3.2. The third step is the political groups’ analyses in Sect. 3.2, which cover the groups of data recorded in the collection column of Table 1. The fourth and fifth steps of the methodology are presented in Sects. 3.4 and 3.5, respectively. We perform the exploratory and sentiment analyses of the tweets’ texts for three distinct groups: the individual parties, the candidates, and the individual parties cum their candidates.

Fig. 1 Methodology steps

3.1 Data collection

This section presents information about Anambra State and its gubernatorial election of November 18, 2017, and about the Twitter social network and its features. The method used to collect the election Twitter data is discussed in the following subsections.

3.1.1 November 18, 2017, Anambra State gubernatorial election

Anambra is a state in southeastern NigeriaFootnote 3 with 21 local government areas (LGAs). The State Gubernatorial Election (SGE) is conducted every 4 years, as in every other state in the country. The November 18, 2017, SGE is significant because the state became the first in the nation to have 37 political parties and candidates participate in a governorship election. In this paper, we looked only at the five major parties and their candidates: Willie Obiano (the incumbent Governor) of the state ruling All Progressive Grand Alliance (APGA), Tony Nwoye of the national ruling All Progressives Congress (APC), Oseloka Obaze of the People’s Democratic Party (PDP), Osita Chidoka of United Progressive Party (UPP), and Godwin Ezeemo of the People’s Progressive Alliance (PPA). According to the election results, the APGA candidate swept all 21 LGAs in the state, polling a total of 234,071 votes to finish ahead of the APC candidate, who got 98,752.Footnote 4, Footnote 5 This is of considerable significance in this research since the candidates of the other political parties involved are from some of the 21 LGAs.

3.1.2 Twitter

TwitterFootnote 6 is a social network classified as a microblog, with which users can share messages, links to external Web sites, images, or videos that are visible to other users subscribed to the service. Messages posted on microblogs are short in contrast to traditional blogs. Blogging becomes ’micro’ by shrinking a post down to its bare essence, relaying the heart of the message and communicating what is necessary as quickly as possible in real time. As of 2016, Twitter limited its messages to 140 characters (Giachanou and Crestani 2016). There are other microblogging platforms such as Tumblr,Footnote 7 FourSquare,Footnote 8 Google+,Footnote 9 and LinkedIn,Footnote 10 but Twitter, launched in 2006, is the most popular and has since attracted a large number of users. Studies, as presented in the Related Work section, have shown that Twitter data are well suited as a corpus for sentiment analysis and opinion mining.

3.1.3 Collecting data from twitter

Following the approach discussed in Crawling Twitter Data of Kumar et al. (2014), data collection was done using the Twitter Streaming Application Programming Interface (API) and Python. An API is a tool that makes interaction between computer programs and Web services easy, and many Web services provide APIs so that developers can interact with their services and access data programmatically. The Twitter Streaming API enables real-time collection of tweets. For this work, we use it to download tweets related to three keywords: ‘#anambradecides2017,’ ‘#anambraelections,’ and ‘#anambradecides’ on the day of the election. The objective of the real-time collection was to collect only tweets about the election published on the same day. We worked from the hypothesis that a tweet about the Anambra State election posted that same day could be referring to what the user was experiencing about the election at that moment. The Twitter data that we collected are stored in JSON format, which is easy both for humans to read and for computers to parse.
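To illustrate how the stored JSON could be read back, the sketch below parses tweets and keeps those containing one of the three tracked hashtags. It is a hedged illustration rather than the collection script used here: we assume one JSON object per line and the `text` and `created_at` fields of the standard tweet object, and the function name is hypothetical.

```python
import json

# The three tracked keywords from the collection step, lowercased for matching.
HASHTAGS = ("#anambradecides2017", "#anambraelections", "#anambradecides")


def load_election_tweets(lines):
    """Parse line-delimited tweet JSON and keep tweets with a tracked hashtag."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        text = tweet.get("text", "")
        if any(tag in text.lower() for tag in HASHTAGS):
            kept.append({"created_at": tweet.get("created_at"), "text": text})
    return kept
```

Matching on the lowercased text means that ‘#AnambraDecides2017’ and ‘#anambradecides2017’ are treated as the same keyword.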

3.2 Preprocessing tweets data

Figure 2 shows five basic steps we took in preprocessing the dataset of tweets we collected as discussed in Sect. 3.1.3.

Fig. 2 Basic steps of preprocessing standard tweets

Table 1 Dataset used in this study

We preprocessed the data in two ways. Method 1 uses tweet-preprocessor, a preprocessing library for tweet data written in Python, to clean and tokenize the tweets. Tokenization involves converting a sentence into a list of words. In method 2, we manually defined a function to double-check our tweet preprocessing and to remove other unwanted tweets such as retweets. This is to be sure that our data are reasonably clean. Moreover, spelling correction is one of the distinctive functionalities of the TextBlob library: with the correct method of the TextBlob object, we corrected the spelling mistakes in our tweets. The final steps involve removing stopwords and punctuation, and stemming, which transforms any form of a word into its root word. The party and candidate names are also added to the list of stopwords when we generate a word cloud image for any of the political parties or candidates, since they are the targets. This allows meaningful words to be displayed rather than having party/candidate names all over the word cloud image.
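The manual cleaning function of method 2 is not listed in this paper; the sketch below shows what such a function might look like using only the Python standard library. The stopword list, the ‘RT @’ retweet convention, and the function names are assumptions made for illustration, and stemming and spelling correction are left out.

```python
import re
import string

# Tiny illustrative stopword list; a real run would use a full list plus
# the party and candidate names when building word clouds.
STOPWORDS = {"the", "a", "an", "of", "is", "it", "to", "and", "for", "will", "be"}


def is_retweet(text: str) -> bool:
    """Treat tweets starting with 'RT @' as retweets (a common convention)."""
    return text.lstrip().startswith("RT @")


def clean_and_tokenize(text: str, extra_stopwords=frozenset()):
    """Strip URLs, mentions, and punctuation, then drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep hashtag words, drop '#'
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()
    blocked = STOPWORDS | set(extra_stopwords)
    return [token for token in tokens if token not in blocked]
```

Passing the actor names through `extra_stopwords` mirrors the step of adding party and candidate names to the stopword list before drawing a word cloud.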

Table 1 shows the total number of tweets we collected and the tweets associated with each political party and candidate before and after preprocessing. Rows 1 to 15 show the names we are interested in investigating in this study, and we add columns of Booleans that indicate whether a name of interest was in the tweet or not. The Total of Tweets After Preprocessing column shows that the names of interest for this work account for 67.08% of the 7430 total tweets. The remaining percentage is unclassified tweets. This experiment focuses on investigating whether the political actors stated in Sect. 3.1.1 can influence the winning or losing of an election. The preprocessed tweets are stored in a CSV file, which organizes the data into columns of variables and rows of observations.

3.3 Experimental tools

In this experimental analysis, we use SentiWordNet (Esuli and Sebastiani 2007) as a point of comparison for the overall analysis scores of TextBlob’s Naive Bayes Classifier (NBC). The two sentiment classifiers are used to determine the overall polarity scores for the sake of comparison, while TextBlob is further used to perform detailed polarity analysis on the political actors and to determine the subjectivity of tweets. LDA (short for Latent Dirichlet Allocation) is used for topic modeling to infer the different topics of discussion, and word frequency is used to find the most commonly occurring words.

TextBlob is a Python NLP library for processing textual data. It provides a consistent API for common Natural Language Processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. NBC is a classification technique based on Bayes’ theorem with an assumption of independence among predictors. In simple terms, an NBC assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. NBC is based on Bayes’ theorem:

$$\begin{aligned} P(A|B) = \frac{P(B|A)~*~P(A)}{P(B)} \end{aligned}$$
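To show how the theorem turns into a classifier, the following is a toy multinomial Naive Bayes sketch with add-one smoothing. It is an illustration under our own assumptions, not TextBlob’s implementation: the training examples, function names, and smoothing choice are hypothetical. Here A is the class and B the observed words; P(B) is dropped because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in examples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(words, model):
    """Pick the class A maximising log P(A) + sum of log P(word | A)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior P(A)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)  # log P(B_i | A), independence assumed
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Summing log-probabilities word by word is exactly where the independence assumption enters: each word contributes to the score without regard to the others.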

SENTIWORDNET (Esuli and Sebastiani 2007) is the result of the automatic annotation of all the synsetsFootnote 11 of WORDNET according to the notions of ‘positivity,’ ‘negativity,’ and ‘neutrality.’ Each synset s is associated with three numerical scores Pos(s), Neg(s), and Obj(s) which indicate how positive, negative, and ‘neutral’ the terms contained in the synset are. Different senses of the same term may thus have different opinion-related properties. For example, in SENTIWORDNET 1.0 the synset [estimable(J,3)] corresponding to the sense ‘may be computed or estimated’ of the adjective estimable has an Obj score of 1.0 (and Pos and Neg scores of 0.0), while the synset [estimable(J,1)] corresponding to the sense ‘deserving of respect or high regard’ has a Pos score of 0.75, a Neg score of 0.0, and an Obj score of 0.25. Each of the three scores ranges in the interval [0.0, 1.0], and their sum is 1.0 for each synset. This means that a synset may have nonzero scores for all the three categories, which would indicate that the corresponding terms have, in the sense indicated by the synset, each of the three opinion-related properties to a certain degree.
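The score triple can be represented as a small data structure that enforces the two constraints stated above. This is a sketch for illustration only: the class name is hypothetical, and collapsing the triple into a single net value as Pos minus Neg is one simple convention that we assume here, not something prescribed by SENTIWORDNET.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    pos: float
    neg: float
    obj: float

    def __post_init__(self):
        # Each score lies in [0.0, 1.0] and the three sum to 1.0 per synset.
        for value in (self.pos, self.neg, self.obj):
            if not 0.0 <= value <= 1.0:
                raise ValueError("scores must lie in [0.0, 1.0]")
        if abs(self.pos + self.neg + self.obj - 1.0) > 1e-9:
            raise ValueError("Pos + Neg + Obj must equal 1.0")

    def net_polarity(self) -> float:
        """Pos minus Neg: one simple way to collapse the scores (an assumption)."""
        return self.pos - self.neg


# The two senses of 'estimable' quoted above from SENTIWORDNET 1.0.
estimable_computable = SynsetScores(pos=0.0, neg=0.0, obj=1.0)
estimable_respected = SynsetScores(pos=0.75, neg=0.0, obj=0.25)
```

Under this convention the ‘may be computed’ sense is neutral, while the ‘deserving of respect’ sense leans positive, which matches the intuition that different senses of one term carry different opinion-related properties.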

LDA (Blei et al. 2003) is an unsupervised machine-learning model that takes documents as input and finds topics as output. The model also reports the proportion of each document that is devoted to each topic. Each topic, in turn, is represented as a weighted list of words.
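These two outputs can be written compactly. In notation introduced here only for illustration, let \(\theta_{d,k}\) be the proportion of document \(d\) devoted to topic \(k\), and \(\beta_{k,w}\) the weight of word \(w\) in topic \(k\). Under LDA, the probability of seeing word \(w\) in document \(d\) mixes the \(K\) topics:

$$\begin{aligned} P(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\,\beta_{k,w}, \qquad \sum_{k=1}^{K} \theta_{d,k} = 1, \qquad \sum_{w} \beta_{k,w} = 1 \end{aligned}$$

The row \(\theta_{d,\cdot}\) is the per-document topic percentage, and the row \(\beta_{k,\cdot}\), sorted by weight, is the weighted word list that represents topic \(k\).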

3.4 Exploratory twitter data analysis: EDA

We use an EDA approach to analyze the total tweets in Table 1 after preprocessing and to summarize their main characteristics with visualizations. EDA is a necessary step prior to sentiment analysis or model building because it uncovers insights that become important later.

3.5 Sentiment analysis

Sentiment analysis of tweets involves understanding the attitudes, opinions, views, and emotions from tweets using Natural Language Processing (NLP) techniques. In this section, we look at sentiment analysis involving subjectivity and polarity.

3.5.1 Polarity and subjectivity analyses

Polarity analysis determines whether a tweet expresses a positive, negative, or neutral opinion. By quantifying the sentiment of texts, it reveals the attitude of Twitter users toward the topics under discussion.

Subjectivity analysis classifies a text as opinionated or not opinionated. Terms such as adjectives, adverbs, and some groups of verbs and nouns are used to identify a subjective opinion. Speech patterns, such as the use of adjectives along with nouns, serve as indicators of the subjectivity of a statement (Kharde et al. 2016; Yaqub et al. 2018). Thus, subjectivity analysis is the classification of sentences as subjective opinions or objective facts.

In this study, we apply the tools described in Sect. 3.3 to our dataset. From the dataset in Table 1, we classify tweets as positive, negative, or neutral opinions and further separate the subjective tweets from the objective ones. Furthermore, we compute word frequency to find the most talked about words, identify the most discussed topics using LDA, and look at how the most frequent words contribute to the importance of the LDA topics. Finally, we describe how the most frequent words and most discussed topics are related to the computed sentiments per political actor at a given time.

Fig. 3 Polarity and subjectivity metrics

Figure 3 briefly describes the polarity and subjectivity metrics, and Algorithm 1 shows the subjectivity and polarity calculations. For polarity, we look at how positive or negative a tweet is: \(-1\) is very negative, while \(+1\) is very positive. For subjectivity, we look at how subjective or opinionated a tweet is: 0 is a fact, while \(+1\) is very much an opinion.

Algorithm 1 Subjectivity and polarity calculations

4 Results

This section presents the results of our experiments, organized around the research questions stated in the introduction. Sentiment analysis is used to study people’s attitudes toward political candidates and parties and whether those attitudes are subjective, while exploratory data analysis gives quantitative clues about our Twitter dataset.

4.1 Exploratory data analysis (EDA)

This process, as explained earlier in Sect. 3.4, is an important step usually performed before sentiment analysis to quantify the dataset in terms of frequency. We start by counting names of interest: we add columns of Booleans to our Pandas data frame [a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)] to indicate whether a name of interest was in the tweet or not. See the results in Table 1 from row 2 to 16 and in Fig. 4.
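The Boolean-column idea can be sketched without Pandas as one flag per name of interest per tweet, followed by a count of the flags. This is a simplified stand-in for the data-frame columns, and the name list, whole-word matching rule, and function names are assumptions for illustration.

```python
import re

# Hypothetical subset of the names of interest, lowercased single tokens.
NAMES_OF_INTEREST = ["obiano", "apga", "nwoye", "apc", "obaze", "pdp"]


def flag_names(text: str):
    """Return one Boolean per name: True if the name occurs as a word in the tweet."""
    words = set(re.findall(r"\w+", text.lower()))
    return {name: name in words for name in NAMES_OF_INTEREST}


def count_mentions(tweets):
    """Sum the Boolean flags over all tweets to get a count per name."""
    counts = {name: 0 for name in NAMES_OF_INTEREST}
    for text in tweets:
        for name, present in flag_names(text).items():
            counts[name] += present  # True adds 1, False adds 0
    return counts
```

Matching on whole words rather than substrings avoids counting a short party acronym such as ‘apc’ when it merely appears inside a longer word.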

Fig. 4 Plot of counts for names of interest. a Counts for candidates’ names of interest. b Counts for political parties’ names of interest. c Counts for both parties’ and candidates’ names of interest

Fig. 5 Words that co-occurred most frequently with the candidates and their parties. a Most frequent words associated with Willie Obiano, the incumbent governor and winner of the election. b Most frequent words associated with Willie Obiano cum his party, APGA. c Most frequent words associated with Oseloka Obaze. d Most frequent words associated with Oseloka Obaze cum his party PDP. e Most frequent words associated with Tony Nwoye. f Most frequent words associated with Tony Nwoye cum his party APC. g Most frequent words associated with Godwin Ezeemo. h Most frequent words associated with Godwin Ezeemo cum his party PPA. i Most frequent words associated with Osita Chidoka. j Most frequent words associated with Osita Chidoka cum his party UPP

Furthermore, we explore the most frequent words associated with the political actors. We add columns for tokens to our Pandas data frame and remove stopwords, including punctuation and the names of political parties and candidates; we then generate a word cloud image of the frequent words. The words that co-occurred most frequently with the political actorsFootnote 12 are shown in Fig. 5.
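The word-frequency step that feeds a word cloud can be sketched with a counter over tokens, excluding the actor names. The function name and the example tokens are hypothetical, and the word cloud rendering itself is not shown.

```python
from collections import Counter


def top_words(token_lists, excluded, n=10):
    """Return the n most frequent tokens across tweets, skipping excluded names."""
    counts = Counter(
        token
        for tokens in token_lists
        for token in tokens
        if token not in excluded
    )
    return counts.most_common(n)
```

For example, calling `top_words(tokens_per_tweet, excluded={"obiano", "apga"})` on the tweets of one group would rank the remaining words, so the target names do not dominate the resulting image.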

4.2 Sentiment analysis

As the first step of the sentiment analysis, we analyzed the tweets using two sentiment classifiers, TextBlob and SentiWordNet, with the aim of looking at their overall scores. The sole purpose of using both classifiers was to compare their scores and to determine which classifier to use.

We gathered 33,502 tweets from public accounts; after preprocessing, only 7430 tweets remained. Of the two sentiment analyzers we compared in this research, SentiWordNet produced the highest share of tweets with positive sentiment (2916 tweets, 39.25%), while TextBlob produced the highest share of neutral sentiment (3971 tweets, 53.45%), as can be seen in Table 2 and Fig. 6.

Table 2 Percentage/number of polarity calculations of different sentiment classifiers

Fig. 6 Polarity calculation with each sentiment classifier

We use the TextBlob sentiment tools beyond this point because of their popularity. Polarity sentiment analysis (PSA) and subjectivity sentiment analysis (SSA) are each conducted in two ways. First, we apply both PSA and SSA to all the tweets regardless of when the users tweeted. Second, we consider time as a useful dimension of sentiment analysis. The second addresses research question 1 by tracking topics as a time series and finding whose name is most mentioned in each topic. In both sentiment analyses, we want to find the attitude of the public toward the political actors and, possibly, the reason(s) for it.

4.2.1 Polarity sentiment analysis (PSA)

First PSA. Regardless of the time of tweets by the users, we compute the sentiment polarity for each tweet in Table 1 and aggregate the summary statistics per collection. This analysis includes all the political actors mentioned in Sect. 3.1.1 and in Table 1. Using Algorithm 1, we compute polarity scores between \(-1\) and \(+1\). A tweet is classified as positive if \(\text{polarity score} > 0\), as negative if \(\text{polarity score} < 0\), and as neutral otherwise. To visualize the overall public opinions or feelings about the election, we compute the sentiment frequency distribution on the overall tweets and per category as recorded in our dataset in Table 1.
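The classification rule and the resulting frequency distribution can be sketched as follows. The function names are hypothetical, and the scores are assumed to be already computed (for instance by a sentiment tool that returns a polarity in the range \(-1\) to \(+1\)).

```python
from collections import Counter


def classify_polarity(score: float) -> str:
    """Map a polarity score to a label using the sign rule described above."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_distribution(scores):
    """Return {label: (count, percentage)} for a collection of polarity scores."""
    counts = Counter(classify_polarity(score) for score in scores)
    total = sum(counts.values())
    return {
        label: (count, round(100 * count / total, 2))
        for label, count in counts.items()
    }
```

Applying `polarity_distribution` once to all tweets and once per collection gives the overall and per-category distributions in the same form.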

Fig. 7 Distribution of polarity in our Twitter dataset

Figure 7 shows the frequency distribution of sentiment polarity in our dataset. From this figure, it is evident that most of the tweets in our dataset are positive and have polarity between 0 and 0.5.

Fig. 8 Polarity sentiment frequency distributions (FD) of all the tweets in Table 1 where the political actors are mentioned. We used the top five political candidates and their parties. a Polarity sentiment frequency distribution per candidate. b Polarity sentiment frequency distribution per political party

Figures 8a, b show the polarity sentiment frequency distribution for the political actors in the dataset categories of Table 1. Figures 4 and 8a, b show the political actors in the Anambra State gubernatorial election conducted on November 18, 2017, and their various scores in frequency and sentiment polarity. The frequency distributions in these experiments count every tweet in which a political actor is mentioned, regardless of whether more than one actor is mentioned in the same tweet. For example, Fig. 8b shows a count of tweets whose polarity has been identified, and each tweet is counted for every political actor named in it, irrespective of how many appear. For instance, the tweet ‘Anambra Poll: Election observers, APGA, UPP commend timely distribution of materials’Footnote 13 from our dataset has positive polarity and is counted for both the APGA and UPP political actors.
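The counting rule, in which one tweet is credited to every actor it names, can be sketched as below. The party list, matching rule, and function name are assumptions for illustration, and the polarity label of each tweet is taken as given.

```python
import re
from collections import Counter, defaultdict

# Hypothetical list of party acronyms, lowercased for matching.
PARTIES = ["apga", "apc", "pdp", "upp", "ppa"]


def credit_polarity(labelled_tweets):
    """Tally polarity labels per party; a tweet naming several parties counts for each."""
    tallies = defaultdict(Counter)
    for text, label in labelled_tweets:
        words = set(re.findall(r"\w+", text.lower()))
        for party in PARTIES:
            if party in words:
                tallies[party][label] += 1
    return tallies
```

With the example tweet above labelled positive, both APGA and UPP receive one positive count, so the per-party totals can exceed the number of distinct tweets.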

Second PSA. Time is considered a useful dimension of sentiment analysis. To answer research question 1, we grouped our Twitter dataset according to the time of each tweet and performed polarity analysis of the tweets mentioning each of the political actors. In this phase, we select the top three of the candidates and their parties mentioned in Sect. 3.1.1 to constitute our political actors set. This set is Willie Obiano (the incumbent Governor) of the state ruling All Progressive Grand Alliance (APGA), Tony Nwoye of the national ruling All Progressives Congress (APC), and Oseloka Obaze of the People’s Democratic Party (PDP). This selection is based on the number of tweets mentioning each political actor and on popularity. For each polarized tweet computed using Algorithm 1, we find the time it was tweeted and whose name is mentioned ‘solely’ in the tweet at that time, and finally compute the average polarity score of the collection for the political actor mentioned. This tracks how people’s attitude toward a political actor changes over time during the election. Time is arranged at a two-hourly granularity from 06:00 to 23:59 on the election day.
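The ‘solely mentioned’ filter and the per-window averaging can be sketched as follows. This is one possible reading, stated as an assumption: a tweet counts for an actor only if exactly one name from the given list appears in it. The window label of each tweet is taken as given, the function names are hypothetical, and the factor of 100 follows the scaling used for plotting.

```python
import re
from collections import defaultdict


def sole_actor(text, actors):
    """Return the single actor named in the tweet, or None if zero or several are named."""
    words = set(re.findall(r"\w+", text.lower()))
    mentioned = [actor for actor in actors if actor in words]
    return mentioned[0] if len(mentioned) == 1 else None


def scaled_average_polarity(rows, actors):
    """rows: (window_label, text, polarity). Returns {(actor, window): mean * 100}."""
    sums = defaultdict(lambda: [0.0, 0])
    for window, text, polarity in rows:
        actor = sole_actor(text, actors)
        if actor is None:
            continue  # skip tweets naming no actor or more than one
        sums[(actor, window)][0] += polarity
        sums[(actor, window)][1] += 1
    return {key: 100 * total / count for key, (total, count) in sums.items()}
```

Restricting the average to sole mentions keeps a tweet that compares two candidates from pulling both of their curves in the same direction.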

Fig. 9 Polarity scores for each political actor over time, scaled by a factor of 100. Time is arranged at a two-hourly granularity. This graph is used to reveal the attitude of the people toward the political actors at a given time

Figure 9 shows a graph of the polarity scores for each of the political actors, scaled by a factor of 100. It reveals what people feel about the political actors in a given time window, what topics are being discussed in each window, and whose name is mentioned in those topics. Here, we can observe interesting patterns; for example, in the 6–8 and 8–10 windows there was no tweet specifically mentioning the political actors, but there are general tweets such as

Ndi Anambra, the next 4 years is critical. It will either be more development for the state or statue building leaders. Please vote wisely. #AnambraDecides2017

Fig. 10 Most frequent words at different times of the election. a Willie Obiano, APGA, and Obiano combined with APGA. b Oseloka Obaze, PDP, and Obaze combined with PDP. c Tony Nwoye, APC, and Nwoye combined with APC. Time is arranged at a two-hourly granularity from 06:00 to 23:59. This heatmap is used to reveal the most frequent words for each of the political actors at different times during the election and how they fit into the topics in Fig. 11. For example, the row Willie Obiano 12–14 shows the most frequent keywords associated with Willie Obiano between 12 pm and 2 pm

Fig. 11 Top five topics computed by the LDA model from tweets where political actors are mentioned, with the topic labels inferred from the keywords. The number of keywords in each topic is 10. a Political candidates. b Political parties

The data points in Fig. 9, from 8–10 to 20–00, are the sentiment polarity scores of the tweets mentioning the political actors. Each score is read as positive, neutral, or negative from its sign, so a tweet is treated as either positive or negative feedback. Thus, the scores can be used to show the attitude of the public toward the political actors. Generally, from the figure, the tweets mentioning the political actors {Willie Obiano, APGA, Obaze Oseloka, PDP} are above the zero bar throughout, from 6–8 to 20–00, while those mentioning {Tony Nwoye, APC} fall below the zero bar, signifying negative comments, at 14–16 for both APC and Tony Nwoye and at 12–14 and 18–20 for Tony Nwoye. Compare Tony Nwoye 14–16 in Fig. 10 with the Tony Nwoye data point at 14–16 in Fig. 9.

Figure 10 shows the most frequent words associated with the political actors at a given time. It gives insight into what people were saying about them at that time. Figure 11 shows how important the most frequent words in Fig. 10 are in a given topic. Here, topics are computed by the LDA topic model, which gives the keywords for each topic and the weight (importance) of each keyword.

Figure 11 comprises the top five topics per political actor built with an LDA model, where each topic is a combination of keywords and each keyword contributes a certain weight to the topic. Adopting the methodology we explained in Sect. 3.3, we built the LDA model from the corpus and dictionary generated by grouping our tweet data collections into the following political actors: Willie Obiano (the incumbent Governor) of the state ruling All Progressive Grand Alliance (APGA), Tony Nwoye of the national ruling All Progressives Congress (APC), and Oseloka Obaze of the People’s Democratic Party (PDP). This grouping is based on tweets where these political actors are mentioned. For T0 (topic 0) under Willie Obiano in the figure, the top ten keywords that contribute to the topic ‘Voting Hindrances’ are ‘failing,’ ‘complains,’ ‘card,’ ‘reader,’ ... and the weight of ‘failing’ in T0 is 0.065. The weights reflect how important a keyword is to that topic. The topic label is our own inference: looking at these keywords, we summarized the topic as ‘Voting Hindrances’ associated with the political actor Willie Obiano.

Comparing Figs. 10 and 11, the x-axis and y-axis markers of Fig. 10 show words like ojukwu and bianca that occurred for Willie Obiano and APGA, with APGA showing a higher frequency of occurrence than Willie Obiano. Also, in Fig. 11, these two words are very important, considering their weights, in the formation of topics T2 and T3 of Willie Obiano and APGA, respectively. Ojukwu was a hero in Igbo land and came from Anambra State. He was Bianca’s husband and the founder of APGA. The same could be said of Oseloka Obaze and PDP, where words such as agulu, jonathan, and obi in Fig. 10 contribute to the formation of topics in Fig. 11 (see T2 under both Oseloka Obaze and PDP). The high-frequency occurrence of the word agulu (also see Fig. 5d), just like ojukwu, reveals the connection between the candidate and one of the PDP’s ‘kingpins,’ who hails from Agulu, a town within Anambra State.

Comparing Figs. 9 and 10, the sentiment polarity of Oseloka Obaze reveals more positive sentiments than that of the other political actors between 18–20 and 20–00. Words like confident, ran, hard, winning, credible, news, win, and says can be observed at 18–20 and 20–00 in Fig. 10, showing people’s thoughts concerning Oseloka Obaze. Also, in Fig. 10 at 14–16 and 16–18, we observed words such as questionable, going, incidents, spokesman, police, candidate, average, governorship, and inec associated with Tony Nwoye. Words such as decides, law, electoral, breaches, addresses, victory, bianca, coasting, celebrates, and ojukwu are associated with Willie Obiano at 16–18, 18–20, and 20–00 in Fig. 10. Most of these words are among the top ten important words used in the formation of topics in Fig. 11 for the political actors Willie Obiano, Tony Nwoye, Oseloka Obaze, APGA, APC, and PDP.

4.2.2 Subjective sentiment analysis (SSA)

An objective sentence expresses some factual information about the world, while a subjective sentence expresses some personal feelings or beliefs. For example, the sentence ‘This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone’ does not express any opinion and hence is objective, while the sentence ‘The voice on my phone was not so clear, worse than my previous phone’ is subjective. Subjective expressions come in many forms, for example opinions, allegations, desires, beliefs, suspicions, and speculations (Riloff et al. 2006; Wiebe 2000). Thus, a subjective sentence may not contain an opinion. For example, ‘I wanted a phone with good voice quality’ is subjective, but it does not express a positive or negative opinion on any specific phone. Similarly, an objective sentence is not necessarily free of opinion, as in ‘The voice quality of this phone is amazing’ (Liu 2010). The issue of subjectivity has been extensively studied in the literature (Hatzivassiloglou and McKeown 1997; Hatzivassiloglou and Wiebe 2000; Riloff et al. 2006; Wiebe 2000; Riloff and Wiebe 2003; Liu 2010).

Fig. 12

Subjectivity sentiment frequency distributions (FD). This shows the FD of all the tweets in Table 1 where candidates, political parties, or both are mentioned. We used the top five political candidates and parties. a Subjectivity sentiment frequency distribution per candidate. b Subjectivity sentiment frequency distribution per political party

First SSA We perform SSA on our Twitter data in Table 1 without considering time. The aim is to evaluate whether the tweets posted during the Anambra State gubernatorial election conducted on November 18, 2017, are factual or emotionally subjective opinions. The evaluation involves all the candidates and their political parties mentioned in Sect. 3.1.1. While sentiment polarity in Sect. 4.2.1 determines the positive or negative connotation of a tweet in our Twitter dataset, SSA tries to discern whether the tweet is subjective (an opinion, belief, emotion, or speculation) or objective (a fact). Thus, we investigate tweets mentioning the political candidates, parties, or both to obtain their subjectivity scores and see which is higher. This is illustrated in Fig. 12. We can observe that tweets mentioning willie_obiano and apga are more non-subjective than the others. A similar pattern can be seen for godwin_ezeemo and ppa, except for godwin_ezeemo_ppa (tweets mentioning both names), where subjectivity and non-subjectivity are equal. Tweets about Oseloka Obaze and his party, PDP, are more subjective, as all the bar plots reveal. A different trend is observed for tony_nwoye and his party, APC: the tweets mentioning the candidate’s name are more subjective, the tweets mentioning the party are more non-subjective, and the tweets mentioning both names are more non-subjective. The non-subjectivity of the latter can be viewed as an influence of the party’s non-subjectivity results.
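The counting behind a distribution like Fig. 12 can be sketched as follows. This is a minimal illustration rather than the exact procedure used in this study: it assumes each tweet already carries a subjectivity score in [0, 1], treats scores above a hypothetical cutoff of 0.5 as subjective, and uses made-up example tweets. The function names and group labels are likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical cutoff: scores above 0.5 are counted as subjective.
SUBJECTIVITY_THRESHOLD = 0.5


def mentions(name, text):
    """Return True if `name` occurs in `text` as a whole word or phrase."""
    pattern = r"\b" + re.escape(name.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def mention_group(text, candidate, party):
    """Return 'candidate', 'party', 'both', or None for a tweet text."""
    has_candidate = mentions(candidate, text)
    has_party = mentions(party, text)
    if has_candidate and has_party:
        return "both"
    if has_candidate:
        return "candidate"
    if has_party:
        return "party"
    return None


def subjectivity_distribution(tweets, candidate, party):
    """Count subjective and non-subjective tweets per mention group.

    `tweets` is an iterable of (text, subjectivity_score) pairs.
    """
    counts = Counter()
    for text, score in tweets:
        group = mention_group(text, candidate, party)
        if group is None:
            continue  # the tweet mentions neither name
        label = "subjective" if score > SUBJECTIVITY_THRESHOLD else "non-subjective"
        counts[(group, label)] += 1
    return counts


# Made-up tweets and scores, for illustration only.
example_tweets = [
    ("Tony Nwoye will surely win", 0.9),
    ("APC agents at the polling unit", 0.1),
    ("Tony Nwoye of APC casts his vote", 0.2),
    ("Long queues this morning", 0.8),
]
print(subjectivity_distribution(example_tweets, "Tony Nwoye", "APC"))
```

Keeping the 'both' group separate from the candidate-only and party-only groups mirrors the three kinds of bars compared in the figure.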

Second SSA In this analysis, we considered time as a useful dimension of sentiment analysis. As with the polarity analysis (PSA) above, we used the same data and time arrangement to answer research question 2. The results for the SSA are shown in Fig. 13.

Fig. 13

Subjectivity scores for each political actor based on time. Time is formatted in two-hour granularity from 06:00 to 23:59

We observed from the figure that tweets ‘solely’ mentioning Tony Nwoye are more subjective in the 14–16 time group. A similar case is observed for Oseloka Obaze, but in the 18–20 time group. Willie Obiano and his party, APGA, started with subjectivity scores slightly lower than the others and ended the day with the lowest scores of all, showing that tweets mentioning their names are less opinionated. Again, we can say that their results are closely matched in the morning at 10–12 and in the afternoon at 14–16 and 16–18. Furthermore, it appears that the personalities of the candidates Oseloka Obaze and Tony Nwoye drove people to be emotionally subjective in their tweets about them, leading to the large differences between their subjectivity scores and those of their parties.
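The time-based averaging behind Figs. 9 and 13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact implementation used in this study: it assumes each tweet already has a timestamp, a polarity score, and a subjectivity score, that the eighth group (20–00) covers 20:00 to 23:59, and that a tweet is kept only if it mentions exactly one actor name. All function names and example tweets are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Eight groups: seven two-hour groups from 06:00, plus 20-00 (assumed 20:00 to 23:59).
TIME_GROUPS = ["6-8", "8-10", "10-12", "12-14", "14-16", "16-18", "18-20", "20-00"]


def time_group(timestamp):
    """Map a datetime to its time-group label, or None before 06:00."""
    if timestamp.hour < 6:
        return None
    index = min((timestamp.hour - 6) // 2, len(TIME_GROUPS) - 1)
    return TIME_GROUPS[index]


def unique_actor(text, actors):
    """Return the single actor mentioned in `text`, or None if zero or several are."""
    lowered = text.lower()
    mentioned = [
        actor for actor in actors
        if re.search(r"\b" + re.escape(actor.lower()) + r"\b", lowered)
    ]
    return mentioned[0] if len(mentioned) == 1 else None


def average_scores(tweets, actors):
    """Average polarity and subjectivity per (actor, time group).

    `tweets` is an iterable of (timestamp, text, polarity, subjectivity).
    Returns {(actor, group): (mean_polarity, mean_subjectivity)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for timestamp, text, polarity, subjectivity in tweets:
        group = time_group(timestamp)
        actor = unique_actor(text, actors)
        if group is None or actor is None:
            continue
        entry = sums[(actor, group)]
        entry[0] += polarity
        entry[1] += subjectivity
        entry[2] += 1
    return {key: (p / n, s / n) for key, (p, s, n) in sums.items()}


# Made-up tweets and scores, for illustration only.
example_tweets = [
    (datetime(2017, 11, 18, 14, 5), "Tony Nwoye speaks", -0.25, 0.75),
    (datetime(2017, 11, 18, 15, 30), "tony nwoye again", 0.5, 0.25),
    (datetime(2017, 11, 18, 15, 40), "Tony Nwoye and Willie Obiano", 0.9, 0.9),
    (datetime(2017, 11, 18, 22, 0), "Willie Obiano celebrates", 0.5, 0.5),
]
example_actors = ["Tony Nwoye", "Willie Obiano"]
print(average_scores(example_tweets, example_actors))
```

The third example tweet is dropped because it mentions two actors, which is one way to read the requirement that a tweet ‘solely’ mentions a name.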

A high subjectivity score does not indicate a higher propensity for a voter to vote. What it does indicate, however, is that Twitter users mentioning Tony Nwoye and Oseloka Obaze in their tweets are more opinionated, and those mentioning Willie Obiano are less opinionated.

5 Discussion

From the above figures and analyses, we deduce that even though a political party serves as a platform that sells the personality of a political actor or contestant in the struggle for power, the credibility of a political actor may in turn add strength to the spread of the party. For example, Figs. 8, 9, and 10 reveal the attitudes of the public toward the political actors. The political actors Oseloka Obaze and Osita Chidioka are better accepted than their political parties in terms of either positive polarity or popularity. The variable oseloka_obaze in Fig. 8a shows a negative polarity score that is almost zero (also see osita_chidioka). Frequent words such as confident, credible, news, win, says, and peacefully can be observed in Fig. 10b, showing people’s thoughts concerning Oseloka Obaze. However, the viability and acceptability of a given political party to the electorate have a greater effect on the victory of the party and the candidate it presents. Individual efforts by political actors to promote their manifestoes through a political party that does not win the sympathy of the electorate usually have less effect in achieving political victory, especially in a developing nation like Nigeria. This is illustrated by the case of Willie Obiano and his party, the All Progressives Grand Alliance (APGA), as explained in the paragraphs below.

The political behavior of the electorate during an election is connected to the political party that has an ideological link to their beliefs, culture, and values. For instance, that PDP could rule Nigeria for 16 years (1999–2015) was not only because of the power of incumbency but also a result of its acceptability to the people, regardless of who its flagbearer was. Muhammadu Buhari contested presidential elections in 2003, 2007 (All Nigerian People’s Party, ANPP), and 2011 (Congress for Progressive Change, CPC) but lost, mainly because of the poor spread of his party, which was not accepted by the electorate regardless of his personality. This resembles the case of the political actor Godwin Ezeemo and the political parties PPA and UPP, as shown by their sentiment scores and tweet frequencies in Figs. 8a, b and 4. In 2013, Buhari formed an alliance with other political parties (Action Congress of Nigeria, ACN; Congress for Progressive Change, CPC; All Nigerian People’s Party, ANPP; a faction of the All Progressives Grand Alliance, APGA; and aggrieved members of the Peoples Democratic Party, PDP). This alliance gave rise to the All Progressives Congress (APC), with wider coverage and acceptability in the North-East, North-West, North-Central, and South-West of Nigeria. As a result, Buhari, who had lost three consecutive elections (2003, 2007, and 2011), won the 2015 general election against the incumbent president (Goodluck Jonathan of PDP).

Fig. 14

Average sentiment polarity for each political party cum candidate. Numbers 1, 2, 3, 4, and 5 represent APGA, PDP, APC, PPA, and UPP, respectively

Empirical observations on the political actor Willie Obiano and his party APGA from Figs. 8a, b and 5 reveal that political-ideological links such as people’s beliefs, culture, values, and acceptability contributed to Willie Obiano winning the November 18, 2017, Anambra State gubernatorial election. From Fig. 5, the most frequent outstanding words associated with Willie Obiano when combined with APGA can be connected to the ideology of the Igbo nation and Chukwuemeka Odimegwu Ojukwu, signifying their usual slogan nke a bụ nke anyị ‘this is our own.’ Moreover, topic T4 (political marketing) in Fig. 11b can further be explained, based on the topic keywords, as indigenous acceptance, ‘Nke a bụ nke anyị’ (This is our own). Also, in Figs. 8a, b, the sentiment polarity frequency distribution associated with Willie Obiano shows a more negative sentiment than that of his party APGA, while Fig. 14 shows that the average sentiment polarity scores of both Willie Obiano and APGA are positive. The variable willie Obiano+APGA at 18–20 and 20–00 in Fig. 10a displays more positive words such as celebrates, victory, results, bianca, coasting, guber, early, and ojukwu than when only willie Obiano is used. Since 2008, APGA has been winning the gubernatorial elections of Anambra State, Nigeria, because of the acceptability of the party to the people and the belief in the party as an Igbo party. APGA survived the influence of the national incumbent parties (PDP and APC) in the 2010, 2014, and 2018 gubernatorial elections not because of the qualities of the candidates, but because of the party’s influence on the people. The founder of the party APGA (Chukwuemeka Odimegwu Ojukwu) is a generally accepted personality among the Igbo nation, especially in his state, Anambra (see Fig. 5 for the most frequent words with Willie Obiano and APGA). The objective of the party, as its founder always advocated, is to promote Igbo ideology, to unite the Igbo nation within the Nigerian state, and to provide a political umbrella to advance Igbo interests; this made the majority of Anambrarians, especially the masses (the electorate), support the party in every election regardless of who its flagbearer is.

To extend our discussion of Fig. 13 about objective and subjective opinions, APGA has been the ruling political party in Anambra State for 12 years now. It has been able to design policies and execute infrastructure projects that surpass those of other political parties such as PDP, which was in power for 7 years (1999–2003), and APC, UDP, and PPA, which have not been in power. This singular opportunity made tweets associated with APGA as a party and with its candidate less subjective than those of other political actors. Other political parties such as APC, PDP, PPA, and UDP, and their candidates, have not had such an opportunity in their political ventures in the state. Consequently, people’s connection to these parties is not concrete, and there are no concrete policies and projects linked to them in the state. This may be the reason why the tweets associated with these parties’ candidates are more subjective. See also Fig. 12a for the subjectivity sentiment frequency distribution on the candidates.

Finally, a variable such as personality influence occurred repeatedly in Fig. 11. This suggests that, according to people’s tweets, personality influence plays a significant role in winning an election. A political party with an acceptable personality, coupled with its own influence as a party, does better than a party whose candidate has little or no influence in the society or among the people. Therefore, personality influence and political party influence are very important underlying factors in electoral victory.

6 Conclusion

In this research, we investigate how candidates and their political parties could influence winning or losing an election, using Twitter data. We stated research questions that enable us to evaluate our dataset and gain insight into the political phenomena relevant to our research.

We tested our research questions by using Twitter data collected during the Anambra gubernatorial election 2017 as a case study. We analyzed over 7k Twitter messages streamed on the election day only. Since Twitter users tweet in real time, we believe that tweets posted on the election day are based on what the users are experiencing at that moment and that political insight can be gained from them. The collected tweets are analyzed exploratively and sentimentally to answer our two research questions. In the explorative experiment, we gained overall insights into our data, such as Willie Obiano and his party, APGA, recording the highest number of tweets mentioning their names. This reveals the likely reason why he and his party have the highest number of positive and negative tweets in the frequency distributions of the ‘polarized and subjective’ sentiment tweets (see Figs. 8 and 12). In sentiment analysis, we found people’s attitudes toward the political actors across a given set of times and whether these attitudes are based on facts or opinions. The collected tweets were segmented into two-hour time groups, forming eight groups from 06:00 to 23:59. The primary purpose of this study was to use this time-based information contained in the message metadata as a useful dimension for sentiment analysis, in order to create a more detailed analysis of Twitter users’ subjectivity and polarity during the election. At this stage, a polarized tweet must contain the name of one of the selected candidates or political parties in order to be considered. For each time group, we filtered the tweets posted within the set time range and grouped them based on the names they uniquely mention. We then computed the average polarity and subjectivity scores of each group within that time. In general, the average polarity scores for the selected cases are above the zero bar, as shown in Fig. 9, with only Oseloka Obaze exceeding 30%, at the 20–00 time group; Tony Nwoye and APC are the only cases with negative scores. Furthermore, it was observed that tweets mentioning Tony Nwoye and Oseloka Obaze were more subjective in nature than those mentioning Willie Obiano and his party. This is likely because APGA has been the ruling political party in Anambra State for 12 years now. We also used an LDA model to build the topics in Fig. 11 to find the various topics being discussed, and we compared the polarity and subjectivity analyses with the findings. Again, we used Fig. 11 to check how important a word in Fig. 10 is, based on its weight.

A high subjectivity score does not indicate a higher propensity for a voter to vote. However, it indicates that Twitter users mentioning Tony Nwoye and Oseloka Obaze in their tweets during the lunch and supper times are more emotionally subjective in their messages, whereas those mentioning Willie Obiano at the same times are not.

Finally, the Twitter analysis and visualization on #AnambraDecides2017 show that political actors leverage the impact of social media (Twitter, Facebook, WhatsApp, YouTube, and other blogs) to define and determine the political behavior of the electorate in order to win elections. It also helps validate that political insights are a phenomenon present on social media.