1 Introduction

In recent years, a variety of social media platforms have attracted widespread attention. From Myspace [10] to Facebook [5], Twitter [15], Instagram [8] and Snapchat [14], there has been a consistent demand for companies to provide more social platforms on which their customers can interact with each other. These social networks have been credited with positive effects, such as allowing minority voices to be heard, enabling cross-cultural interactions, and simply allowing more people to connect with their friends through these recently introduced mechanisms. The enthusiasm about these platforms has also generated some controversy, since opposing voices have been allowed to be heard as well [1]. Complaints about harassment, bigotry and cybercrime are only a few of the problems that these networks have suffered from, leading many to question both the ethical viability of these platforms and the financial stability of even some of the larger networks, such as Twitter.

With all of the uncertainties about Twitter and other social networks, the question of how influence is gained inside these networks becomes extremely important. Understanding how particular users gain popularity would give Twitter and other entities the chance to interact with Twitter audiences far more effectively. Advertisers would be able to target more influential people, entertainers and others who use Twitter as a marketing tool would be able to engage their audience far more effectively, and Twitter itself would be able to limit some of the more undesirable sources of influence on its network. The dynamic, distributed structure of social networks is one of the reasons that they appeal to their users: it offers users the ability to freely explore the network and connect to people near and far without limits. From the view of online social network analysis, the social environment can be expressed as patterns or regularities in relationships among interacting units. The unit of analysis in online social networks is not the individual but an entity consisting of a collection of individuals and the linkages among them [16]. The concept of a network emphasizes the fact that each individual has ties to other individuals, each of whom in turn is tied to a few or many others. According to [13], social influence is defined as a change in an individual’s thoughts, feelings, attitudes, or behaviors that results from interaction with another individual or a group of people. Influence has long been actively studied in marketing, sociology, communication and political science.

Seemingly, users with the most followers should have the most influence, as they would have the greatest potential reach. However, we observed that those who had many followers were not necessarily the most active. Their potential influence went unrealized. However, there were other users whose actions succeeded in keeping the conversations going across a group of users. Since Twitter’s presentation is centered around the life cycle of Tweets, rather than the relationships among the users, can relative influence among a group of users be measured?

2 Related Work

One well-known observation about Twitter is that, unlike many traditional media sources, it has no appointed watchdogs or gatekeepers screening the flow of information in the network, and the cost of “publishing” anything on Twitter is incredibly low. Twitter might therefore invalidate models based on more traditional forms of mass media. There is also a lack of enforced boundaries or groups, like those on Facebook, which allows a much wider distribution of information but also complicates, more than ever, any discussion of how groups of people participate in online discussions. These circumstances are important motivations for defining a rational measurement of influence in Twitter’s network, along with a variety of measurements that might eventually be used as a scientific method to influence a part of the network.

Measuring influence and social networking potential on Twitter has been discussed in several papers [2, 17]. One approach measures influence not only by taking followers and interactions into account, but also by analyzing similarities with the help of a ranking method similar to PageRank [17]. Other approaches define different types of influence on Twitter, such as in-degree and re-tweet [2]. The authors of [2] concluded that each indicator leads to a different ranking of users and that in-degree, i.e. the number of followers a user has, reveals little about the actual influence of a user. Re-tweet influence is strongly content-oriented, whereas a higher degree of influence suggests a high value of the user’s name. The closely related topics of tweeting dynamics and the prediction of content popularity and information distribution have also been discussed in [2].

Some research [4, 9] suggests that people are more affected by the opinions of their peers than by those of influentials. Recent studies of online social networks [2], however, support the hypothesis that influentials exert a disproportionate amount of influence. With the number of active users on these sites in the millions or even tens of millions, identifying influential users among them becomes an important problem with applications in marketing [2]. Twitter has no way for a user to ask another to follow them. Instead, it suggests that users “gain friends and fans on Twitter” by engaging, following others, reading and posting [15]. During that experience, Twitter gathers information in order to provide the user with additional content and account suggestions [15]. A separate tweet object is generated when a user tweets; it includes the original creator id, and additional data is captured as the tweet lives on through likes and re-tweets.

3 Our Proposed Approach

Our hypothesis is that someone attempting to be an influencer on Twitter should try to engage their audience, but there is a definite lack of agreement on how this intent should be achieved. Such an attempt becomes especially difficult to pursue when one considers recent work such as [7], which indicates that an individual has a distinct limit on how many actual relationships they can form, maxing out at around one to two hundred people. There has to be some sort of strategy that an influencer can use to gain viewership without attempting to form individual relationships with thousands of individuals on Twitter. Discovering this strategy will be an important factor in revealing the mechanisms behind influence on Twitter. For the purposes of this paper, there will be four measurements of a user’s influence.

A social networking relationship within a smaller “inner” circle, rather than the broader network, would seem to be more influential on social media. Twitter uses an algorithm that pushes content into the user timeline. The first group of tweets in the timeline will be from those the user has chosen to follow. The order of the tweets as they appear in the timeline may correlate with their potential influence, simply because they are more likely to be seen if they appear earlier. Those that appear further down in the timeline may never be seen at all, as users have many opportunities to click off in other directions as they engage with tweets. The challenge in identifying influence is to first recognize the inner circle, and then to quantify the relative influence of users within it. We sought to find a set of users with a large number of mutual relationships who are all interested in the same professional topic. We identified a subset of users in an online community, with overlap among followers. The next step was to see which usage patterns provide the predictability to sustain a recognizable community, thus creating an inner circle. Then we could isolate the characteristics that correlate with influence. Four primary characteristics for measuring influence emerged from the data:

  • Number of Followers: The number of followers collected by a user increases with both fame and activity level. A person can be well known outside of Twitter, like a politician or an entertainment figure, and their follower count reflects a built-in notoriety, regardless of regular activity. Users can also accumulate fame within Twitter through regular engagement. As they tweet and are liked, re-tweeted or followed, Twitter increases the number of times they are suggested to others as an account to follow, often resulting in a larger number of followings.

    A twitter user’s influence can be correlated to their number of followers because of the potential reach of their tweets. The tweets of a user with a high number of followers will go out into the timelines of a larger number of other users. However, if they don’t engage regularly, their twitter influence is unrealized, so this measure alone is not sufficient to measure influence.

  • Number Listed: Twitter users may choose to add any of the users they follow to a list, usually created to specify a subset of related users. It is a way of prioritizing users whose content you value and want to be able to view regularly.

    Users seem willing to casually follow many users, giving them a high following count. However, they seem much more selective when choosing to add one to a list. If a user is “Listed” by a high number of users, it means their content is expressly valued by those users, which is an indication of potential influence.

  • Ratio of Followers/Followings: The ratio of the number of followers (those that follow them) to the number of followings (those they follow) is an indication of the balance between a user’s provision and consumption of content. If someone has a significant number of followers and relatively few followings, they are likely acting successfully as a provider of content that is being consumed, a sign of influence.

  • Number of Tweets Per Year/Activity Level: For a Twitter user to exercise their influence, they need to appear to use Twitter as a continual source of communication, daily or weekly, not just on a promotional basis. The total number of tweets divided by the number of years of use gives the number of tweets per year, a very broad indication of activity level.

The above characteristics can be combined to compare influence among related users: (i) the number of followers \((N_F)\), (ii) the ratio of the number of followings to the number of followers \((\frac{N_{FW}}{N_F})\), (iii) Listed, which is the number of users who specify to see their content in best tweets first \((N_{listed})\), and (iv) activity level, which is the average number of tweets per year \((N_T)\). Here \(\alpha \), \(\beta \), \(\gamma \) and \(\eta \) are weight parameters and \(f_i\) is the influence of each Twitter user \(i\):

$$\begin{aligned} f_i=\alpha \cdot N_F + \beta \cdot \frac{N_{FW}}{N_F}+\gamma \cdot N_{listed} +\eta \cdot N_T \end{aligned} \tag{1}$$
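As a minimal sketch of how Eq. (1) could be evaluated, the following Python snippet computes the score for a single user. The `UserStats` container, the function names and the default weights of 1.0 are hypothetical and chosen only for illustration; the paper does not state the weight values. The ratio term follows the symbols of Eq. (1) literally, and the handling of a user with zero followers is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    """Aggregate profile counts for one user (hypothetical container)."""
    followers: int           # N_F
    followings: int          # N_FW
    listed: int              # N_listed
    tweets_per_year: float   # N_T


def tweets_per_year(total_tweets, years_on_twitter):
    """Activity level: total tweets divided by years of use."""
    return total_tweets / years_on_twitter


def influence(user, alpha=1.0, beta=1.0, gamma=1.0, eta=1.0):
    """Evaluate Eq. (1); the default weights are placeholders, not fitted values."""
    # Assumption: a user with no followers gets a ratio term of 0
    # instead of raising a division-by-zero error.
    ratio = user.followings / user.followers if user.followers else 0.0
    return (alpha * user.followers
            + beta * ratio
            + gamma * user.listed
            + eta * user.tweets_per_year)


example = UserStats(followers=1000, followings=500, listed=20,
                    tweets_per_year=tweets_per_year(1500, 5))
score = influence(example)
```

Because the four terms live on very different scales, the follower count dominates the sum when all weights are equal, so in practice the weights would need to be chosen or the inputs normalized before scores are compared.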

By starting with a group of users we knew already had mutual relationships, we observe which usage patterns provide the predictability to sustain a recognizable community, thus creating an inner circle. Then we could isolate the characteristics that correlate to influence.

Table 1. Final Results, where “\(*\)” represents too many to count.

4 Performance Evaluation

In order to collect the data, a tool developed by a GitHub contributor named Sferik [6] was used. It allowed lists of a user’s followers, most recent tweets, and other data to be gathered effectively and compared with one another. We ran a few scripts in order to generate an accurate picture of how the users could be measured. The data was mostly collected using the whois(), followers(), and last1000tweets() functions. Unfortunately, the followers() function did not perform as fast as we expected: because of the large number of parameters passed to it, it could not finish in a reasonable amount of time when targeting the higher-level users in the community.

The number of followers could be taken directly from the whois() results, as well as the date the user joined Twitter and the total number of tweets. Dividing the number of tweets by the number of years gives the average activity level for the user. The ratio of followings to followers and the ratio of listed to followers could also be taken from the whois() results. The number of followers’ followers was obtained by summing the followers column of the results from followers(). The percentages of original tweets, re-tweets and replies were taken from the last1000tweets() data. Entries that started with “RT” were considered re-tweets, entries starting with “@” were considered replies, and entries that started with neither were considered to be original tweets. As depicted in Table 1, we categorized users into three different groups. The first category consisted of those users who were considered famous, or highly influential. They were characterized by (i) high numbers of followers, (ii) large ratios of followers compared to their followings or their listed followers, and (iii) followers with large numbers of followers. A second category, with more moderate values of each of these measurements, could be considered active members of the community, but not nearly as influential as the high-level members. Finally, the third category consisted of users with few, if any, followers, very low ratios of listed or followings versus their number of followers, and very few followers with large groups of followers themselves. The relative activity of the users was also measured in terms of their number of tweets in the past year. The breakdown of their tweet types, (i) original tweets, (ii) re-tweets, and (iii) replies, was also analyzed by comparing the number of each in the users’ last thousand to two thousand tweets, in order to better understand the kind of effective behavior that would lead these users to become influential within their group.
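The prefix rule described above for separating re-tweets, replies and original tweets can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the input is assumed to be a plain list of tweet texts rather than the actual output format of last1000tweets().

```python
from collections import Counter


def tweet_type(text):
    """Label one tweet by its prefix: "RT" -> re-tweet, "@" -> reply."""
    if text.startswith("RT"):
        return "retweet"
    if text.startswith("@"):
        return "reply"
    return "original"


def type_percentages(texts):
    """Return the percentage of each tweet type in a list of tweet texts."""
    kinds = ("retweet", "reply", "original")
    total = len(texts)
    if total == 0:
        # Assumption: an empty sample reports 0% for every type.
        return {kind: 0.0 for kind in kinds}
    counts = Counter(tweet_type(t) for t in texts)
    return {kind: 100.0 * counts[kind] / total for kind in kinds}


sample = ["RT @a: new page is up", "@b thanks!", "Drawing today", "RT @c: hello"]
breakdown = type_percentages(sample)
```

A pure prefix test is deliberately simple; an original tweet that happens to begin with the letters “RT” would be counted as a re-tweet, so the percentages should be read as approximate.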

Several interesting patterns emerged as this data was analyzed. First, activity level appeared to have little effect on the amount of influence a user had. Users who contributed almost half as many tweets as other, more active users still had comparable or larger numbers of followers, and much higher ratios of followings/listed to followers. The types of tweets appeared to be much more important than the quantity. The percentage of tweets that were re-tweets or replies seemed to have a much greater effect on the size of a user’s audience and on the other previously defined measurements of popularity. Another intriguing observation is that many of these Twitter influencers actually spent the majority of their time talking with each other rather than with their individual users. The great majority of their audience is made up of followers who act more like observers than participants in the conversation. Many of them never actually interact with the more active and influential members of the group at all, yet the replies and re-tweets of the community’s core members still attract large amounts of attention from the group. In addition, we observed that a successful influencer on Twitter does not attempt to flood the network with original content, or even attempt to engage each individual member of their audience. Instead, they establish mutual relationships with other highly influential individuals, and then have public conversations with those individuals online. The topic doesn’t even necessarily have to relate directly to the purpose of the group. Our initial point of contact with this community talked about their webcomic or writing, but most of their re-tweets and conversations ranged across other topics such as travel, politics, or everyday personal experiences.

Figure 1 compares the number of years a user had been on Twitter with their level of activity. Of interest is the fact that users with fewer years on the platform seemed to be less active overall. Figure 2 was generated using data from last1000tweets(). Since the date of each tweet is included in the results of this function, we could find the amount of time needed to generate those thousand tweets, giving us a window of time that we could then compare to the percentages of re-tweets, replies and original tweets. Lower percentages appeared to correlate with larger windows of time or, in other words, lower activity levels on Twitter. Figure 3 depicts the size of the audience, measured as the number of followers, against the percentage of re-tweets and replies in the past one thousand tweets. In Fig. 4, the number of followers is compared to the number of followers with over nine thousand followers of their own. High numbers of high-influence followers appeared to correlate with large overall audiences.
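The time window used for Fig. 2 can be illustrated with a short sketch that measures the span between the oldest and newest tweet in a sample. The function names are hypothetical, and the timestamps are assumed to have already been parsed into `datetime` objects; how dates appear in the raw tool output is not specified here.

```python
from datetime import datetime


def window_days(timestamps):
    """Days elapsed between the oldest and newest tweet in a sample."""
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    return span.total_seconds() / 86400.0


def tweets_per_day(timestamps):
    """Average posting rate over the sample window, or None if undefined."""
    days = window_days(timestamps)
    if days <= 0:
        return None
    return len(timestamps) / days


stamps = [datetime(2017, 1, 1), datetime(2017, 1, 11), datetime(2017, 1, 6)]
days = window_days(stamps)
rate = tweets_per_day(stamps)
```

A larger window for the same number of tweets means a lower posting rate, which is the sense in which larger windows indicate lower activity.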

Fig. 1. Years vs Activity

Fig. 2. Time vs Re-tweets/Replies

Fig. 3. Percentage of Re-tweets/Replies vs Audience

Fig. 4. Audience vs Influence

Our results show that influencers spent disproportionate amounts of time re-tweeting and replying to other high-influence individuals. Taking the data from both the followers() function and the last1000tweets() function, we calculated the number of re-tweets and replies that mentioned followers with over nine thousand followers of their own. The percentage of high-influence followers was then compared to how often they were mentioned in re-tweets and replies, showing that they received extremely disproportionate amounts of attention from their fellow influencers. Of approximately 11,492 sampled followers, only \(1.9\%\) were influencers, while influencers accounted for \(22.73\%\) of the 66 re-tweets/replies. In order to gain a better understanding of how this mechanism works, we can turn to a popular example in another form of media: radio talk shows. People listen to talk shows without the expectation that the radio hosts will try to engage with them directly; in fact, there is likely only a very small minority of talk show listeners who bother to call in when given the opportunity. They do, however, expect relatable dialogue, a sort of observational version of the “bikeshedding” rule [11], where people prefer to discuss (or in this case, observe conversations about) subjects with which they are familiar. A similar trend can be seen in other digital media such as YouTube videos (particularly videos like Let’s Plays of team-based games) and podcasts. Twitter can be used in a similar manner, effectively dividing the Twitter network into three pieces: the group that uses Twitter as a sort of performance platform, the group that watches the performance, and the relatively small group that uses it for actual personal relationships and communication. 
Just like those other forms of entertainment, then, the group of performers may not define themselves as a community by replying and re-tweeting with each other, and discussing the various subjects they find interesting, but the audience who watches them is not particularly bound by any group boundaries. This aspect of Twitter would explain why communities are so hard to define on the network, since the groups performing for the benefit of the others don’t actually try to draw hard, exclusive boundaries to their audiences. They want to extend the limits as far as possible, and audiences are fluid between the communities they observe.
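The disproportion reported above can be made concrete with a small worked calculation. The sketch below only uses the figures quoted in the text (11,492 samples, 1.9%, 66 re-tweets/replies, 22.73%); the function name is hypothetical, and the implied counts are approximations reconstructed from rounded percentages, not values reported by the study.

```python
def over_representation(share_of_mentions_pct, share_of_followers_pct):
    """How many times more often a group is mentioned than its share of followers."""
    return share_of_mentions_pct / share_of_followers_pct


# Figures quoted in the text.
followers_sampled = 11492
influencer_share_pct = 1.9
mentions_total = 66
influencer_mention_pct = 22.73

# Approximate counts implied by the rounded percentages (not reported values).
implied_influencers = round(followers_sampled * influencer_share_pct / 100)
implied_mentions = round(mentions_total * influencer_mention_pct / 100)

ratio = over_representation(influencer_mention_pct, influencer_share_pct)
print(implied_influencers, implied_mentions, round(ratio, 1))
```

Under these figures, high-influence followers are mentioned roughly twelve times more often than their share of the follower base alone would suggest.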

In other words, an observer could watch any groups they choose, becoming a member of an audience that doesn’t participate in the community any more than someone listening on the radio. Similarly, the performers don’t have to try to form thousands of relationships with their audience; they only have to connect to the other performers who can help them have engaging conversations for the benefit of their audience.

Twitter provides two public APIs, a streaming API and a REST API, specified in the Twitter developer documentation, to provide programmatic access to Twitter functions and data [15]. When logged into Twitter, a user can export their own data, or view it on a Dashboard within the developer area of the application. To do so, each user creates their own “Twitter App”, associated with their unique Twitter ID. That provides the API with user authentication to perform searches. For example, Twitter uses a user’s app to determine all the data that the user has access to within the app, and provides that same access outside of the application. The Twitter REST API offers a broad set of GET and POST capabilities, most of which are functions that users would use in normal usage of Twitter. However, there are some that can be used to extract data for characterizing interaction within a group of users. Calls to the API are allowed through the use of Twitter applications (based on a single user’s authentication) from the command line or the Twitter console. The application has a number of functions that would be useful, although they are constrained by Twitter’s rate limits.

Although Twitter offers open access to its data, it enforces rate limits on calls made through the REST API, in order to manage resources and to protect them from abuse. As an alternative, it offers streaming APIs that provide real-time streaming of Twitter’s data for use in other applications. For our purposes, we needed the REST API to make calls to get historical Twitter data for various users. For each of the users we identified in our community, we could retrieve basic aggregate user information with which to characterize their usage of Twitter. This gave us a starting point for comparing influence across the group of users. The size of the followings of many of the Twitter users in our community fell just under the REST API limits, meaning we could also retrieve all of their followers’ data, up to about 20,000 followers. For a single user, we could also extract data to identify patterns of usage, like frequency and type of tweet. In our effort to try to “predict the next tweet”, we first identified an established community of users with extensive mutual relationships. From them, we could observe and characterize the members’ patterns of interaction. Using user-related data from the Twitter API, we identified a set of three distinct user types (Expert, Active, Passive) that emerged within the group based on the scale of their followings and their average level of activity. Expert users have \(N_F> 10000\), \(\frac{N_{FW}}{N_F}> 100:1\), \(N_{listed}>1000\), and \(N_T> 1\) per day. Active users have \(1000<N_F<10000\), \(\frac{N_{FW}}{N_F}>10:1\), \(N_{listed}>100\) and \(N_T> 1\) per day. Passive users have \(N_F<100\), \(\frac{N_{FW}}{N_F}<0:1\), \(N_{listed}>0\) and \(N_T< 1\) per day.
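The threshold rules above can be sketched as a simple classifier. This is an illustration under stated assumptions rather than the procedure actually used: the function name is hypothetical, the ratio is passed in already computed as the text defines it, and any user who does not meet the Expert or Active thresholds is treated as Passive, because the Passive thresholds as printed do not cover every remaining case.

```python
def user_type(followers, ratio, listed, tweets_per_day):
    """Assign Expert, Active or Passive from the thresholds stated in the text."""
    # Expert: N_F > 10000, ratio > 100:1, N_listed > 1000, more than 1 tweet per day.
    if (followers > 10000 and ratio > 100
            and listed > 1000 and tweets_per_day > 1):
        return "Expert"
    # Active: 1000 < N_F < 10000, ratio > 10:1, N_listed > 100, more than 1 tweet per day.
    if (1000 < followers < 10000 and ratio > 10
            and listed > 100 and tweets_per_day > 1):
        return "Active"
    # Assumption: everyone else falls back to Passive.
    return "Passive"


labels = [
    user_type(25000, 150, 2000, 3.0),
    user_type(5000, 20, 300, 2.0),
    user_type(50, 0.5, 0, 0.1),
]
```

Treating Passive as the fallback keeps the three types exhaustive, which is consistent with the reported distribution summing to 100% of the users in the set.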

The next piece is to characterize the nature of the tweets themselves, and to use that to try to predict where the next tweet will come from. The Twitter API includes a tweet object containing the original message, the time and date it was created, and its creator. The message is prefaced with “RT” in the case of a retweet, and with “@Username” for replies. The tweet object is updated during its life cycle to indicate how many times it has been retweeted or favorited. A look at a user’s timeline of recent tweets reveals their rate of activity, as well as the types of their tweets. The frequency of their tweets plus the type of tweet tells us how and when they are using Twitter. By looking at the patterns for our user types, we can see how influence is established. Patterns of past activity, plus our measurement of influence, give us information to predict where the next tweets are likely to come from. For our Expert users, the timeline is a collection of tweets, retweets and replies in a short period of time. They engage in constant passing conversation with their counterparts in the community (replies), while being available for immediate response to trending topics (retweets). This sustains their following, so that when they have a professional announcement to make, their audience is already gathered and engaged, and thus more likely to pay attention to it (original tweets). For our Active users, the timeline reveals a slightly different pattern of behavior. They are in the process of building a following, so their activity has more retweets than original tweets or replies. They may be increasing their followings by engaging with trending topics, participating with others who re-tweet messages that catch their attention. Regular patterns of this kind result in increasing direct engagement with other users who are available at the same time and who respond to similar messages.

For our Passive users, the timeline has far less activity. Their activity is not daily, but weekly or less. Without the retweet and reply activity, their timeline includes more original tweets, but stretched out over a long period of time. A look at their user data, however, reveals a higher level of favorites, suggesting that they engage mainly by reading and favoriting tweets. Table 2 depicts user data and the nature of tweets on Twitter. After calculating influence for each of the 10,261 users in the set, we assigned the type based on our thresholds, producing the following distribution: 0.68% of users are Expert, 6.21% are Active and 93.11% are Passive. Our thresholds were established to identify patterns of increasing influence through Twitter usage for professional purposes. As users spend time on Twitter, the natural result is to follow, and be followed by, others with similar interests. For users with a more specific purpose, active engagement will result in opportunities for greater influence beyond just a higher number of followers. They are more likely to be added to “Lists”, created by other users to identify users whose content they are most interested in viewing. They are also likely to engage more directly with other users, as expressed by a higher average number of tweets, whether original tweets, retweets or replies. Twitter’s algorithm prioritizes the tweets in user feeds by previous “engagement and attention”, as also described in [12]. The combination of higher levels of those characteristics reflects a pattern of usage that results in gathering influence over a larger following over a period of time.

We expected that users with the highest calculated influence would also be characterized as Expert based on our thresholds. We looked at the top 1% most influential users in our dataset, and found that only half of them were actually Expert users. The top 1% of users in our set have each accumulated a very high number of followers (most have 10,000+), resulting in their relatively higher influence calculation. Some have built their following through deliberate active engagement on Twitter. However, some accounts have simply run up a high number of followers through fame. Several accounts that could be bots are also noticeable, a reminder that bots could account for 9–15% of all active Twitter accounts, as also described in [3]. These accounts may have a high number of followers, but they don’t engage in the style of engagement that meets our thresholds for Expert users.

Table 2. User data and nature of Tweets on Twitter

5 Conclusion and Future Work

Each type of user seems to play a necessary role in forming a recognizable community. The Expert members are mutually agreed upon, as evidenced by their large followings and by the large number of followers who have added them to a list. They are most likely to post replies and original tweets, and are unlikely to re-tweet. Active members keep the conversation going, as evidenced by their strong followings. They are more likely than the other types to re-tweet. Passive members, the largest group, participate by liking (Favorite) tweets that they consume, encouraging Expert and Active members to continue their actions, and sustaining the boundaries of the group. Interaction with a large number of followers or followings might contribute to a user’s tendency to engage in behavior they might otherwise avoid. Perhaps there is a number at which they begin to feel like they are acting “anonymously”. This work could be extended to analyze the ways people commonly use Twitter, comparing narrow uses (like micro-blogging) to broader uses (business and promotion), all inside one application.