Abstract
Emerging tools and methodologies are providing insight into the factors that promote the propagation of information in online social networks following significant activities, such as high-profile international social or societal events. This paper presents an extensible approach for analysing how different language communities engage and interact on the social networking platform Twitter via an analysis of the Eurovision Song Contest held in Stockholm, Sweden, in May 2016. By utilising language information from user profiles (N = 1,226,959) and status updates (N = 7,926,746) to identify and categorise communities, our approach is able to categorise these interactions, as well as construct network graphs to provide further insight on these multilingual communities. The results show that multilingualism is positively correlated with activity whilst negatively correlated with posting in the user’s own language.
N. Albishry—This work has been supported by a doctoral research scholarship for Nabeel Albishry from King Abdulaziz University, Kingdom of Saudi Arabia.
Keywords
1 Introduction
Despite the widespread use of Twitter globally – with 328 million monthly active users as of the first quarter of 2017 – little research has investigated the differences amongst users of various languages; there is a tendency to assume that the behaviours of English users generalise to other language users [1]. Language has featured as a facet of research on the geographies of Twitter networks [2, 3], especially whether offline geography still matters in online social networks [4]. Linguistically inspired studies have been performed on hashtags [5], as well as on the volume and proportion of tweets in English and Arabic, as part of an analysis of the Arab Spring [6]. Nevertheless, language is clearly a vital component of affiliation and discourse on the web [7, 8], with the creation and curation of emerging multilingual networks and communities, representing well-established creative and cultural norms, including for minority languages such as Welsh [9], as well as investigations into the economics of linguistic diversity [10].
In the social network analysis domain, measures such as degree, betweenness, clustering coefficient, modularity and cliques have been used in various projects to measure influence or detect the emergence of new communities [11, 12]. These measures provide the ability to assess network graphs that are constructed from collected data (for example, tweets). The selection of these measures depends on the goal of the analysis; for example, the degree of a node helps to identify nodes with a high number of connections within the network [13,14,15]. In a representation of a real-world network, this metric may help to identify highly connected persons, such as political leaders, sports stars or celebrities, who are potential “information spreaders” [16,17,18].
Clustering users into communities has been an important factor in social network analysis, with a particular focus on clustering users based on their locations. However, for the sake of anonymity, many users tend not to disclose information about their identity, such as their location [19]; on Twitter, it has also been reported in the literature that geotagged tweets are generally low in number [20,21,22]. At the same time, the exponential growth in social media over the past decade has been joined by the rise of location as a central organising theme [23] of how users engage with online information services and, more importantly, with each other [24,25,26]. The work here examines the correlation between the multilingualism of users and their associated activity.
The remainder of this paper is organised as follows: in Sects. 2 and 3, we introduce the methodology and key language themes; Sect. 4 presents the 2016 Eurovision Song Contest case study, along with an analysis of the key data and results; Sect. 5 concludes the paper with a wider discussion and a summary of the potential application of our approach.
2 Methodology
The primary purpose of this study is to identify and define an extensible analytical approach for examining language use, communities, and diversity on Twitter. The approach is based on network graphs and their properties, such as indegree, outdegree, and edge weights. Graphs are generated from the language settings in users’ profiles and those of their statuses. First, we construct user graphs to analyse interactions and multilingualism at the level of individual users. Then, from the user graphs, we produce a language communities graph that groups users based on common languages.
3 Language Entities
To generate the required graphs, we need three essential entities from each status: user ID, user profile language, and status language (see Note 1). These values can be extracted from [status][‘user’][‘id’], [status][‘user’][‘lang’], and [status][‘lang’], respectively. It is important to note that the focus of this work is on the analytical approach, not necessarily the accuracy of language detection; we therefore assume that the language of each tweet is correctly identified. For profiles, users are expected to pick a language for their settings. Nevertheless, their language entity may show as the initial placeholder text “Select Language...” or a translated version of it, which may provide information about the user’s native language community.
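As a minimal sketch of this extraction step, assuming each status is available as a parsed JSON object (a Python dict) with the three keys named above, the entities could be pulled out as follows. The names `LanguageEntities`, `extract_entities` and `example_status` are illustrative and do not come from the paper.

```python
from typing import Any, NamedTuple, Optional


class LanguageEntities(NamedTuple):
    """The three values needed per status (names are illustrative)."""
    user_id: Any
    profile_lang: str
    status_lang: str


def extract_entities(status: dict) -> Optional[LanguageEntities]:
    """Return the three entities, or None if any of them is missing."""
    user = status.get("user") or {}
    user_id = user.get("id")
    profile_lang = user.get("lang")
    status_lang = status.get("lang")
    if user_id is None or profile_lang is None or status_lang is None:
        return None
    return LanguageEntities(user_id, profile_lang, status_lang)


# Hypothetical status, trimmed to the fields used here.
example_status = {"id": 1, "lang": "ar", "user": {"id": 3, "lang": "en"}}
entities = extract_entities(example_status)
```

Returning None for incomplete statuses is a design choice of this sketch; the paper does not say how such records were handled.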
3.1 Network Graphs
For this study, we need to generate two different graphs; one is based on individual users and their posting activity, while the other combines users into language communities. In the context of this study, all graphs must be directed to provide correct measurements, as demonstrated in Fig. 1.
User Graph. This graph represents the core structure for our analysis. As shown in Fig. 1(a), nodes in this graph are of two types: users and posting languages. Each posted tweet results in two nodes: one represents the user, with the profile language setting added to the node as the attribute ‘{profile_lang:xx}’; the other represents the language of the tweet. Edges link users with the posting languages they used, and the weight (thickness) of an edge measures the number of tweets posted by the user (the starting node) in the target language (the ending node). In the example in Fig. 1(a), the profile language setting for user ‘03’ is ‘en’, and they posted three tweets in ‘en’ and three in ‘ar’ (Arabic). This graph will be referred to as the user graph.
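The construction of the user graph can be sketched with plain dictionaries. This is one possible representation rather than the authors' implementation: it assumes the input is a sequence of (user ID, profile language, status language) triples, stores each directed edge as a (user, posting language) key with its tweet count as the weight, and keeps the profile language as a per-user attribute. The function name `build_user_graph` is hypothetical.

```python
from collections import Counter


def build_user_graph(statuses):
    """Build a directed, weighted user graph.

    Returns (edge_weights, profile_langs), where
    edge_weights[(user_id, posting_lang)] is the number of tweets the user
    posted in that language, and profile_langs[user_id] is the node
    attribute profile_lang.
    """
    edge_weights = Counter()
    profile_langs = {}
    for user_id, profile_lang, status_lang in statuses:
        edge_weights[(user_id, status_lang)] += 1
        profile_langs[user_id] = profile_lang
    return edge_weights, profile_langs


# User '03' from the Fig. 1(a) description: 'en' profile,
# three tweets in 'en' and three in 'ar'.
statuses = [("03", "en", "en")] * 3 + [("03", "en", "ar")] * 3
edge_weights, profile_langs = build_user_graph(statuses)
```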
Communities Graph. This second graph is derived from the user graph and has one type of node, representing a language community, as shown in Fig. 1(b). For each user node, we generate one node from the ‘{profile_lang:xx}’ attribute and another from each posting language to which it is connected. This results in all users of the same profile language being combined into one node, with an edge connecting it to each posting language and the edge weight measuring their activity. Theoretically, each tweet results in two language nodes: one for the user profile and the other for the language of the tweet. In the example in Fig. 1(b), users with ‘fr’ (French) profiles have generated six tweets in ‘en’. In the case of the ‘ar’ node, users whose profile language is ‘ar’ have posted four tweets, all in ‘ar’; in graph terminology, this is referred to as a ‘self-loop’. We will refer to this graph as the communities graph.
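The derivation of the communities graph can be sketched as an aggregation over user-graph edges. The toy data below is an assumption chosen to match the totals quoted for Fig. 1 (six ‘fr’ to ‘en’ tweets, four ‘ar’ to ‘ar’ tweets, and user ‘03’ as above); the figure itself is not reproduced here, so the user IDs ‘01’, ‘02’, ‘04’ and the split of tweets between the two ‘fr’ users are hypothetical.

```python
from collections import Counter


def build_communities_graph(user_edges, profile_langs):
    """Collapse each user node into its profile-language community.

    user_edges maps (user_id, posting_lang) -> tweet count.
    The result maps (profile_lang, posting_lang) -> total tweet count;
    an edge whose two ends are equal is a self-loop.
    """
    community_edges = Counter()
    for (user_id, posting_lang), weight in user_edges.items():
        community_edges[(profile_langs[user_id], posting_lang)] += weight
    return community_edges


# Hypothetical user graph consistent with the totals described for Fig. 1.
user_edges = {
    ("01", "en"): 2,
    ("02", "en"): 4,
    ("03", "en"): 3,
    ("03", "ar"): 3,
    ("04", "ar"): 4,
}
profile_langs = {"01": "fr", "02": "fr", "03": "en", "04": "ar"}
community_edges = build_communities_graph(user_edges, profile_langs)
```

In this toy graph the ‘fr’ users collapse into a single ‘fr’ to ‘en’ edge of weight six, and the ‘ar’ user yields an ‘ar’ self-loop of weight four.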
Throughout the paper, we refer to language communities in two ways: a profile community is defined by the language in users’ profile settings, whereas a posting community is defined by the language of their tweets.
3.2 Measures
In this section, we discuss how graph measures can be used to make deductions about users, their associated community languages, posting language activity, and how different language communities are linked to each other. These measures, and their interpretations in the context of this study, are as follows:
- Indegree: number of incoming edges;
- Outdegree: number of outgoing edges;
- Edge weight: number of tweets on an edge;
- Weighted indegree: total weight of incoming edges;
- Weighted outdegree: total weight of outgoing edges.
User Graph Properties. User nodes have indegree = 0, and posting language nodes have outdegree = 0; these two properties are used to distinguish between the two node types. Both the outdegree of user nodes and the indegree of posting language nodes must be greater than 0. The edge weight indicates the number of tweets associated with both end nodes. Referring to the example in Fig. 1(a), user ‘03’ has an indegree of 0 (identifying it as a user node), an outdegree of 2 (the number of languages they used), and a weighted outdegree of 6 (the total number of tweets posted). In the same figure, the ‘en’ posting language has an outdegree of 0 (identifying it as a language node), an indegree of 3 (the number of users who posted in this language), and a weighted indegree of 9 (the total number of tweets posted). Table 1 presents the main properties of this graph.
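The five measures can be computed in a single pass over the edges. The sketch below assumes the dictionary-of-edges representation (a mapping from (source, target) to weight), which is a choice made for illustration; the toy user graph is hypothetical apart from the values for user ‘03’ and the ‘en’ totals, which follow the Fig. 1(a) description.

```python
from collections import defaultdict


def degree_measures(edges):
    """Compute degree measures for a directed, weighted graph.

    edges maps (source, target) -> weight (number of tweets).
    Returns a dict of four node -> value mappings; nodes that never
    appear in a given role report 0 for that measure.
    """
    names = ("indegree", "outdegree", "weighted_indegree", "weighted_outdegree")
    measures = {name: defaultdict(int) for name in names}
    for (source, target), weight in edges.items():
        measures["outdegree"][source] += 1
        measures["weighted_outdegree"][source] += weight
        measures["indegree"][target] += 1
        measures["weighted_indegree"][target] += weight
    return measures


# Hypothetical user graph consistent with the description of Fig. 1(a).
user_edges = {
    ("01", "en"): 2,
    ("02", "en"): 4,
    ("03", "en"): 3,
    ("03", "ar"): 3,
    ("04", "ar"): 4,
}
measures = degree_measures(user_edges)
```

Here user ‘03’ has indegree 0, outdegree 2 and weighted outdegree 6, while ‘en’ has outdegree 0, indegree 3 and weighted indegree 9, matching the values quoted in the text.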
Communities Graph Properties. As discussed in Sect. 3.1, this graph is extracted from the user graph and contains one type of node: language community nodes. Nodes in this graph represent languages as profile language settings, posting languages, or both. However, as the graph is directed, we can identify whether a community node represents profiles or posts by measuring its indegree and outdegree. A positive indegree implies a posting language, and a positive outdegree indicates a profile language setting. Figure 1(b) shows three language communities: two nodes appear as both posting and profile nodes, while one node exists as a profile-only node. The node ‘ar’, for example, has an outdegree of 1 and an indegree of 2. In other words, at least one user has ‘ar’ as their profile language setting, and at least two users have posted in ‘ar’. In terms of edge weights, seven tweets were posted in ‘ar’, originating from two different profile language communities. The ‘fr’ node has only outgoing edges, which means this language community exists as a profile-only node, as no user posted in ‘fr’. These measures are summarised in Table 2.
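The role of each community node follows directly from which side of an edge it appears on. The following sketch, under the assumption that the communities graph is held as a mapping from (profile language, posting language) to tweet count, classifies nodes as profile-only, posting-only, or both; `classify_communities` is an illustrative name, and the edge weights reproduce the Fig. 1(b) description.

```python
def classify_communities(community_edges):
    """Classify language nodes by role in a directed communities graph.

    A node with positive outdegree is a profile community; a node with
    positive indegree is a posting community.
    """
    profile = {source for source, _target in community_edges}
    posting = {target for _source, target in community_edges}
    return {
        "profile_only": profile - posting,
        "posting_only": posting - profile,
        "both": profile & posting,
    }


# Communities graph as described for Fig. 1(b).
community_edges = {
    ("fr", "en"): 6,
    ("en", "en"): 3,
    ("en", "ar"): 3,
    ("ar", "ar"): 4,
}
roles = classify_communities(community_edges)

# Weighted indegree of 'ar': all tweets posted in Arabic, from any profile.
ar_weighted_indegree = sum(
    weight for (_source, target), weight in community_edges.items() if target == "ar"
)
```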
4 Case Study and Discussion
In our case study, we explore the analysis of a dataset collected from the #Eurovision hashtag during the 2016 Eurovision Song Contest, based on the techniques presented in Sect. 2. Using the user graph and communities graph, we conduct analyses on multilingualism, activities and user behaviours in posting in different languages (see Note 2).
4.1 Case Study: 2016 Eurovision Song Contest
The 2016 Eurovision Song Contest (see Note 3) took place in May in Stockholm, Sweden, with the motto “Come Together!”. There were 42 countries taking part, with two semi-finals on 10 and 12 May and 26 countries competing in the final on 14 May. This year’s contest was perceived by many commentators to be tense and politically motivated, especially with Ukraine eventually winning the final [27]. Varying analyses see the contest as being influenced by political conflicts, friendships or cultural bias [28,29,30,31], with a range of news articles explicitly discussing the possibly biased results [32]. Twitter activity was very high throughout the event on the primary #Eurovision hashtag, with close to 8 million statuses produced by nearly 1.25 million users.
The study focuses on original statuses (tweets) as the basic entity, as we wish to measure posting behaviour, not reactions. Preliminary analysis shows that original statuses account for 48% of the total activity; of these, the 4% of tweets with an ‘unidentified’ language were eliminated. As for profiles, all users had chosen a language preference, and no profile was found with the default language setting.
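A filtering step of this kind might look as follows. This is a sketch under two assumptions that the paper does not state: that retweets are recognised by the presence of a `retweeted_status` field in the status object, and that an unidentified language appears as the code `und`. Whether replies were also treated as reactions is not specified, so they are left untouched here.

```python
def is_original(status):
    """Assumption: a retweet carries a 'retweeted_status' field."""
    return "retweeted_status" not in status


def keep_for_analysis(status, unidentified="und"):
    """Keep original statuses whose language was identified.

    The 'und' code for an unidentified language is an assumption.
    """
    lang = status.get("lang")
    return is_original(status) and lang is not None and lang != unidentified


# Hypothetical statuses, trimmed to the fields used here.
statuses = [
    {"id": 1, "lang": "en", "user": {"id": 3, "lang": "en"}},
    {"id": 2, "lang": "und", "user": {"id": 3, "lang": "en"}},
    {"id": 3, "lang": "es", "user": {"id": 4, "lang": "ar"},
     "retweeted_status": {"id": 1}},
]
kept = [status for status in statuses if keep_for_analysis(status)]
```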
4.2 Multilingualism
The outdegree of a user node in the user graph gives the number of languages that user posted in; across all user nodes, we observed 20 distinct outdegree values, ranging from 1 to 25. Figure 2 shows these groups, along with the number of users and the activity in each. Although 85% of users are monolingual, their activity accounts for only 47% of all tweets. Additionally, while the average activity is five posts per user, monolingual users were the least active, with an average of two tweets per user. We found that 18% of tweets were in different languages, with a strong correlation between multilingualism and the likelihood of using different languages.
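The grouping by outdegree and the share of tweets in a different language can be sketched as below. The toy user graph is hypothetical and far smaller than the case-study data, so its numbers are unrelated to the reported percentages; the function names are illustrative.

```python
from collections import Counter, defaultdict


def multilingualism_summary(user_edges):
    """Group users by outdegree (number of posting languages).

    Returns {outdegree: {"users": count, "tweets": total}}.
    """
    langs_per_user = Counter()
    tweets_per_user = Counter()
    for (user_id, _lang), weight in user_edges.items():
        langs_per_user[user_id] += 1
        tweets_per_user[user_id] += weight
    groups = defaultdict(lambda: {"users": 0, "tweets": 0})
    for user_id, outdegree in langs_per_user.items():
        groups[outdegree]["users"] += 1
        groups[outdegree]["tweets"] += tweets_per_user[user_id]
    return dict(groups)


def different_language_share(user_edges, profile_langs):
    """Fraction of tweets whose language differs from the profile language."""
    total = sum(user_edges.values())
    different = sum(
        weight
        for (user_id, lang), weight in user_edges.items()
        if profile_langs[user_id] != lang
    )
    return different / total if total else 0.0


# Hypothetical toy user graph.
user_edges = {
    ("01", "en"): 2,
    ("02", "en"): 4,
    ("03", "en"): 3,
    ("03", "ar"): 3,
    ("04", "ar"): 4,
}
profile_langs = {"01": "fr", "02": "fr", "03": "en", "04": "ar"}
summary = multilingualism_summary(user_edges)
share = different_language_share(user_edges, profile_langs)
```

In the toy data, three monolingual users account for 10 of the 16 tweets and the single bilingual user for the remaining 6.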
We used the user graph to generate two communities graphs; the first will be used to explore language communities amongst monolingual users, while the other includes language communities for multilingual users only.
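Splitting the user graph before aggregation could be done along these lines. This is a sketch, assuming the same edge-dictionary representation of the user graph; each half can then be collapsed into its own communities graph by summing edge weights per (profile language, posting language) pair. The toy data is hypothetical.

```python
from collections import Counter


def split_by_multilingualism(user_edges):
    """Split user-graph edges into monolingual and multilingual users.

    A user is monolingual when their outdegree is exactly 1.
    """
    outdegree = Counter(user_id for user_id, _lang in user_edges)
    mono = {edge: w for edge, w in user_edges.items() if outdegree[edge[0]] == 1}
    multi = {edge: w for edge, w in user_edges.items() if outdegree[edge[0]] > 1}
    return mono, multi


# Hypothetical toy user graph.
user_edges = {
    ("01", "en"): 2,
    ("02", "en"): 4,
    ("03", "en"): 3,
    ("03", "ar"): 3,
    ("04", "ar"): 4,
}
mono_edges, multi_edges = split_by_multilingualism(user_edges)
```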
4.3 Monolingual Communities
This graph includes 63 language communities: 15 languages exist as profile-only and were not used in any post, while 12 were used in posting but never appear as a profile language. Moreover, about 13% of monolingual users posted in a language different from their profile language, accounting for 10% of tweets in monolingual communities. Hence, the strongest relationships exist as self-loops, as discussed in Sect. 3.1.
To explore the relationships between language communities, we remove all self-loop edges from the graph. The resultant graph shows that monolingual users with ‘en’ as their profile language posted in 47 other languages, accounting for 43% of tweeting activity, and that 48 other profile communities posted in ‘en’, also accounting for 43%. We also found that the strongest relationship (edge weight), at 9% of activity, is ‘en’ profiles posting in ‘es’ (Spanish). A further interesting case involves the ‘el’ (Greek) and ‘ru’ (Russian) languages: although more than twice as many profile communities used ‘ru’ as used ‘el’, their activity in ‘ru’ was significantly lower.
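Removing self-loops and locating the strongest remaining relationship can be sketched as follows, again assuming the communities graph is a mapping from (profile language, posting language) to tweet count. The data is the hypothetical toy graph used for Fig. 1(b), not the case-study graph.

```python
def without_self_loops(community_edges):
    """Drop edges where the profile and posting language coincide."""
    return {
        (source, target): weight
        for (source, target), weight in community_edges.items()
        if source != target
    }


def strongest_edge(community_edges):
    """Return ((profile_lang, posting_lang), weight) for the heaviest edge."""
    return max(community_edges.items(), key=lambda item: item[1])


# Hypothetical toy communities graph.
community_edges = {
    ("fr", "en"): 6,
    ("en", "en"): 3,
    ("en", "ar"): 3,
    ("ar", "ar"): 4,
}
cross_edges = without_self_loops(community_edges)
top_edge = strongest_edge(cross_edges)

# Number of other languages that 'en' profiles posted in, and number of
# other profile communities that posted in 'en'.
en_other_posting = sum(1 for source, _target in cross_edges if source == "en")
en_other_profiles = sum(1 for _source, target in cross_edges if target == "en")
```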
4.4 Multilingual Communities
Although multilingual users form 15% of all users in the dataset, they generated 53% of tweeting activity. There are 48 language communities in this graph, with 13 languages as profile-only and 10 as posting languages. With self-loop edges excluded, activity in different languages amounts to 24% of multilingual users’ tweets. We also found that the strongest relationship exists between the ‘es’ profile community and the ‘en’ posting language, which is the opposite of the monolingual case.
4.5 Visualisation
In Fig. 3, we present two communities graphs. The size of a node represents the weighted indegree of the community (how much the language was used in tweeting), and its darkness reflects the weighted outdegree (participation from users of that language community). Edges link profile and posting communities, and their thickness indicates the number of tweets posted.
Whilst Fig. 3(a) shows all language communities together, Fig. 3(b) presents a filtered graph. This filtered graph depicts relationships amongst language communities that scored high in weighted indegree and outdegree. Also, we eliminated users with activity lower than the overall average (five tweets/user), and generated the communities graph from the remainder.
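The activity filter applied before regenerating the communities graph could be sketched as below. The threshold defaults to the mean number of tweets per user in the supplied graph, which is an assumption of this sketch (the text reports an overall average of five tweets per user); users at or above the threshold are kept. The toy data is hypothetical.

```python
from collections import Counter


def filter_active_users(user_edges, min_tweets=None):
    """Keep only edges of users whose total activity reaches the threshold.

    When min_tweets is None, the mean number of tweets per user is used.
    """
    tweets_per_user = Counter()
    for (user_id, _lang), weight in user_edges.items():
        tweets_per_user[user_id] += weight
    if not tweets_per_user:
        return {}
    if min_tweets is None:
        min_tweets = sum(tweets_per_user.values()) / len(tweets_per_user)
    return {
        (user_id, lang): weight
        for (user_id, lang), weight in user_edges.items()
        if tweets_per_user[user_id] >= min_tweets
    }


# Hypothetical toy user graph: 16 tweets by 4 users, so the mean is 4.
user_edges = {
    ("01", "en"): 2,
    ("02", "en"): 4,
    ("03", "en"): 3,
    ("03", "ar"): 3,
    ("04", "ar"): 4,
}
active_edges = filter_active_users(user_edges)
```

With the mean of four tweets per user, only user ‘01’ is dropped; passing `min_tweets=5` would apply the fixed threshold quoted in the text instead.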
5 Conclusions
This paper has presented an extensible approach for identifying interactions within language communities using a high-profile real-world case study – the 2016 Eurovision Song Contest – and its associated engagement and interactions on Twitter. This approach utilises network graph properties to explore the behaviour of monolingual and multilingual users. Surprisingly, even though monolingual users formed the largest proportion of users, they were less active than multilingual users. The results also confirmed that higher levels of user multilingualism imply greater distance from the profile language. In a profile community, a large number of participants does not necessarily imply high language diversity, as a single post in another language is enough to take the community to a higher level of multilingualism. Therefore, filtering out users with low activity would improve measurement accuracy. In a few cases, we observed users participating in a significant number of languages, up to 25. Such extreme cases may be interesting to investigate for possible spammer or false account detection, while more moderate cases (e.g. 2–5 languages) may be of interest for sociolinguistics.
The graph measures of users may be useful in confirming their association with a language community, without the need to crawl their entire Twitter timeline. Although the language setting of a user profile may indicate an interface preference only, we found persistent activity in the same language across users, especially monolingual users.
A possible scenario for governments, politicians or campaigners would be to use this method to measure to what extent other languages are used within a profile community. It may also show how users associate themselves with one community in their profile while using other languages. Monitoring unusual activity for secondary languages may help to uncover important messages or opinions that could not be openly expressed, for a variety of reasons, to the rest of the profile community. This framework may also be extended to measure reactions via retweeting and replying using a variety of natural language processing and sentiment analysis techniques [33,34,35], to provide a different perspective for influence analysis.
Notes
- 1. For Twitter, a status may also be referred to as a post or tweet.
- 2. In this context, a different language refers to a tweet language that differs from the user’s profile language setting.
- 3.
References
Hong, L., Convertino, G., Chi, E.H.: Language matters in Twitter: a large scale study. In: Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (2011)
Takhteyev, Y., Gruzd, A., Wellman, B.: Geography of Twitter networks. Soc. Netw. 34(1), 73–81 (2012)
Magdy, A., Ghanem, T.M., Musleh, M., Mokbel, M.F.: Understanding language diversity in local Twitter communities. In: Proceedings of the 27th ACM Conference on Hypertext and Social Media, pp. 331–332 (2016)
Kulshrestha, J., Kooti, F., Nikravesh, A., Gummadi, K.P.: Geographic dissection of the Twitter network. In: Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (2012)
Cunha, E., Magno, G., Comarela, G., Almeida, V., Gonçalves, M., Benevenuto, F.: Analyzing the dynamic evolution of hashtags on Twitter: a language-based approach. In: Proceedings of the Workshop on Languages in Social Media, pp. 58–65 (2011)
Bruns, A., Highfield, T., Burgess, J.: The Arab Spring and social media audiences: English and Arabic Twitter users and their networks. Am. Behav. Sci. 57(7), 871–898 (2013)
Zappavigna, M., Martin, J.R.: Discourse of Twitter and Social Media: How We Use Language to Create Affiliation on the Web. Continuum, New York (2012)
Zhuravleva, A., de Bot, K., Haug Hilton, N.: Using social media to measure language use. J. Multiling. Multicult. Dev. 37(6), 601–614 (2015)
Gruffydd Jones, E., Uribe-Jongbloed, E. (eds.): Social Media and Minority Languages: Convergence and the Creative Industries. Multilingual Matters Ltd., Bristol (2013)
Ginsburgh, V., Weber, S.: How Many Languages Do We Need? The Economics of Linguistic Diversity. Princeton University Press, Princeton (2011)
Willis, A., Fisher, A., Lvov, I.: Mapping networks of influence: tracking Twitter conversations through time and space. Particip. J. Audience Reception Stud. 12(1), 494–530 (2015)
Oatley, G., Crick, T.: Measuring UK crime gangs: a social network problem. Soc. Netw. Anal. Mining 5(1), 1–16 (2015)
Borgatti, S.P., Everett, M.G.: Models of core/periphery structures. Soc. Netw. 21(4), 375–395 (2000)
Rombach, M., Porter, M.A., Fowler, J.H., Mucha, P.J.: Core-periphery structure in networks. SIAM J. Appl. Math. 74(1), 167–190 (2014)
Liu, W., Pellegrini, M., Wang, X.: Detecting communities based on network topology. Sci. Rep. 4(5739) (2014). doi:10.1038/srep05739
Cha, M., Benevenuto, F., Haddadi, H., Gummadi, K.: The world of connections and information flow in Twitter. IEEE Trans. Syst. Man Cybern. 42(4), 991–998 (2012)
Borge-Holthoefer, J., Rivero, A., Moreno, Y.: Locating privileged spreaders on an online social network. Phys. Rev. E 85(066123) (2012). doi:10.1103/PhysRevE.85.066123
Zhang, J.X., Chen, D.B., Dong, Q., Zhao, Z.D.: Identifying a set of influential spreaders in complex networks. Sci. Rep. 6(27823) (2016). doi:10.1038/srep27823
Kang, R., Brown, S., Kiesler, S.: Why do people seek anonymity on the internet?: informing policy and design. In: Proceedings SIGCHI Conference on Human Factors in Computing Systems, pp. 2657–2666 (2013)
Morstatter, F., Pfeffer, J., Liu, H., Carley, K.M.: Is the sample good enough? Comparing data from Twitter’s streaming API with Twitter’s firehose. In: Proceedings of 7th International AAAI Conference on Web and Social Media, pp. 400–408 (2013)
Tan, L., Ponnam, S., Gillham, P., Edwards, B., Johnson, E.: Analyzing the impact of social media on social movements: a computational study on Twitter and the Occupy Wall Street movement. In: Proceedings of IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2013)
Kumar, S., Morstatter, F., Liu, H.: Twitter Data Analytics. Springer, Heidelberg (2014). doi:10.1007/978-1-4614-9372-3
Liang, Y., Caverlee, J., Cheng, Z., Kamath, K.Y.: How big is the crowd?: event and location based population modeling in social media. In: Proceedings of 24th ACM Conference on Hypertext and Social Media, pp. 99–108 (2013)
Cheng, Z., Caverlee, J., Lee, K.: You are where you tweet: a content-based approach to geo-locating Twitter users. In: Proceedings of 19th ACM Conference on Information and Knowledge Management, pp. 759–768 (2010)
Blamey, B., Crick, T., Oatley, G.: ‘The first day of summer’: parsing temporal expressions with distributed semantics. In: Bramer, M., Petridis, M. (eds.) Research and Development in Intelligent Systems XXX, pp. 389–402. Springer, Cham (2013). doi:10.1007/978-3-319-02621-3_29
Caverlee, J., Cheng, Z., Sui, D.Z., Yeswanth Kamath, K.: Towards geo-social intelligence: mining, analyzing, and leveraging geospatial footprints in social media. IEEE Data Eng. Bull. 36(3), 33–41 (2013)
The Telegraph: Eurovision 2016: furious Russia demands boycott of Ukraine over Jamala’s ‘anti-Kremlin’ song. http://www.telegraph.co.uk/news/2016/05/15/eurovision-2016-furious-russia-demands-boycott-of-ukraine-over-j. Accessed 01 Apr 2017
Ginsburgh, V., Noury, A.G.: The Eurovision Song Contest. Is voting political or cultural? Eur. J. Polit. Econ. 24(1), 41–52 (2008)
Charron, N.: Impartiality, friendship-networks and voting behavior: evidence from voting patterns in the Eurovision Song Contest. Soc. Netw. 35(3), 484–497 (2013)
Blangiardo, M., Baio, G.: Evidence of bias in the Eurovision Song Contest: modelling the votes using Bayesian hierarchical models. J. Appl. Stat. 41(10), 2312–2322 (2014)
Budzinski, O., Pannicke, J.: Culturally biased voting in the Eurovision Song Contest: do national contests differ? J. Cult. Econ. 1–36 (2016). doi:10.1007/s10824-016-9277-6
Kirk, A., Kempster, J., Franco, S.: Eurovision 2016: how does country bias affect the result? http://www.telegraph.co.uk/music/news/eurovision-2016-how-country-bias-affects-the-result. Accessed 31 Apr 2017
Oatley, G., Crick, T.: Changing faces: identifying complex behavioural profiles. In: Tryfonas, T., Askoxylakis, I. (eds.) HAS 2014. LNCS, vol. 8533, pp. 282–293. Springer, Cham (2014). doi:10.1007/978-3-319-07620-1_25
Sluban, B., Smailović, J., Battiston, S., Mozetič, I.: Sentiment leaning of influential communities in social networks. Comput. Soc. Netw. 2(9), 1–21 (2015)
Mostafa, M., Crick, T., Calderon, A.C., Oatley, G.: Incorporating emotion and personality-based analysis in user-centered modelling. In: Bramer, M., Petridis, M. (eds.) Research and Development in Intelligent Systems XXXIII. LNCS, pp. 383–389. Springer, Cham (2016). doi:10.1007/978-3-319-47175-4_29
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Albishry, N., Crick, T., Tryfonas, T. (2017). “Come Together!”: Interactions of Language Networks and Multilingual Communities on Twitter. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science(), vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_45
DOI: https://doi.org/10.1007/978-3-319-67077-5_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67076-8
Online ISBN: 978-3-319-67077-5