Tracking Human Migration from Online Attention

Vaca-Ruiz, Carmen; Quercia, Daniele; Aiello, Luca Maria; Fraternali, Piero

doi:10.1007/978-3-319-04178-0_7


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8313)

Included in the following conference series:

International Workshop on Citizen in Sensor Networks

Abstract

The dynamics behind human migrations are very complex. Economists have studied them intensely because of their importance for the global economy. However, tracking migration is costly, and available data tends to be outdated. Online data can be used to extract proxies for migration flows; these proxies are not meant to replicate traditional measurements but to complement them. We analyze a random sample of data from a microblogging service popular in Brazil (more than 13M posts and 22M reposts) and accurately predict the total number of migrants in 35 Brazilian cities. These results have promising implications for monitoring emerging economies.


1 Introduction

For census agencies, migrations are difficult to track in developed countries, let alone in developing ones. In emerging economies, authorities rely on inaccurate, outdated, de-contextualized census data even for the local population [15].

Migrants who have left their home country in search of better opportunities also rely on electronic communication to maintain their bonds with their home communities [3]. Publishing and ‘consuming’ content such as news and photos on online platforms is “a parcel of everyday life in transnational families” [2]. Previous studies have found that indicators characterizing offline communities (e.g., economic deprivation) can be extracted from online data (e.g., use of emotion words on Twitter) [22]. Therefore, we propose to consider online data in Brazil and track the number of migrants in a city by considering the interactions between users who live in the city and those who live outside it.

Our main contribution is a set of metrics, extracted from online data, for estimating migration levels. These metrics reflect the intuition that the higher the number of migrants in a city, the more online interactions there are between users in the city and those outside it. We compute these metrics for 35 cities in a Yahoo Meme dataset that includes more than 13M posts and 22M reposts exchanged between users in more than 1K cities around the world. We find that the proposed metrics work, in that they correlate with the number of migrants reported by the Brazilian census authority. By then combining these metrics in a linear regression model, we show that the model fits the data well (adjusted $R^{2} = 0.61$).

2 Dataset

Yahoo Meme was a microblogging platform similar to Twitter, except that users could post content of any length or type (text, pictures, audio, video), with text and pictures being the most frequently posted content. In addition to posting, users could follow other users, repost others’ content, and comment on it. In this study, we use a random sample of interactions on Yahoo Meme from its launch in 2009 until the day it was discontinued in 2012 (Table 1). Despite its moderate popularity in the USA, Yahoo Meme was popular in Brazil, as shown by the fact that the top 45 cities by number of interactions are all located there. Reposting was the main activity on the service (22M records in the sample), compared to commenting (4M). We extract the users who posted the content in our sample and georeference them based on their IP addresses using a Yahoo service. We remove the users for whom we did not obtain results at city level (e.g., users employing proxy servers to connect to the Internet), obtaining 80K users. For this set of users and their respective posts, we extract all the repost cascades and the follower relationships. Month after month, users across different Brazilian cities tended to intensify their follower connections, until reaching a certain stability at month 7 after the platform launch (Fig. 1).

Table 1. Yahoo Meme dataset statistics


To ensure geographic representativeness, we verify that the number of users in the top Brazilian cities in our dataset is significantly correlated with the number of Internet users in those cities (Fig. 2). Any city outside the calculated confidence area (an outlier) is excluded from the study. This leaves us with 35 cities, and we will see that this number yields statistically significant results. That is because we are left with 1.4M repost cascades whose original content was produced in the 35 cities and consumed across the world.

3 Attention Metrics

It has been shown that migrants maintain their strong ties in their home countries mainly through digital means [2]. We thus expect that studying online interactions on Yahoo Meme across geographic areas will yield good estimators of migration flows. More specifically, we connect two places every time a user located in city $j$ interacts with a user $u_i$ located in city $i$, either by reposting $u_i$’s content or by following $u_i$. The volume of such connections is then correlated with migration rates for 35 cities in Brazil. We consider migrants from Brazil itself and from the rest of the world.

Previous studies have shown that interactions on social media cannot be quantified with simple metrics such as popularity or number of followers but they are best characterized with metrics that also reflect the extent to which content is re-shared or liked [1, 6, 23, 30, 32]. That is because social media users make specific decisions about the content they want to consume or who they wish to follow. Such decisions are taken based on offline social ties [31], homophily, and physical distance [25].

We thus resort to attention metrics, which capture the attention that a city’s users are able to attract from the rest of the world and from other Brazilian cities:

Cross Border Attention. Our first set of attention metrics for city $i$ is defined as the number of reposts that the city has attracted from the rest of the world ($ROW^{repost}_{i}$) or from other Brazilian cities ($BR^{repost}_{i}$), normalized with respect to the total number $n_i$ of users in that city:

$$ ROW^{repost}_{i} = \frac{out_i}{n_{i}} , BR^{repost}_{i} = \frac{out'_i}{n_{i}} $$

where $out_i$ is the number of times a post originating in city $i$ has been reposted outside Brazil (the world excluding Brazil); $out'_{i}$, instead, counts the reposts received from outside the city but inside Brazil.

We repeat the same definition, now considering the number of cross-border followers attracted by users in city $i$:

$$ ROW^{followers}_{i} = \frac{outf_i}{n_{i}} , BR^{followers}_{i} = \frac{outf'_i}{n_{i}} $$

where $outf_i$ is the number of times a user in city $i$ has been followed by a user outside Brazil (the world excluding Brazil); $outf'_{i}$, instead, counts the follower links from outside the city but inside Brazil. Together, these definitions give us the first four metrics.
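As an illustration of the repost-based definitions above, the following is a minimal sketch of how the two normalized counts could be computed. The input format (a list of origin-city and reposter-city pairs), the function name `cross_border_attention`, and the toy data are assumptions made for this example, not part of the original study; the follower-based metrics would follow the same pattern with follower links in place of reposts.

```python
from collections import Counter

def cross_border_attention(reposts, users_per_city, brazilian_cities):
    """Compute ROW^repost_i and BR^repost_i for every city in users_per_city.

    reposts: iterable of (origin_city, reposter_city) pairs (hypothetical format).
    users_per_city: dict mapping a city to its number of users n_i.
    brazilian_cities: set of city names located in Brazil.
    """
    out_row = Counter()  # out_i: reposts received from outside Brazil
    out_br = Counter()   # out'_i: reposts received from other Brazilian cities
    for origin, reposter in reposts:
        if origin == reposter:
            continue  # same-city reposts are not cross-border
        if reposter in brazilian_cities:
            out_br[origin] += 1
        else:
            out_row[origin] += 1
    return {
        city: {
            "ROW_repost": out_row[city] / n_users,
            "BR_repost": out_br[city] / n_users,
        }
        for city, n_users in users_per_city.items()
    }

# Toy data, invented purely for illustration.
reposts = [
    ("Sao Paulo", "Rio de Janeiro"),
    ("Sao Paulo", "Lisbon"),
    ("Sao Paulo", "Sao Paulo"),
    ("Rio de Janeiro", "Lisbon"),
]
users_per_city = {"Sao Paulo": 4, "Rio de Janeiro": 2}
brazilian_cities = {"Sao Paulo", "Rio de Janeiro"}

metrics = cross_border_attention(reposts, users_per_city, brazilian_cities)
print(metrics)
```

Normalizing by $n_i$ keeps large cities from dominating simply because they have more users.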

Authority. The previous metrics consider all cities equally. However, certain cities might be more central to migration flows than others. To capture this concept of centrality, we build an attention graph using reposts. This is a weighted directed graph where nodes are cities, and directed weighted edges $(i,j,w)$ represent the volume $w$ of reposts between city $j$, where the reposter lives, and city $i$, where the original poster lives. Self-edges are allowed, as many reposts occur between users living in the same city. The resulting attention graph has 1,310 nodes and 25K weighted edges (Fig. 3). Then, we measure the ‘authority’ index of each city using the HITS algorithm [14]. In the HITS algorithm, the authority centrality of a vertex is defined to be proportional to the sum of the hub centralities of the vertices that point to it. For a city $i$, the two indexes are defined as follows:

$$ Authority_{i} = \alpha \cdot \sum \limits _{j \in C} A_{ij}Hub_{j},$$

$$ Hub_{i} = \beta \cdot \sum \limits _{j \in C} A_{ji} Authority_{j}, $$

where $\alpha $ and $\beta $ are constants, $C$ is the set of cities in our dataset, and $A$ is the adjacency matrix of the attention graph.

The Authority index calculated by the HITS algorithm is more informative for the vertex centrality in directed networks than simpler measures such as the number of incident edges or indegree centrality [12] and, thus, it better captures the importance of a node in the network.
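The two coupled definitions above can be solved by repeated substitution (power iteration). The sketch below is one way to do this with NumPy, assuming $A_{ij}$ holds the repost volume from city $j$ towards content posted in city $i$; the function name `hits`, the sum-to-one normalization (which plays the role of the constants $\alpha$ and $\beta$), and the toy matrix are assumptions for illustration, and the graph is assumed to have at least one edge.

```python
import numpy as np

def hits(A, n_iter=100, tol=1e-10):
    """Power iteration for HITS on a weighted adjacency matrix.

    A[i, j] is the repost volume by users in city j of content posted in city i.
    Returns (authority, hub) vectors, each normalized to sum to 1.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    authority = np.ones(n) / n
    hub = np.ones(n) / n
    for _ in range(n_iter):
        new_authority = A @ hub            # Authority_i = sum_j A_ij * Hub_j
        new_authority /= new_authority.sum()
        new_hub = A.T @ new_authority      # Hub_i = sum_j A_ji * Authority_j
        new_hub /= new_hub.sum()
        converged = np.abs(new_authority - authority).sum() < tol
        authority, hub = new_authority, new_hub
        if converged:
            break
    return authority, hub

# Toy 3-city attention matrix, invented for illustration:
# city 0 attracts most of the reposts from the other cities.
A = np.array([
    [2.0, 5.0, 3.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
])
authority, hub = hits(A)
print(authority, hub)
```

In this toy graph, city 0 ends up with the highest authority score because the other cities direct most of their reposts towards it.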

We calculate the correlation between each pair of the five metrics: $ROW^{repost}_{i}$, $BR^{repost}_{i}$, $ROW^{followers}_{i}$, $BR^{followers}_{i}$, $Authority_{i}$ (Fig. 4) and observe that they are all correlated with each other. That is why, when we run our predictions, we account for interaction effects.

4 Correlations Between Attention and Migration

From the 2010 data provided by the Brazilian census bureau (see Note 1), we compute two migration rates for each of the 35 cities: $m_{ROW}$ is the number of people coming from other countries and $m_{BR}$ is the number coming from other Brazilian cities. Both values are normalized by city population. We then correlate these two migration rates with our five attention metrics. To account for skewness, the metrics are log-transformed. All the results reported below are statistically significant ($p<0.05$).

Repost and Follower metrics. We find positive correlations between migration rates and attention received from the rest of the world: $r=0.28$ for attention computed on reposts, and $r=0.33$ for attention computed on number of followers. Stronger correlations are found for attention received from other Brazilian cities: $r=0.33$ for attention computed on reposts, and $r=0.46$ for attention computed on number of followers.

Authority metric. Since the authority measure can only be computed on the aggregate (Brazil plus rest-of-the-world) dataset, we correlate it with the total number of migrants ($m_{ROW}$ + $m_{BR}$). In so doing, we obtain, again, a positive correlation: $r=0.32$.
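The correlation step can be sketched as follows, assuming that Pearson's $r$ is computed on log-transformed values (the text does not name the correlation coefficient, so this is an assumption). The function name `log_pearson` and the toy numbers are invented for illustration and are not the study's data.

```python
import numpy as np

def log_pearson(metric, migration_rate):
    """Pearson correlation between log-transformed metric and migration rate.

    Both inputs must be strictly positive, since the logarithm is taken.
    """
    x = np.log(np.asarray(metric, dtype=float))
    y = np.log(np.asarray(migration_rate, dtype=float))
    return np.corrcoef(x, y)[0, 1]

# Toy values: the rate is exactly proportional to the metric,
# so the log-log relationship is perfectly linear.
attention = [1.0, 2.0, 4.0, 8.0]
rate = [0.01, 0.02, 0.04, 0.08]
r = log_pearson(attention, rate)
print(r)
```

To also obtain a $p$-value, `scipy.stats.pearsonr` could be applied to the same log-transformed arrays.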

5 Predicting Migration from Attention

We model the number of migrants as a linear combination of the five attention metrics. We call this Model 1:

$$\begin{aligned} \log(MigrantsNumber_i) = {}& \alpha + \beta _{1} \cdot \log(ROW^{repost}_{i}) + \beta _{2} \cdot \log(ROW^{followers}_{i}) \\ &+ \beta _{3} \cdot \log(BR^{repost}_{i}) + \beta _{4} \cdot \log(BR^{followers}_{i}) \\ &+ \beta _{5} \cdot \log(Authority_{i}) + \epsilon _{i} \end{aligned} \tag{1}$$

We also build a model that accounts for the pairwise interaction effects between indicators:

$$\begin{aligned} \log(MigrantsNumber_i) = {}& \alpha + \beta _{1} \cdot \log(ROW^{repost}_{i}) + \beta _{2} \cdot \log(ROW^{followers}_{i}) \\ &+ \beta _{3} \cdot \log(BR^{repost}_{i}) + \beta _{4} \cdot \log(BR^{followers}_{i}) \\ &+ \beta _{5} \cdot \log(Authority_{i}) + \gamma _{m} \cdot Interactions_{im} + \epsilon _{i} \end{aligned} \tag{2}$$

where $Interactions_{im}$ accounts for the pairwise interactions among the five attention metrics, with one term for each pair $m$. This is Model 2 (Table 2).

To account for Internet penetration rates and population, we build a model that adds these two census variables:

$$\begin{aligned} \log(MigrantsNumber_i) = {}& \alpha + \beta _{1} \cdot \log(ROW^{repost}_{i}) + \beta _{2} \cdot \log(ROW^{followers}_{i}) \\ &+ \beta _{3} \cdot \log(BR^{repost}_{i}) + \beta _{4} \cdot \log(BR^{followers}_{i}) \\ &+ \beta _{5} \cdot \log(Authority_{i}) + \mu \cdot Internet_i + \rho \cdot Population_i \\ &+ \gamma _{m} \cdot Interactions_{im} + \epsilon _{i} \end{aligned} \tag{3}$$

where $Internet_i$ is the city’s Internet penetration rate, $Population_i$ is the city’s population, and $\epsilon _i$ is the error term. This is Model 3. We control for Internet penetration because it is associated with online activity, and for city size because larger cities tend to be economically prosperous and enjoy “increasing returns to scale”: a city becomes more attractive as it grows [12].
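To make the model specifications concrete, here is a minimal sketch of fitting such regressions by ordinary least squares and computing the adjusted $R^2$. It assumes that each interaction term is the product of two log-transformed metrics, which is a common convention but is not stated in the text; the function names and the synthetic data are hypothetical and do not reproduce the study's results.

```python
import itertools
import numpy as np

def design_matrix(log_metrics, with_interactions=True):
    """Build [intercept | metrics | pairwise products] from log-transformed metrics.

    log_metrics: array of shape (n_cities, n_metrics).
    """
    X = np.asarray(log_metrics, dtype=float)
    columns = [np.ones(len(X))] + [X[:, k] for k in range(X.shape[1])]
    if with_interactions:
        # One product column per unordered pair of metrics (assumed form).
        for a, b in itertools.combinations(range(X.shape[1]), 2):
            columns.append(X[:, a] * X[:, b])
    return np.column_stack(columns)

def adjusted_r2(D, y):
    """Fit y = D @ coef by least squares and return the adjusted R^2."""
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    residuals = y - D @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    n, p = D.shape[0], D.shape[1] - 1  # p counts predictors, excluding the intercept
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data for 35 cities and 5 metrics, generated only for illustration.
rng = np.random.default_rng(0)
log_metrics = rng.normal(size=(35, 5))
true_beta = np.array([0.4, 0.3, 0.2, 0.5, 0.1])
log_migrants = 3.0 + log_metrics @ true_beta + rng.normal(scale=0.1, size=35)

D1 = design_matrix(log_metrics, with_interactions=False)  # Model 1 style
D2 = design_matrix(log_metrics, with_interactions=True)   # Model 2 style
print(adjusted_r2(D1, log_migrants), adjusted_r2(D2, log_migrants))
```

With five metrics there are ten pairwise products, so the interaction model uses 15 predictors for 35 cities. The adjusted $R^2$ penalizes these extra terms, which makes it a fairer basis for comparing the models than the raw $R^2$.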

Table 2. Adjusted $R^2$ for different models predicting city $i$’s number of migrants. Model 1’s predictors are the five attention metrics ($Attention_{im}$), Model 2 adds their pairwise interaction effects, and Model 3 additionally controls for the city’s Internet penetration rate and population. All $p$-values are $<0.001$.


By computing the beta coefficients of Model 2, the best-performing model that does not use census data, we find that cross-border attention in terms of followers accounts for 22% of the model’s explanatory power, while cross-border attention in terms of reposts explains 18%. $Authority$, instead, explains only 7% of the variance. As for Model 2’s accuracy, the model achieves a Mean Absolute Error (MAE) of 0.21 on a logarithmic scale, where the minimum value is 2.6 and the maximum is 5.23, meaning that, on average, the model predicts the log of the number of migrants within 1.16% of its true value. Figure 5 plots the values predicted by Model 2 against the actual ones. Rio de Janeiro, one of the most international Brazilian cities, is an outlier for which the actual number of migrants is higher than the predicted value.
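For reference, the MAE on a logarithmic scale is the average absolute difference between the actual and predicted log values. The short sketch below shows the computation on made-up numbers; the function name and the values are illustrative assumptions, not the study's predictions.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted))

# Made-up log-scale migrant counts for three hypothetical cities.
actual_log = [3.0, 4.0, 5.0]
predicted_log = [3.1, 3.7, 5.2]
mae = mean_absolute_error(actual_log, predicted_log)
print(mae)
```

Because the error is measured on log values, a fixed MAE corresponds to a multiplicative rather than an additive error on the raw migrant counts.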

6 Related Work

Real-life Processes and Social Media. Email exchanges have been used to track migration flows among developed and developing countries [26]. Also, Quercia et al. have shown a correlation between the sentiment expressed in tweets posted by residents of London neighborhoods and the neighborhoods’ well-being [22].

In the last few years, several initiatives have emerged for measuring the socio-economic conditions of city residents in developing countries using online data. For example, the United Nations and the World Bank have recently launched a program called “Data4Good”. It promotes the use of (currently untapped) digital data for, say, improving poverty measurement (“How can we measure poverty more often and more accurately?”) or dealing with corruption in international investment projects (“Can we detect fraud by looking at aid data?”). Recently, Orange released an anonymized dataset of mobile phone calls in Côte d’Ivoire and launched a challenge in which researchers had to predict economic indicators from activity metrics extracted from the call records [17]. Our research complements this line of work by proposing a set of metrics that can be applied to any data source that reflects social exchanges, including social media data.

Migration. Davis et al. [8] conducted a study of human mobility using data published by the World Bank. They built a network of countries based on migration flows, and found that the most well-connected countries remain stable over time and that migration is directed towards low- and mid-degree countries.

7 Conclusion

We have shown that online metrics are effective at predicting the number of migrants. These metrics are particularly useful in developing countries, where economic changes happen at a fast pace. As part of future work, we will study socio-economic indicators other than migration rates, starting with GDP and social capital.

Notes

1. http://www.ibge.gov.br

References

1. Asur, S., Huberman, B.A., Szabo, G., Wang, C.: Trends in social media: persistence and decay. In: Proceedings of the 5th AAAI Conference on Weblogs and Social Media (ICWSM) (2011)
2. Baerenholdt, J.O., Granås, B.: Mobility and Place: Enacting Northern European Peripheries. Ashgate Publishing Ltd., Hardcover (2008)
3. Bates, J., Komito, L.: Migration, community and social media. Transnationalism in the Global City, vol. 6. University of Deusto, Bilbao (2012)
4. Boucher, G., Grindsted, A., Vicente, T.L. (eds.): Transnationalism in the Global City. Universidad de Deusto, Bilbao (2012)
5. Brodersen, A., Scellato, S., Wattenhofer, M.: YouTube around the world: geographic popularity of videos. In: Proceedings of the 21st ACM Conference on World Wide Web (WWW) (2012)
6. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.: Measuring user influence in twitter: the million follower fallacy. In: Proceedings of the 4th AAAI Conference on Weblogs and Social Media (ICWSM) (2010)
7. Datta, A.: Human Migration: A Social Phenomenon. Mittal Publications, New Delhi (2003)
8. Davis, K.F., D’Odorico, P., Laio, F., Ridolfi, L.: A complex network perspective. PloS One 8(1), e53723 (2013)
9. Eagle, N., Macy, M., Claxton, R.: Network diversity and economic development. Science 328(5981), 1029–1031 (2010)
10. Favell, A., Feldblum, M., Smith, M.P.: The human face of global mobility: a research agenda. Society 44(2), 15–25 (2007)
11. Ghosh, R., Lerman, K.: Predicting influential users in online social networks. In: Proceedings of the 4th AAAI Conference on Weblogs and Social Media (ICWSM) (2010)
12. Glaeser, E.L., Kohlhase, J.E.: Cities, regions and the decline of transport costs. Pap. Reg. Sci. 83(1), 197–228 (2004)
13. Gruhl, D., Guha, R., Kumar, R., Novak, J., Tomkins, A.: The predictive power of online chatter. In: Proceedings of the Eleventh ACM Conference on Knowledge Discovery in Data Mining (KDD) (2005)
14. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999)
15. Landau, L., Segatti, A.: Contemporary migration to South Africa: a regional development issue. World Bank-free PDF (2011)
16. Lerman, K., Jain, P., Ghosh, R., Kang, J.-H., Kumaraguru, P.: Limited attention and centrality in social networks. In: Proceedings of Conference on Social Intelligence and Technology (SOCIETY) (2013)
17. Mao, H., Shuai, X., Ahn, Y.-Y., Bollen, J.: Mobile communications reveal the regional economy in Côte d’Ivoire. In: Proceedings of the 3rd Conference on the Analysis of Mobile Phone Datasets (NetMob) (2013)
18. Mejova, Y., Srinivasan, P., Boynton, B.: GOP primary season on twitter: popular political sentiment in social media. In: Proceedings of the Sixth ACM Conference on Web Search and Data Mining (WSDM) (2013)
19. Naaman, M., Becker, H., Gravano, L.: Hip and trendy: characterizing emerging trends on twitter. J. Am. Soc. Inform. Sci. Technol. 62(5), 902–918 (2011)
20. Naveed, N., Gottron, T., Kunegis, J., Alhadi, A.C.: Bad news travels fast: a content-based analysis of interestingness on twitter. In: Proceedings of the Web of Science Conference (2011)
21. O’Connor, B., Balasubramanyan, R., Routledge, B.R., Smith, N.A.: From tweets to polls: linking text sentiment to public opinion time series. In: Proceedings of the 4th AAAI Conference on Weblogs and Social Media (ICWSM) (2010)
22. Quercia, D., Ellis, J., Capra, L., Crowcroft, J.: Tracking gross community happiness from tweets. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW) (2012)
23. Romero, D.M., Galuba, W., Asur, S., Huberman, B.A.: Influence and passivity in social media. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part III. LNCS, vol. 6913, pp. 18–33. Springer, Heidelberg (2011)
24. Ruiz, E.J., Hristidis, V., Castillo, C., Gionis, A., Jaimes, A.: Correlating financial time series with micro-blogging activity. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM) (2012)
25. Scellato, S., Mascolo, C., Musolesi, M., Latora, V.: Distance matters: geo-social metrics for online social networks. In: Proceedings of the 3rd Conference on Online Social Networks (WOSN) (2010)
26. State, W.I., Bogdan, E.Z., et al.: Studying inter-national mobility through ip geolocation. In: Proceedings of the Sixth ACM Conference on Web Search and Data Mining (WSDM) (2013)
27. Taylor, P.J., Ni, P., Derudder, B., Hoyler, M., Huang, J., Lu, F., Pain, K., Witlox, F., Yang, X., Bassens, D., et al.: Measuring the world city network: new developments and results. GaWC Res. Bull. 300 (2009)
28. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with twitter: what 140 characters reveal about political sentiment. In: Proceedings of the 4th AAAI Conference on Weblogs and Social Media (ICWSM) (2010)
29. UN. Big Data for Development: A Primer. United Nations, Global Pulse (2013)
30. Ver Steeg, G., Galstyan, A.: Information transfer in social media. In: Proceedings of the 21st ACM Conference on World Wide Web (WWW) (2012)
31. Wellman, B., Haase, A., Witte, J., Hampton, K.: Does the Internet Increase, Decrease, or Supplement Social Capital? Social Networks, Participation, and Community Commitment (2001)
32. Weng, J., Lim, E., Jiang, J., He, Q.: Twitterrank: finding topic-sensitive influential twitterers. In: Proceedings of the Third ACM Conference on Web Search and Data Mining (WSDM) (2010)
33. Weng, L., Flammini, A., Vespignani, A., Menczer, F.: Competition among memes in a world with limited attention. Sci. Rep. 2(335), 1–8 (2012)

Acknowledgments

Carmen Vaca Ruiz’s research work has been funded by SENESCYT and ESPOL, Ecuador.

Author information

Authors and Affiliations

Politecnico di Milano, Milan, Italy
Carmen Vaca-Ruiz & Piero Fraternali
Yahoo Research, Barcelona, Spain
Carmen Vaca-Ruiz, Daniele Quercia & Luca Maria Aiello

Corresponding author

Correspondence to Carmen Vaca-Ruiz.

Editor information

Editors and Affiliations

Universitat Politècnica de Catalunya Dept. Arquitectura de Computadors, Barcelona, Spain
Jordi Nin
Barcelona Digital Technology Centre, Barcelona, Spain
Daniel Villatoro

About this paper

Cite this paper

Vaca-Ruiz, C., Quercia, D., Aiello, L.M., Fraternali, P. (2014). Tracking Human Migration from Online Attention. In: Nin, J., Villatoro, D. (eds) Citizen in Sensor Networks. CitiSens 2013. Lecture Notes in Computer Science, vol 8313. Springer, Cham. https://doi.org/10.1007/978-3-319-04178-0_7

DOI: https://doi.org/10.1007/978-3-319-04178-0_7
Published: 20 December 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04177-3
Online ISBN: 978-3-319-04178-0
eBook Packages: Computer Science, Computer Science (R0)
