Natural hazards have the power to rapidly change the environment of a region for years after an event. The destruction caused by a hurricane, an earthquake, or a tsunami can render some areas uninhabitable and force residents to relocate. Examples abound in recent times. For instance, the Boxing Day tsunami of 2004 in Southeast Asia displaced a large number of residents (Gray et al. 2014), while in the USA, Hurricane Katrina in 2005 caused many residents of New Orleans and Mississippi to flee and never return (Fussell 2015). Increasing threats from sea-level rise and climate change add uncertainty to the hazardous future of many coastal settlements (McGranahan et al. 2007; Curtis and Bergmans 2018), increasing not only the number of people at risk but, more significantly, the number who may be displaced either temporarily or permanently.

Environmental displacement and migration are oft-studied concepts; however, new research increasingly focuses on climate change (McLeman and Gemenne 2018) and disasters as causal agents. The general study of human migration often encounters difficulties in the form of a lack of accessible and reliable data (Willekens et al. 2016; Rango and Vespe 2017), as displaced or migrant populations often elude traditional documentation methods, especially in developing countries, and even more so during post-disaster and emergency situations (Laczko 2015). Despite reported improvements in the availability, quality, and comparability of migration data (Laczko 2015), data concerns continue to constrain migration scholars (Spyratos et al. 2018). Some researchers have called for an increased collection of quantitative data to measure migration flows (Piguet 2010; Bilsborrow and Henry 2012) and for efforts to integrate more diverse, timely, and trustworthy information (United Nations 2014). Big Data provides new possibilities to tackle some of the limitations of traditional methods when tracking population movements. In particular, passive human-sensor data such as geotagged social media hold enormous potential for understanding spatial behavior in disaster situations. The literature demonstrates that Twitter is suitable for addressing some aspects of evacuation behavior (Martín et al. 2017; Kumar and Ukkusuri 2018), but post-disaster population movements have yet to be fully explored.

Responding to calls for innovative methods of collecting data on population movements, we examine the suitability of Twitter data for assessing the disruption of population movements triggered by a disaster. Specifically, we leverage Twitter data to explore the impact of Hurricane Maria on resident outflows from Puerto Rico (displacement/migration) and on non-resident inflows to Puerto Rico (tourism). We therefore study two different mobility scenarios, one of increased mobility associated with displacement/migration from the island caused by Hurricane Maria, and one of decreased mobility associated with the tourist flows towards Puerto Rico. For the purposes of this study, we define residents as Twitter users whose main tweeting activity during the year before Hurricane Maria was within Puerto Rico and non-residents as those Twitter users whose tweeting locations were mainly off the island (more about this distinction can be found in the methods section). By individually analyzing the tweet locations of 1231 Twitter users, we estimate the total displacement, destinations, timing, and return of displaced Puerto Rico residents. In addition, we test the association of the gender, age, and region of residence of Twitter users with their displacement behavior. We contextualize our findings and the suitability of Twitter for assessing post-disaster displacement and migration through comparisons with recent studies about these processes in Puerto Rico such as Teralytics (2018), Sutter and Hernandez (2018), Hinojosa et al. (2018), Hinojosa and Meléndez (2018), and United States Census Bureau (2018a). For the analysis of population inflows into Puerto Rico, we compare the number of non-resident Twitter users active every week during the post-disaster period (September 1, 2017–August 31, 2018) to baseline pre-disaster levels (September 1, 2016–August 31, 2017).

Research context and background

The science: post-disaster population movements

Despite the lack of a thorough and sound conceptualization of disaster recovery (Johnson and Hayashi 2012), disaster scholars recognize the multidimensional (demographic, infrastructural, economic, social, cultural, and psychological) nature of the post-event recovery process (Comerio 2005; Chang 2010). The population dimension of recovery is one essential element of the overall recovery picture, and it has increasingly attracted the interest of researchers, especially demographers, particularly after Hurricane Katrina and the devastation of New Orleans in 2005 (Fussell 2015).

The definition and characterization of population movements are sensitive to spatiotemporal attributes. Population movements associated with either natural or anthropogenic hazards often start with the evacuation process—provided that the threat can be anticipated—and might extend several months or even years after the event, often involving post-disaster migration (Black et al. 2013). This temporal continuum creates confusion in the terminology employed by scholars approaching the field from diverse disciplines and backgrounds (e.g., sociology, geography, economics, or political science). Many have reported a lack of consistency in the use of concepts such as evacuees, displaced population, dislocated population, refugees, and migrants (Piguet et al. 2011; Mitchell et al. 2012). For instance, the term evacuee, often reserved for those who leave in advance of an incoming threat (Lindell et al. 2005), has also been used for those who decide or are forced to leave their homes in the aftermath of a disaster (Elliott and Pais 2006). Variability across the temporal continuum also adds complexity, since an evacuee can transition to a displaced person and a displaced person can become a migrant (Black et al. 2013; Adger et al. 2018).

In addition to departure time (pre-disaster versus post-disaster), other differences include the duration of the movement (ranging from short-term–temporary–to long-term or even permanent), motivation (voluntary or involuntary movement), and spatial dimension (type of boundary crossed) (Fussell 2012; McLeman 2014). Regarding duration, there is no standard temporal definition of what differentiates a temporary migrant (tourist, seasonal worker, displaced person) from a permanent migrant. Several different temporal thresholds have been used, for example, 3, 6, or 12 months (Bell et al. 2015). The distinction of the movement motivation is also fraught with ambiguity. Most research on this issue originates from studies about outmigration regions in developing countries (Oliver-Smith 2009), where armed conflict (Melander et al. 2009), hunger (Baro and Deubel 2006) or disease (Toole 1995) motivates massive population movements. Environmental-induced population movements are more complex to characterize. While some have focused on the effects of migrations on the environment (Bilsborrow 2002), others put the focus on how—or if—the environment produces migrations (Hunter et al. 2015). Among the latter, several studies have attempted to link population movements to processes such as environmental degradation (Piguet 2010) or natural hazards such as drought (Gray and Mueller 2012), tornadoes (Cross 2014), or hurricanes (Fussell et al. 2017). For instance, a recent publication analyzing the impact of hurricanes on migrations into the USA from 30 Central American and Caribbean countries found a 6% increase in movements after the most damaging storms (Spencer and Urquhart 2018). However, others claim that this form of migration to the USA is understudied and in need of more attention (Mitchell et al. 2012).

The debate on hazard-induced population movements includes a well-established distinction between slow-onset environmental changes and rapid-onset environmental events. Slow-onset environmental changes such as drought or sea-level rise typically result in progressive and long-term to permanent migrations (Laczko and Aghazarm 2009; McLeman and Hunter 2010), which some scholars claim is a coping mechanism that reflects adaptation to a changing environment (McLeman and Smit 2006). Rapid-onset events such as hurricanes, earthquakes, or hazardous material spills often produce shorter-term and shorter-distance movements than slow-onset hazards, leaving international migration numbers unaffected (Laczko and Aghazarm 2009; Findlay 2011). Population movements responding to rapid-onset hazards are therefore often temporary, and displaced residents normally return once their communities are restored (Curtis et al. 2015). Return migration is not the only population move in a recovery area (Fussell et al. 2014a). External investments and recovery funds sometimes revitalize the local economy and attract new residents to disaster-stricken places, a process described by Pais and Elliott (2008) as “The Recovery Machine.” Post-disaster immigration spatially concentrates in the impact zones (re-construction) and in the urban development sector, altering the original demographic composition of the area with an influx of workers and new residents seeking opportunities (Pais and Elliott 2008; Ehrenfeucht and Nelson 2013).

Another variation in post-disaster population flows is the reduction in the number of tourists, an important economic mainstay for many tourism-based economies. Concerned with this issue, studies by Faulkner (2001) and Ritchie (2008) encourage the adoption of proactive mitigation strategies to cope with the reduction of tourists in the aftermath of a disaster. Although most disasters induce an initial decrease in the arrival of national and international tourists, some events attract the curiosity of visitors and can indeed boost the tourism sector; a phenomenon labeled dark tourism or eco-disaster tourism (Gould and Lewis 2007). Indeed, some disaster areas have been converted into tourist points of interest and have become a considerable source of revenue such as post-Katrina New Orleans neighborhoods (Pezzullo 2009) and the National September 11 Memorial and Museum in New York City (Sather-Wagstaff 2016).

No matter how it is discussed or how it affects recovery, one certainty remains: people evacuate during disasters. Some of them return, and others do not, yet we as a society do not have a standard, replicable, and consistent way to trace a disaster diaspora, whether large or small.

Measuring population movements

Traditional data sources

The study of large-scale population flows commonly involves the use of population registers and/or censuses (Fussell et al. 2017) and surveys (Mallick and Vogt 2014) as principal sources of migration data. Registers and censuses involve procedures for systematically collecting and recording information about a given population. Censuses are internationally accepted, and the United Nations recommends standards and methods to assist national statistical authorities in their compilation (United Nations 2008). However, only a few countries regularly record vital statistics and residential change for the whole population, and international standards for what information is recorded are lacking. Thus, registration data are of limited use for migration studies. Nevertheless, where civil registries exist, researchers have an annual record of all migration events with great geographic detail (Bell et al. 2015). The main strength of censuses for the study of human migration is their universal coverage, embracing the whole population of the country. In addition, census questionnaires include a rich set of demographic and socioeconomic questions that allow the study of multiple factors directly associated with migration processes.

On the downside, there are four broad concerns in using census information for migration studies. First, responsible public authorities often conduct censuses only once every decade, since they require considerable human and economic resources, and such a long interval between collection periods makes the data too infrequent for most migration research purposes (Fussell et al. 2014b). Second, censuses are not specifically designed to study aspects of migration (their primary focus is on population stocks rather than flows) and therefore can only include a limited number of questions about this issue. Third, researchers also have concerns about the reliability of the data (Bell et al. 2015). Fourth, undocumented or irregular migrants often elude registries and censuses, which limits the usefulness of these approaches in the migration field (Laczko 2015).

Surveys are typical sources of migration data and can present different designs: cross-sectional surveys, multiple cross-sectional surveys, and longitudinal or panel survey designs (Fussell et al. 2014b). Researchers can model their questionnaires to investigate migration, which is the principal strength of the method. Thus, surveys are able to record detailed migration histories and motivations at comparatively lower expenses than censuses (Bell et al. 2015). Also, surveys are the preferred data collection method to relate environmental events to migration outcomes since the periodicity of censuses is often too sparse (Fussell et al. 2014b). Sampling decisions and resulting biases are the main concern for researchers conducting surveys on migration, which relates to the inability to reach temporary migrants or migrants that already left the study area.

Tourism as a driver of population inflow is a multi-scale phenomenon studied from the tourist attraction itself to broader regional, national, and even global patterns. Depending upon the spatial scale of analysis, researchers use different methods to understand the behavior and patterns of tourism. Surveys and diaries are the main data collection methods used in tourism research, especially at smaller spatial scales (Page and Hall 2014). Accommodation and travel statistics also are conventional data sources for researchers (Page and Hall 2014). However, cross-border and accommodation statistics often are too spatially coarse, temporally sparse, and semantically shallow to explore tourist decision-making (Raun et al. 2016). Also, some countries no longer collect some of these statistics (e.g., European Union member states). Surveying and self-report approaches are resource-demanding, difficult to apply in remote areas, and subject to biases (e.g., sample bias or recall bias) (Shoval and Ahas 2016). Considering this, tourism researchers are increasingly applying innovative data collection methods and sources such as citizen science and Big Data (Li et al. 2018a; Hu et al. 2018). These new data sources resolve some of the limitations of conventional methods and effectively improve the understanding of tourist behavior.

Non-traditional data sources

Because traditional data sources are better suited to population stocks than to population flows, which are a dynamic phenomenon, scholars are continuously searching for alternatives. For instance, several scholars leveraged Internal Revenue Service (IRS) data to study county-to-county and inter-county migrations in the USA (Molloy et al. 2011). Fussell et al. (2014a) used these data to measure permanent migration outcomes from Hurricane Katrina, although they acknowledge the inability of this approach to measure temporary displacements. Others have employed United States Postal Service (USPS) mail recipient and vacancy data collected quarterly at the tract level to gauge population movements to and from disaster areas (Finch et al. 2010). While more timely than censuses and more comprehensive than surveys, neither IRS nor USPS data provide individual-level information on the timing, origin, and destination of a move. Rather, these data sources offer an aggregate-level estimate of mobility before, during, and after an event.

In this pursuit of innovative methods, some authors have turned their focus towards passively user-generated geo-referenced data (Goodchild 2007). Passive human-sensor data originate in the data shadow produced by the digital activity of people, who leave behind traces of information with multiple potential applications (e.g., advertising, research). Increasingly, many researchers exploit the possibilities of these data in a number of fields such as mobility and transportation (Jurdak et al. 2015), public health (Wesolowski et al. 2012), sociology (Amini et al. 2014), or natural hazards (Li et al. 2018b). To a lesser extent, migration studies have also attempted to exploit this source of information (Zagheni et al. 2014). Migration-related scholars are interested in this data source due to its immediacy (close to real-time data), wide coverage, and reduced cost (Spyratos et al. 2018).

Mobile phone call detail records (CDR) hold tremendous potential for migration studies. However, data accessibility is a large limitation as data are rarely shared by private corporations. Taylor (2016) also debates the ethical and privacy concerns related to this type of data, especially for vulnerable populations in areas of poverty, political instability, or crisis. Although researchers and organizations have used CDR data in mobility studies (Alexander et al. 2015; Williams et al. 2015), applications in migration are still scarce as it involves massive and long-term datasets. Despite this, some studies demonstrate the potential of phone call data in the field (Bengtsson et al. 2011; Ahas et al. 2018). For instance, Blumenstock (2012) analyzed a 4-year CDR dataset from 1.5 million Rwandans and revealed patterns of temporary and circular migration hidden to surveys conducted by national organizations.

Other sources of Big Data are more accessible to scholars for research purposes, and applications abound. State et al. (2013) leveraged repeated logins to Yahoo! to estimate short- and medium-term migration flows. Zagheni and Weber (2012) determined age- and gender-specific migration rates using a vast sample of Yahoo! e-mail messages. Compared to other data sources, social media emerges as the richest supplier of data for multiple applications (Stock 2018). Whether exploiting advertising platforms, direct user-generated content (comments, posts, profiles, pictures, etc.), or geo-located information from this user-generated content, social media presents new possibilities for migration research (Laczko 2015). For instance, Zagheni et al. (2017) developed an innovative application of Facebook’s advertising platform to estimate the stock of international migrants in the USA and considered this dataset as a potential continuously updated census. This approach was recently replicated to estimate post-Maria migration from Puerto Rico (Alexander et al. 2019). Other social media platforms leveraged in migration studies are Google+ (Messias et al. 2016), LinkedIn (State et al. 2014; Barslund and Busse 2016), Skype (Kikas et al. 2015), and Twitter (Hawelka et al. 2014; Zagheni et al. 2014). Zagheni et al. (2014) analyzed geo-located tweets from 500,000 users in a 2-year period and concluded that Twitter can be useful to predict turning points in migration trends and to improve the understanding of the relationships between internal and international migration. Hawelka et al. (2014) examined tweets from 2012, estimating the volume of international travelers by country of residence and identifying spatiotemporal patterns of global mobility.

Social media data for migration research offer both the immediacy and continuous spatiotemporal coverage often lacking in traditional approaches such as surveys and censuses (Spyratos et al. 2018). Considerably large sample sizes and reduced costs are among the most-valued qualities of these data (Zagheni et al. 2018). In a recent report, the European Commission anticipates that Big Data can complement traditional data sources of migration (Hughes et al. 2016). However, scholars must deal with the limitations and weaknesses of these approaches. Selection bias—which relates to any given sample not being representative of the whole population—is one of the main concerns for researchers, and several contributions have tried to reduce its effect (Zagheni and Weber 2015; Yildiz et al. 2017; Jiang et al. 2019). In addition, other researchers have voiced concerns about privacy and ethical issues (Freudiger et al. 2011; Ruths and Pfeffer 2014).

The tourism research field is more prolific in the application of innovative data collection methods than migration studies at present, as the characteristics of tourist flows—short-term movements—are more suitable for these approaches. Thus, the literature is rich in examples of active (individuals are aware of the data generation and its purpose) and passive human-sensor data approaches such as CDR (Raun et al. 2016), GPS data (Grinberger et al. 2014), Bluetooth data (Versichele et al. 2014), and geo-referenced social media (Girardin et al. 2008; Hawelka et al. 2014). Even post-disaster tourism recovery recently benefited from geotagged social media data applied to study the recovery process after both the magnitude 7.2 Bohol earthquake and super typhoon Haiyan in the Philippines (Yan et al. 2017).

Disaster context and study area

The 2017 hurricane season was exceptionally active in the Atlantic basin with 17 named storms including six major hurricanes (NOAA 2017). Three of these major hurricanes—Harvey, Irma, and Maria—made landfall in the USA and are among the top 5 costliest tropical cyclones in the country’s recorded history (NHC 2018). From September 6 to September 7, 2017, Puerto Rico received the first hurricane impact as Category 5 Hurricane Irma tracked about 60 miles north of the island, far enough to avoid hurricane force winds and a significant storm surge. Even though Puerto Rico did not experience a direct hit from Irma, rainfall totaled 10–15 in. in higher elevations in the central area of the island (Cangialosi et al. 2018). Irma caused three indirect deaths in Puerto Rico as well as widespread power outages, loss of water supply, and minor damage to homes and businesses (Cangialosi et al. 2018). Two weeks later, on September 20, while areas of Puerto Rico were still recovering from Hurricane Irma, Hurricane Maria made landfall along the island’s southeast coast as a category 4 storm. Moving northwestwardly, Maria crossed Puerto Rico, leaving a path of complete devastation. The eastern half of the island experienced wind gusts over 200 km/h (Fig. 1a). These wind gusts affected the densely populated areas of San Juan and Carolina (Fig. 1b). The east coast of the island recorded maximum storm surge inundation levels 6 to 9 ft above ground level from the combination of storm surge and the tide (Fig. 1a). The storm surge and wave action caused severe damage to buildings, homes, roads, and harbors along the east, southeast, and northeast coast (Pasch et al. 2018). Central portions of the island received more than 25 in. of rainfall from September 19 to September 21 (Fig. 1c), with some local stations receiving near 38 in. River flooding and mudslides were extensive across many parts of the island and caused additional evacuations and rescues in valleys (Pasch et al. 2018).

Fig. 1 Hurricane Maria in Puerto Rico and population forecasts

The effects of Maria in Puerto Rico were catastrophic and triggered a humanitarian crisis that extended several months. The official death toll was considerably underestimated. By December 2017, Puerto Rico’s authorities had only recognized 64 direct or indirect deaths (Santiago et al. 2017). The inaccessibility of remote areas, power and communications outages, and the difficulty of evaluating indirect deaths from worsening chronic conditions or from deficiencies in medical treatment delayed the issuing of death certificates. A recent study by Kishore et al. (2018) increased the death toll by a factor of about 70, to an estimated 4645 excess deaths in the aftermath of Hurricane Maria (September 20 to December 31).

Even though Puerto Rico’s situation was significantly aggravated after Hurricane Maria, the island was already experiencing difficulties long before the 2017 hurricane season. Tracing back to its colonial roots, subsistence agriculture was the most common occupation and underdevelopment, illiteracy, and poverty were rampant on the island for centuries. During the twentieth century, under the US rule, the weak Puerto Rican economy began to diversify and flourish based on favorable federal tax laws. Manufacturing and tourism gained prominence as an important share of Puerto Rico’s income. The subsidized economy came to an end in 1996 when President Bill Clinton signed legislation phasing out—over a 10-year period—the favorable tax code that had been active for much of the twentieth century. Employment loss followed after many companies and industries fled the island. The economic model thought to be successful during much of the late twentieth and early twenty-first century had failed in solving the structural problems of poverty, inequality, and dependence (Quiñones-Pérez and Seda-Irizarry 2016). The Puerto Rican government attempted to recapitalize by issuing a large amount of debt bonds on the eve of the global recession of 2008. The island fell into a debt crisis that exacerbated its employment losses.

Given Puerto Rico’s status as a US territory, Puerto Ricans are US citizens and may travel and migrate freely to the rest of the country. With approximately 45% of the population living below the US federal poverty line and the lack of opportunities on the island, migration towards the mainland USA increased significantly (Quiñones-Pérez and Seda-Irizarry 2016). In addition, the birth rate continued a decades-long decline and is now among the lowest worldwide. In just a few decades, Puerto Rico’s population experienced a shift from a young and rapidly growing population to an aging one where deaths now outnumber births. Between 2005 and 2015, Puerto Rico lost around 400,000 residents (Stone 2017). The outmigration of hundreds of thousands of skilled professionals and students cast doubt about Puerto Rico’s immediate future. In this context, demographic experts anticipate further declines (Stone 2017), a conclusion concurring with Cross’s (2014) expectation that declining populations before a disaster are more likely to experience larger post-disaster population losses.

Along with the demographic forecasts (Fig. 1d), health, social, and infrastructure stresses, as well as limited government transparency, combine to hinder long-term post-disaster recovery (Government of Puerto Rico 2018a). The Economic and Disaster Recovery Plan for Puerto Rico recognizes that the island will need deep structural and transformative changes and investments to recover from the existing and systemic socioeconomic crisis exacerbated by the extensive damage wrought by Hurricane Maria (Government of Puerto Rico 2018a).

Data and methods

To develop our study, we analyzed over 2.7 billion geotagged tweets spanning a 2-year period from September 1, 2016, to August 31, 2018, on an in-house Big Data computing cluster powered by Hadoop and Impala. The tweets were collected using the Twitter Streaming Application Programming Interface (API) with a bounding box covering the whole world. Note that the Twitter API allows unrestricted access to only about 1% of the total content and that less than 1% of these tweets are geotagged (Sloan and Morgan 2015). In order to study differences across Puerto Rico, we divided the island into 5 regions: Central, North, West, South, and East (see maps in Fig. 1).

Post-disaster local displacement

In order to explore the spatial response of Puerto Rico residents to Hurricane Maria, we followed a multi-step process to identify active local users in Puerto Rico during the pre-disaster timeframe and track their movements during the post-disaster period (Fig. 2):

  1) Identification of active users in Puerto Rico during the pre-disaster period. Between September 1, 2016, and August 31, 2017, 56.5 thousand active users in Puerto Rico sent 2.6 million tweets.

  2) Retrieval of tweets from identified active users in Puerto Rico for the entire world in the pre-disaster period. The 56.5 thousand active users in Puerto Rico sent 6.5 million tweets from throughout the world in the pre-disaster period.

  3) Determination of the home location of active users. We assumed those with a majority of tweets originating from Puerto Rico were living on the island (Jurdak et al. 2015). The home location of each active user in Puerto Rico was therefore identified using the median location of each user’s tweets (Martín et al. 2017), resulting in the identification of 32,099 active users whose regular residence during the pre-disaster period was Puerto Rico (Puerto Rico residents).

  4) Retrieval of tweets from the 32,099 residents of Puerto Rico for the entire world during the post-disaster period (September 1, 2017–August 31, 2018). To ensure sufficient temporal resolution to track the users and characterize the movements of the population after Maria, only active Puerto Rico resident users who tweeted at least once per month in the post-disaster period were selected. This reduced the number of users to 1231 after removal of non-human Twitter accounts (sources such as Tweetbot for iOS or TweetMyJOBS) and multi-user accounts (tweets in distant locations at the same time).

  5) Individual analysis of the tweet locations of the 1231 active Puerto Rico resident users in the post-disaster period from September 1, 2017, to August 31, 2018.

Fig. 2 Workflow to obtain displacement behavior of Puerto Rican residents
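Steps 3 and 4 of the workflow above can be sketched in a few lines. The following is a minimal illustration only, not the pipeline run on the Hadoop/Impala cluster: the bounding box coordinates, the function names, and the decision to require both a majority of tweets on the island and a median location on the island are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Rough bounding box for Puerto Rico (assumed values, for illustration only).
PR_BBOX = (-67.4, 17.8, -65.2, 18.6)  # (min_lon, min_lat, max_lon, max_lat)

# The twelve (year, month) pairs of the post-disaster period.
POST_MONTHS = [(2017, m) for m in range(9, 13)] + [(2018, m) for m in range(1, 9)]


def in_puerto_rico(lon, lat, bbox=PR_BBOX):
    """True if the point falls inside the (assumed) Puerto Rico bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def home_location(points):
    """Median longitude and median latitude of a user's tweets."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return median(lons), median(lats)


def find_residents(pre_tweets):
    """pre_tweets: iterable of (user_id, lon, lat) from the pre-disaster year.

    Returns {user_id: (home_lon, home_lat)} for users with a majority of
    tweets in Puerto Rico and a median location on the island.
    """
    by_user = defaultdict(list)
    for user_id, lon, lat in pre_tweets:
        by_user[user_id].append((lon, lat))

    residents = {}
    for user_id, points in by_user.items():
        inside = sum(in_puerto_rico(lon, lat) for lon, lat in points)
        home = home_location(points)
        if inside > len(points) / 2 and in_puerto_rico(*home):
            residents[user_id] = home
    return residents


def tweeted_every_month(timestamps, months=POST_MONTHS):
    """True if at least one timestamp falls in every listed (year, month)."""
    seen = {(t.year, t.month) for t in timestamps}
    return all(month in seen for month in months)
```

A real implementation would test points against the island's actual boundary rather than a rectangle, and would also need the bot and multi-user filters described in step 4, which are omitted here.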

With a population of 3.3 million in Puerto Rico in 2017, our sample of 1231 users exceeds the minimum sample size (385) required for a 95% confidence level with a 5% margin of error. To identify age and gender, we visited each user’s public profile and examined profile pictures, usernames, full names, descriptions, URLs, multimedia content, and tweets to manually estimate gender (female or male) and approximate age range (17 years or younger, 18–24 years, 25–34 years, 35–44 years, 45–54 years, 55–64 years, 65–74 years, and 75 years or older). One person conducted this process to avoid differences in estimation or interpretation. The difference in the total number of cases (N) is due to missing data for some of the gender/age descriptors. By individually analyzing the location of tweets from these 1231 users, we collected information about post-disaster population movements: displacement estimates, destinations, timing, and return. We then tested the association of gender, age, region of residence (central region, north region, west region, south region, east region), and residence in a coastal municipality (coastal or non-coastal) with displacement behavior. To measure the association of gender and age (demographics) and location of residence with the displacement outcome, we conducted bivariate chi-squared tests of independence. We contextualize our findings and the suitability of Twitter for assessing post-disaster displacement and migration by comparison with recent studies about these processes in Puerto Rico such as Teralytics (2018), Sutter and Hernandez (2018), Hinojosa et al. (2018), Hinojosa and Meléndez (2018), and United States Census Bureau (2018a).
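For readers unfamiliar with the bivariate chi-squared test of independence mentioned above, the statistic can be computed directly from a contingency table of counts. The sketch below is a generic illustration using only the Python standard library. The function name is hypothetical, no continuity correction is applied, and the counts in the usage note are invented for the example, not results from this study.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom.

    table: list of rows of observed counts, e.g. rows = gender,
    columns = displacement outcome. No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT the study's data): rows are two groups,
# columns are "displaced" and "not displaced".
example = [[10, 20], [20, 10]]
stat, dof = chi_squared_independence(example)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```

The statistic is then compared against the chi-squared distribution with the returned degrees of freedom to obtain a p-value. Statistical packages such as SciPy (`scipy.stats.chi2_contingency`) perform both steps in one call.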

Post-disaster population inflows

For the analysis of population inflows into Puerto Rico, we followed a multi-step process to compare baseline pre-disaster non-resident levels in the island with post-disaster levels. This process is detailed below:

  1. Identification of active users in Puerto Rico in the pre-disaster and post-disaster periods.

  2. Retrieval of tweets from identified active users in Puerto Rico for the entire world in both the pre-disaster and the post-disaster periods.

  3. Determination of the home location of active users in the two datasets (pre-disaster and post-disaster).

  4. Selection of non-resident users (median center of the tweeting activity outside of Puerto Rico) and filtering to remove non-human Twitter accounts and multi-user accounts.

  5. Comparison of weekly aggregates of active users during the post-disaster period (September 1, 2017–August 31, 2018) to baseline pre-disaster levels (September 1, 2016–August 31, 2017).
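The steps above can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the procedure used in the study: the home location is approximated by the coordinate-wise median of a user's tweets (a simplification of a median center), Puerto Rico is approximated by a rough bounding box, the tweet records are hypothetical, and the filtering of non-human and multi-user accounts is omitted.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Rough bounding box for Puerto Rico (assumed, for illustration only).
PR_LAT = (17.8, 18.6)
PR_LON = (-67.4, -65.2)

def in_puerto_rico(lat, lon):
    return PR_LAT[0] <= lat <= PR_LAT[1] and PR_LON[0] <= lon <= PR_LON[1]

def home_location(tweets):
    """Coordinate-wise median of tweet locations (simplified median center)."""
    return (median(t["lat"] for t in tweets), median(t["lon"] for t in tweets))

def weekly_non_residents(tweets, period_start):
    """Distinct non-resident users tweeting from Puerto Rico, keyed by week index."""
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t["user"]].append(t)
    # A user is non-resident when the estimated home falls outside Puerto Rico.
    non_residents = {u for u, ts in by_user.items()
                     if not in_puerto_rico(*home_location(ts))}
    weeks = defaultdict(set)
    for t in tweets:
        if t["user"] in non_residents and in_puerto_rico(t["lat"], t["lon"]):
            weeks[(t["day"] - period_start).days // 7].add(t["user"])
    return {w: len(users) for w, users in sorted(weeks.items())}

# Hypothetical records: "a" lives in San Juan, "b" and "c" are visitors.
tweets = [
    {"user": "a", "lat": 18.45, "lon": -66.07, "day": date(2017, 9, 2)},
    {"user": "a", "lat": 18.46, "lon": -66.10, "day": date(2017, 9, 9)},
    {"user": "b", "lat": 28.54, "lon": -81.38, "day": date(2017, 9, 1)},
    {"user": "b", "lat": 28.54, "lon": -81.38, "day": date(2017, 9, 20)},
    {"user": "b", "lat": 18.45, "lon": -66.07, "day": date(2017, 9, 3)},
    {"user": "c", "lat": 40.70, "lon": -74.00, "day": date(2017, 9, 5)},
    {"user": "c", "lat": 40.70, "lon": -74.00, "day": date(2017, 9, 25)},
    {"user": "c", "lat": 18.40, "lon": -66.00, "day": date(2017, 9, 12)},
]
print(weekly_non_residents(tweets, date(2017, 9, 1)))
```

Running the same function on the pre-disaster and post-disaster datasets yields the two weekly series that the final step compares.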

Results and discussion

Resident users’ post-disaster mobility: timing, destination, and characteristics

Figure 3 compares the population pyramid of Puerto Rico from the 2017 American Community Survey (ACS) (United States Census Bureau 2018b) with the age and gender distribution of the Twitter sample used in this study. The pyramid shows that those aged 15–24 are overrepresented and those aged 54 and older are underrepresented among active resident Twitter users. The Twitter sample shows asymmetry in the gender distribution, with a higher presence of females in the age segment 15–24 years and lower percentages in older groups. Overall, 55% of the sample is male. This is consistent with studies identifying a general male bias in Twitter samples (Mislove et al. 2011) and with those that find an overrepresentation of females in the youngest cohorts (Leak and Lansley 2018). With a sample size of 1231 Twitter users and using the latest ACS population estimate (2017) of 3,337,177 Puerto Rico residents, we calculate a margin of error for our sample of 3.67% with a confidence level of 99%.
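The margin of error reported above can be checked with the standard formula for a proportion. The sketch below assumes maximum variability (p = 0.5) and a finite population correction; the function name is illustrative, and the exact formula used in the study is not stated in the text.

```python
import math
from statistics import NormalDist

def margin_of_error(n, population, confidence=0.99, p=0.5):
    """Margin of error for a proportion, with finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # two-sided critical value
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    return z * math.sqrt(p * (1 - p) / n) * fpc

print(f"{margin_of_error(1231, 3_337_177):.2%}")
```

Under these assumptions the result rounds to the 3.67% given in the text.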

Fig. 3 Population pyramid of Puerto Rico and Twitter sample

Our analysis revealed that 36.4% of the identified Twitter users left Puerto Rico within the 15 weeks (until December 31, 2017) after Hurricane Maria made landfall (Table 1). Users who traveled outside of Puerto Rico for 4 weeks or less are considered non-displaced. This group (26.5%) consisted of holiday travelers and residents who sought shelter off the island for a shorter duration after the hurricane while the main lifelines (power, water, and phone service) were reestablished on the island. Following previous literature (State et al. 2013), we considered as displaced those users whose stays outside of Puerto Rico lasted more than 4 weeks (8.3%).
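The classification rule described above can be expressed as a small function. This is a hedged sketch: measuring the stay in days, treating users who had not returned as still away on a reference date, and the function and constant names are all assumptions made for illustration.

```python
from datetime import date

THRESHOLD_DAYS = 28                    # "more than 4 weeks", per the text
DEPARTURE_WINDOW_END = date(2017, 12, 31)

def classify(departure, return_date=None, reference=date(2018, 5, 31)):
    """Label a resident user as 'stayed', 'non-displaced', or 'displaced'."""
    if departure is None or departure > DEPARTURE_WINDOW_END:
        return "stayed"                # no departure within the 15-week window
    end = return_date or reference     # not yet returned: measure to the reference date
    days_away = (end - departure).days
    return "displaced" if days_away > THRESHOLD_DAYS else "non-displaced"

print(classify(None))
print(classify(date(2017, 10, 5), date(2017, 10, 20)))
print(classify(date(2017, 10, 5), date(2017, 12, 1)))
```

A stay of exactly 28 days falls in the non-displaced group, matching the "4 weeks or less" wording.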

Table 1 Travel duration

The majority of the displaced left Puerto Rico in the first half of October (Fig. 4), likely pushed by the extended duration of power outages. In total, 76.3% of the displaced left the island within the first 6 weeks after Maria. This pattern coincides with findings from Hinojosa and Meléndez (2018). The return process was scattered across the following months, with higher rates after holiday periods (Thanksgiving, Christmas) and at the end of the academic year (May). Nine months after the disaster (May 31, 2018), only 54.6% of those displaced had returned to Puerto Rico according to our data. When considering the whole Twitter sample, including those who did not leave Puerto Rico and those who traveled for 4 weeks or less, 3.8% had relocated outside of Puerto Rico and not returned by May 31, 2018. The most up-to-date estimate at the time of this research, based on data released by the United States Census Bureau, puts the displacement of Puerto Ricans at roughly 129,848 people (3.9% of the population) (US Census Bureau 2018a). This figure closely aligns with our estimates and serves as a relative validation of our approach. However, although our study confirms that Hurricane Maria triggered long-term displacement, whether this long-term displacement becomes permanent migration and confirms the most pessimistic population forecasts is still unknown (Stone 2017). Drawing causal connections to suggest that severe storms cause permanent migration is premature at this point (Spencer 2018).
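The 3.8% figure follows from the two shares reported above, and the Census percentage can be checked in the same way. The short calculation below is only a consistency check; dividing the Census count by the 2017 ACS population cited earlier is an assumption about the denominator.

```python
displaced_share = 0.083   # share of the sample away for more than 4 weeks
returned_share = 0.546    # share of the displaced back by May 31, 2018

# Share of the whole sample that was displaced and had not yet returned.
not_returned = displaced_share * (1 - returned_share)
print(f"Twitter sample still away: {not_returned:.1%}")

# Census count over the 2017 ACS population (assumed denominator).
census_share = 129_848 / 3_337_177
print(f"Census-based estimate: {census_share:.1%}")
```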

Fig. 4 Timing of departure and return of the displaced

Several other reports estimated the raw number of Puerto Ricans that left the island because of the hurricane (not the percentage). These estimates range from 160,000 (Hinojosa and Meléndez 2018) to 400,000 people (Teralytics 2018). This great variation is explained by different data collection methods and analytical approaches. For instance, Teralytics (2018) did not distinguish between lengths of stay and included all travel from Puerto Rico to the continental United States, while studies looking at student population losses reported by Puerto Rico’s Department of Education (Hinojosa and Meléndez 2018) might obscure migration among older cohorts. For more discussion of the different data sources employed in measuring Puerto Rico’s post-Maria exodus, see Hinojosa and Meléndez (2018). An approach based on Twitter such as the one presented here also requires a careful understanding of what the results represent in order to interpret them and draw conclusions accordingly. For example, in our study, we must note that estimates based on Twitter assume all movements with a duration over 4 weeks are associated with the hurricane, which could mask different travel motivations. Also, the Twitter sample is biased towards a more migration-prone population (e.g., younger population) (Abel and Deitz 2014). Studies of how well the Twitter population represents the overall Puerto Rican population would be needed to quantify the potential additional biases.

Table 2 compares the destinations of the displaced Twitter users with those reported by other studies that used different collection methods. Although our approach permits county-level destinations, we aggregated the results to the state level for comparison and validation purposes, as most published studies did not report county-scale information. Florida stands out as the preferred destination for displaced Puerto Rico residents (38.8%), followed by New York (7.8%), Massachusetts (7.8%), and Texas (7.8%). This destination pattern is consistent with other studies using call detail records (CDR) (Teralytics 2018), FEMA change-of-address data and FEMA applications for disaster assistance (Hinojosa et al. 2018; Sutter and Hernandez 2018), U.S. Postal Service change-of-address requests (Sutter and Hernandez 2018), and school enrollment (Hinojosa et al. 2018). The concentration of the displaced in Florida, New York, Texas, and Massachusetts is consistent with studies suggesting that displacement and migration tend to concentrate in areas where the displaced/migrants already have sociocultural ties (McLeman and Hunter 2010; Findlay 2011; Herdağdelen et al. 2016). Our results, however, must be interpreted with caution, as the destinations in our study are based on a very small sample (103 displaced users).

Table 2 Destination of displaced

Table 3 presents the results of the association of displacement with demographic (age and gender) and residential location (region and coastal municipality) characteristics. We found no association of either gender or region with displacement behavior. The coastal municipality variable shows a significant association, but this relationship is very weak (phi = 0.083). Age shows a weak (Cramer’s V = 0.148) but significant association. Looking in more depth at the relation between age and displacement, the 25–34 cohort is 1.8 times more likely to relocate than the remaining age groups (Table 4). This result is statistically significant (p < 0.01). The other group with statistically significant results is the 45–54 cohort (p < 0.05), which is less likely to relocate outside of Puerto Rico (odds ratio of 0.1). The loss of individuals in age groups with higher fertility rates, especially in a pre-disaster context of alarmingly low birth rates, can exacerbate depopulation on the island (Stone 2017). This would confirm the expectation that populations already declining before a disaster are likely to experience larger post-disaster population losses (Cross 2014).

Table 3 Chi-squared tests of independence
Table 4 Displacement rates and odds ratio for different age groups
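The measures reported in Tables 3 and 4 (chi-squared statistic, phi or Cramér's V, and odds ratio) can be computed from a contingency table as sketched below. The counts are hypothetical and chosen only for illustration, not taken from the study, and the function names are likewise assumptions. A p-value would additionally require the chi-squared distribution with the returned degrees of freedom.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum((obs - rt * ct / n) ** 2 / (rt * ct / n)   # (O - E)^2 / E per cell
               for row, rt in zip(table, row_totals)
               for obs, ct in zip(row, col_totals))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def cramers_v(table):
    """Cramér's V; for a 2 x 2 table this equals the absolute value of phi."""
    stat, _ = chi_squared(table)
    n = sum(map(sum, table))
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(stat / (n * k))

def odds_ratio(a, b, c, d):
    """Odds of displacement in one group (a displaced, b not) vs the rest (c, d)."""
    return (a / b) / (c / d)

# Hypothetical counts: rows = [one cohort, all others]; columns = [displaced, not].
table = [[40, 160], [80, 720]]
stat, dof = chi_squared(table)
print(round(stat, 3), dof, round(cramers_v(table), 3), odds_ratio(40, 160, 80, 720))
```

An odds ratio above 1 indicates a cohort more likely to be displaced than the rest of the sample, and a value below 1 indicates the opposite.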

Non-resident users’ post-disaster mobility: timing and geographical patterns

The 2017 hurricane season not only accelerated the outmigration of Puerto Ricans towards the continental United States, but it also severely damaged the economy of the island. One of the pillars of this economy, the tourism sector, took a major hit. A lack of essential utilities such as power and water, closed airports and cruise terminals, beach erosion, and water contamination were some of the reasons behind a decrease in tourist visits. Figure 5 compares the number of non-resident Twitter users active in Puerto Rico during the year prior to Hurricane Irma and Hurricane Maria (September 1, 2016–August 31, 2017) with the non-resident Twitter users active during the post-disaster period (September 1, 2017–August 31, 2018). The pre-disaster baseline shows a three-peak pattern: (1) the winter break/holiday period (December 15–January 15), when Puerto Rico attracts tourists due to its warmer climate and beautiful beaches and when many Puerto Ricans residing in the continental United States return home to visit their families; (2) “Spring Break” (February 20 to March 15), when many US college students visit Puerto Rico, attracted by the beach and the nightlife of the island; and (3) the summer months (May, June, July), with visitors especially looking for sun-and-beach activities and food and music festivals in town fairs.

Fig. 5 3-week moving average of non-resident Twitter users active in Puerto Rico during the periods Sep. 2016–Aug. 2017 (pre-disaster) and Sep. 2017–Aug. 2018 (post-disaster)
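A 3-week moving average of weekly user counts, as plotted in this figure, can be sketched as follows. The caption does not say whether the window is centered or trailing, so the centered window and the shorter windows at the series edges are assumptions, as is the function name.

```python
def moving_average(values, window=3):
    """Centered moving average; edge weeks use the neighbors that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]   # weeks i-1, i, i+1 for window=3
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical weekly counts of non-resident users.
print(moving_average([10, 20, 30, 40]))
```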

Figure 6 shows the difference in the number of non-resident users between corresponding weeks of the pre-disaster and post-disaster periods. The figure shows that September 2017 began with considerably more non-resident users on the island, but the number of non-residents fell below 2016–2017 levels shortly after Hurricane Maria (September 20, 2017) (Fig. 6a). During the first weeks after Hurricane Maria, non-resident user levels stayed close to 2016–2017 totals, likely due to the influx of first responders and relief workers (Government of Puerto Rico 2018b). The holiday season, beginning with Thanksgiving break and continuing into the winter break, revealed a decline of around 30% in the number of non-residents in comparison to pre-disaster levels. The decrease during the spring break high season had a similar magnitude (30%). In general, looking at Fig. 6a, we observe larger decreases in high-season periods (winter break, spring break, and summer) than during the low season. Particularly noticeable is the reduction (around 50%) of non-resident users in the late summer (July 15–Aug 31) of 2018, which could be related to tourists’ negative perception of the preparedness of Puerto Rico for another active hurricane season (D’Ambrosio 2018).

Fig. 6 Percentage difference of non-resident Twitter users active in Puerto Rico during Sep. 1, 2017, to Aug. 31, 2018, from baseline levels (Sep. 1, 2016, to Aug. 31, 2017). Green represents positive changes. Red shows negative changes
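The percentage differences shown in this figure compare each post-disaster week with the matching baseline week. A minimal sketch of that comparison follows, with hypothetical weekly counts and illustrative function names; how the study aligned weeks across the two years is not specified in the text.

```python
def percent_change(baseline, post):
    """Week-by-week percentage difference of post-disaster counts from baseline."""
    return [100 * (p - b) / b if b else None   # None where the baseline week is empty
            for b, p in zip(baseline, post)]

def net_change(baseline, post):
    """Percentage change between the totals of the two periods."""
    return 100 * (sum(post) - sum(baseline)) / sum(baseline)

# Hypothetical weekly counts of non-resident users.
baseline = [100, 200, 50]
post = [70, 100, 60]
print(percent_change(baseline, post), round(net_change(baseline, post), 1))
```

Negative values correspond to the red bars and positive values to the green bars. Regional net changes such as those discussed in the text would use the period totals rather than the weekly values.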

The effects of Hurricane Maria on the number of non-resident Twitter users (visitors) were not homogeneously distributed across the island. The hardest hit regions (North and East) (see Fig. 1) experienced the largest decreases, while the Central, West, and South regions had more contained losses or even positive annual balances. The Central region (Fig. 6b), after a significant increase (over 200%) of non-resident users during September and the first half of October, experienced a slight decrease (5%) until August 2018. The North region received 22% fewer non-resident Twitter users (Fig. 6c) from October 2017 to August 2018, with decreases of over 30% during high-season periods and over 60% in the late summer. In the same period, the West region (Fig. 6d) suffered a 15% net loss in the total of non-resident Twitter users. However, looking at the intra-annual distribution for this region, we observe two periods in which the number of non-residents increased in comparison to baseline levels. First, in the weeks following Hurricane Maria (from mid-October to mid-December), the total of non-resident Twitter users increased by 5%. Second, during the spring break high-season period (mid-March), the region experienced a significant increase (11%). The South region (Fig. 6e) is the only region that saw an increase (9%) in non-resident Twitter users from October 2017 to August 2018. Lastly, the East region (Fig. 6f), the most severely affected by Hurricane Maria and the most tourism-oriented region, recorded 33% fewer non-resident users compared with the same period before the hurricane. Here, reductions during high-season periods and the late summer were also more pronounced than during low-season weeks.

The increase in the number of non-resident Twitter users in the least affected regions (particularly relevant in the South) during the first months after Hurricane Maria, from October to January, was likely caused by the influx of first responders and relief workers. Although high-season periods were more severely affected throughout Puerto Rico, the least affected regions suffered smaller losses, which is probably explained in part by the transfer of visitors from more affected areas (e.g., from the East region towards the West region during the 2018 Spring Break).

Conclusions and further research

The study of population movements, especially during and after disasters, continues to be a major endeavor for authorities and researchers. Traditional sources of data for migration and tourism studies are often inadequate for estimating the spatiotemporal dimension of the processes in a post-disaster context, especially concerning data accessibility and reliability. Calls for new data sources and approaches abound (Piguet 2010; Bilsborrow and Henry 2012).

The results presented here confirm the potential for using passive human-sensor data (Twitter) to estimate the magnitude, timing, destination, and return of the displaced, as well as the number of non-residents arriving in Puerto Rico. The findings reveal that the hurricane resulted in 8.3% off-island displacement, with nearly 4% of our Twitter sample (mainly composed of 15- to 54-year-old individuals) leaving Puerto Rico and not returning as of May 31, 2018. In terms of destinations, 62% of those who relocated for more than 4 weeks moved to Florida, New York, Texas, and Massachusetts, and the timing of departure was concentrated within the first 6 weeks after Maria (76% of the displaced). Among the variables tested for association with displacement behavior, only age showed a weak association. The age cohort 25–34 years old had a significant and positive association with displacement, corroborating the scenario of the loss of young professionals at their prime fertile age and casting additional doubt over the demographic future of Puerto Rico. Regarding the study of post-disaster population inflows, we can conclude that, as of August 31, 2018, Puerto Rico had not recovered pre-disaster levels of non-resident visitors. However, the geographic patterns were dissimilar, with the most storm-affected areas (North and East) experiencing larger losses of non-resident visitors than less storm-affected regions (West and South). Research findings are consistent with previous studies such as Teralytics (2018), Sutter and Hernandez (2018), Hinojosa et al. (2018), Hinojosa and Meléndez (2018), and United States Census Bureau (2018a), although additional research is required to better understand the representativeness of Twitter data and how these data relate to census and survey approaches (Spence et al. 2016; Jiang et al. 2019).

Our results indicate that an approach based on geotagged Twitter data is amenable to addressing longstanding problems of data availability and reliability in the displacement/migration and tourism research fields, and in particular to characterizing the disruption of population fluxes triggered by a disaster. This innovative method based on geotagged social media can complement traditional approaches by providing a rapid and accurate response to questions of magnitude, timing, destination, and demographic characteristics of the displacement and migration processes, and by tracking the reduction in tourist visits. Such information can be of great help in designing more targeted surveys, thus reducing the necessary human and economic resources. In addition, future research following this line of work might develop close to real-time information that facilitates the creation of early warning systems for forced displacement following disasters and helps monitor tourism fluxes, which are particularly sensitive to external stressors and elusive to current data collection methods.