12.1 Introduction

Social media and Big Data have transformed our daily lives into interconnected cyberspace and realspace (Shaw and Yu 2009; Tsou 2015). As more location-aware technologies become available, social media platforms have increasingly embraced the location-based dimension (Sui and Goodchild 2011), and GIScience has attracted more interest in the dynamic relations between human behaviors and the environment (Shaw et al. 2016). Geographers can now collect, trace, and visualize the spread of social movements, disease outbreaks, natural hazards, and popular events by digitally collecting social media and Big Data with locational content (Tsou 2015). This is largely due to advances in location sensing and in information and communication technologies, in particular on mobile platforms. These new technologies enable automatic tracking of human movement and behavior, outdoors and indoors, at a high level of detail in space and time using location-aware technologies (LATs) such as global positioning systems (GPS), cellular networks, WiFi positioning systems, Radio-Frequency Identification (RFID), surveillance cameras, and various kinds of portable smart devices with LATs. Timestamped location data that are fine-grained in space and time can reconstruct individual trajectories and describe dynamic movement behaviors in detail. In addition, individual-scale data can avoid conventional data scaling problems such as the ecological fallacy and the modifiable areal unit problem by aggregating data from the bottom up, allowing researchers to examine both individual and collective behavior. Furthermore, the pervasiveness of smartphone and internet usage, together with the increasing use of social media, accelerates the generation of social media and big data with location information. The dynamic characteristics of social media and Big Data offer geographers research opportunities for examining and modeling human behaviors, communications, and movements (Tsou 2015).
This short viewpoint paper summarizes the papers presented in two series of special sessions: Human Dynamics in the Mobile Age: Linking Physical and Virtual Spaces, at the Association of American Geographers (AAG) annual meeting in 2015, and the Symposium on Human Dynamics Research: Social Media and Big Data, at the AAG annual meeting in 2016. The summary is organized around three research components of these papers: data, methods, and applications. In addition, we discuss the current state of the art in human dynamics research and highlight key concepts, opportunities, and challenges.

12.2 Human Dynamics Research: Summary of Papers in AAG Special Sessions

Human dynamics is a transdisciplinary research field focusing on the understanding of dynamic patterns, relationships, narratives, changes, and transitions of human activities, behaviors, and communications. The advent of location-aware technologies, ubiquitous network infrastructures, and mobile technologies accelerates human dynamics research by giving researchers access to a large amount of fine-grained individual-scale data that were not available in the past. The availability of such social media and big data is leading to data-driven scientific inquiry, a purely inductive and emergent form of analysis that allows data to speak for itself (Kitchin 2014; Kwan 2016). To encourage more geographers and GIScientists to study this emerging research theme, two series of special sessions were organized at the AAG annual meetings in 2015 and 2016: Human Dynamics in the Mobile Age: Linking Physical and Virtual Spaces (6 sessions) and the Symposium on Human Dynamics Research: Social Media and Big Data (3 sessions), respectively. A total of 42 papers were presented. We reviewed the titles, abstracts, and keywords of these 42 papers and summarized this new research theme in terms of data, methods, and applications (Table 12.1).

Table 12.1 Characteristics of papers presented at the AAG annual meetings in 2015 and 2016

12.2.1 Data

Of the 42 papers presented at the AAG human dynamics sessions, just over half (n = 22, 52.4%) used social media data, which broke down into Facebook (n = 1), Flickr (n = 3), Foursquare (n = 1), Instagram (n = 2), Twitter (n = 16), and Weibo (n = 1). These counts are not mutually exclusive because one paper used data from multiple social media platforms. Social media data can be gathered via Application Programming Interfaces (APIs), which allow users to access publicly available social media content, or purchased from social media providers. The types of social media data that researchers can access vary by platform. These include, for example, media content such as text messages, photos, and check-ins; tags, timestamps, and locations attached to media content; user profiles; and users’ social network relationships. As smartphones have become pervasive in everyday life, location attributes are often associated with users’ mobile phone locations acquired by GPS, cellular networks, or assisted GPS (A-GPS) supported by cellular networks. Data used in the remaining papers include mobile phone data such as Call Detail Records (CDRs) and Short Message Service (SMS) records (n = 8), camera/video imagery (n = 2), Volunteered Geographic Information (VGI) (n = 2), Global Positioning System (GPS) tracks (n = 1), US Census data (n = 1), activity records (n = 1), cadastral records (n = 1), and interviews/surveys (n = 1).
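The tallies above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative and not taken from the original analysis.

```python
# Counts reported in the text for the 42 session papers.
total_papers = 42
social_media_papers = 22

# Platform tallies as reported; one paper may appear under several platforms.
platform_counts = {
    "Facebook": 1,
    "Flickr": 3,
    "Foursquare": 1,
    "Instagram": 2,
    "Twitter": 16,
    "Weibo": 1,
}

# Share of all papers that used social media data, in percent.
share = 100 * social_media_papers / total_papers

# Total platform mentions, and how many exceed one mention per paper.
platform_mentions = sum(platform_counts.values())
extra_mentions = platform_mentions - social_media_papers

print(f"Share of papers using social media: {share:.1f}%")
print(f"Platform mentions: {platform_mentions}")
print(f"Mentions beyond one per paper: {extra_mentions}")
```

The platform mentions exceed the number of social media papers, which is what one would expect when the counts are not mutually exclusive.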

12.2.2 Methods

Among the variety of methodological approaches presented, the most frequently mentioned was GIS (n = 25), used as a general framework and tool to analyze and visualize human dynamics data in conjunction with other analytical methodologies. Under the GIS category, 10 papers mentioned GIS or Web-GIS as a base framework. Spatiotemporal analysis (n = 9) was the second most frequently mentioned GIS method, used to study human dynamics in both spatial and temporal dimensions; most papers mentioned it in conjunction with other specific methodologies such as trajectory analysis and text mining. Since social media data often contain text, text mining and semantic analysis appeared as a popular analytical methodology (n = 9). Specific methodologies applied to text data include, for example, Support Vector Machine (SVM) and Latent Dirichlet Allocation (LDA) for finding text similarities and topics. Some papers mentioned data mining and machine learning as general data analytics frameworks (n = 7), while others employed social network/graph-based analysis (n = 2) and trajectory analysis (n = 2). Spatial modeling approaches, including geosimulation (n = 1) and spatial interaction (n = 1), were presented to describe the processes and flows of human dynamics and behavior. Other methods included spatial statistics (n = 1), spatial analysis (n = 2), statistical analysis/modeling (n = 5), visualization (n = 3), literature reviews (n = 4), participant observation/interview as a qualitative method (n = 2), and an overview discussing challenges and opportunities of human dynamics research (n = 1).

12.2.3 Applications

A total of 15 papers studied general human dynamics and movement behavior, including inter- and intra-urban population flows, tourists’ movement, and human activity space. Studies also utilized social media and big data to examine human mobility, human behavior, and information flow in application to risk assessment and management during disastrous events (n = 5) (e.g., disaster alerts and responses), public health (n = 4) (e.g., infectious disease dynamics, diet behavior), urban dynamics (n = 5) (e.g., gentrification), transportation (n = 3) (e.g., driving and parking behavior), communication (n = 2) (e.g., public perception and information diffusion), and marketing (n = 1) (e.g., cyberspace interaction and consumer behavior). In addition, 5 papers examined human behaviors, communications, and movements and their relationships between cyberspace and realspace.

12.3 The Current State of the Art in Human Dynamics Research

Papers presented at the AAG sessions covered a broad range of current state-of-the-art topics in human dynamics research utilizing social media and big data, and related works have been reported in the recently published literature. In terms of data, more disaggregated geo-referenced social media and big data collected via LATs, as well as by conventional methods (e.g., Census surveys), have been utilized to study human dynamics. A few examples are the use of Instagram and Twitter data to analyze urban dynamic activity and demographic patterns (Boy and Uitermark 2016; Longley et al. 2015), an assessment of the validity of CDR data for understanding human mobility (Zhao et al. 2016), GPS and accelerometer data to examine physical activity related to built environments (Miller et al. 2015), and the Longitudinal Employer-Household Dynamics (LEHD) data to study disaggregated work trip flows and socio-spatial interaction (Niedzielski et al. 2015). In addition, new web and mobile tools have been developed to effectively collect and analyze such social media and big data for human dynamics research (Yang et al. 2016). Furthermore, High Performance Computing (HPC) makes it possible to simulate large-scale human dynamics in which millions of agents move and interact in a virtual space under the framework of Agent-Based Modeling (ABM). Such geosimulation frameworks can generate massive microscopic human movement data for exploring and investigating complex streetscape dynamics (Torrens 2016).

Quite a few methodologies have been proposed for research on human mobility, behavior, and contexts at both disaggregated and aggregated scales. For example, human movement behavior and mobility contexts can be analyzed by examining statistical and geometric properties of human dynamics data (Dodge et al. 2012; Torrens et al. 2012). Space-time analytics can examine the recurring movements of individuals and, from these recurring movements, identify patterns of life and opportunities for interaction based on proximity in space and time (Yuan and Nara 2015). Trajectory-based analysis has been used to extract movement characteristics of surgical staff from data collected by an ultrasonic-based location-aware system as well as video imagery, which can ultimately describe surgical contexts (Nara et al. 2017). Location-based social network analysis attempts to find human interactions and community structures by creating and analyzing graph networks based on spatial and spatio-temporal constraints under the time-geography framework (Crooks et al. 2016; Yuan et al. 2014). ABM simulates mobility, decision-making processes, human-human interaction, and human-environment interaction for modeling complex human dynamics over space and time (An et al. 2014; Heppenstall et al. 2012; Torrens 2015). Text mining and machine learning techniques can be applied to social media and big data to reduce noise and extract meaningful contexts (Allen et al. 2016).
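As a minimal illustration of the "statistical and geometric properties" mentioned above, the following sketch derives step lengths, speeds, and a straightness index from a short timestamped track. It is not the procedure of any cited study; the function names, the spherical-Earth approximation, and the sample coordinates are assumptions made for illustration, and timestamps are assumed to be strictly increasing.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def movement_parameters(track):
    """Return step lengths (m), speeds (m/s), and a straightness index.

    `track` is a time-ordered list of (seconds, lat, lon) tuples.
    Straightness is net displacement divided by total path length.
    """
    steps, speeds = [], []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        d = haversine_m(la0, lo0, la1, lo1)
        steps.append(d)
        speeds.append(d / (t1 - t0))
    path_length = sum(steps)
    net = haversine_m(track[0][1], track[0][2], track[-1][1], track[-1][2])
    straightness = net / path_length if path_length > 0 else 0.0
    return steps, speeds, straightness


# Hypothetical track: three position fixes taken 60 s apart.
track = [
    (0, 32.7750, -117.0710),
    (60, 32.7760, -117.0710),
    (120, 32.7760, -117.0700),
]
steps, speeds, straightness = movement_parameters(track)
print([round(s, 1) for s in steps], [round(v, 2) for v in speeds], round(straightness, 2))
```

Parameters of this kind can then be summarized per individual or per time window before any pattern analysis is attempted.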

Application examples include public health and epidemiology surveillance (Nagel et al. 2013), criminology (Malleson and Andresen 2015), social movements (Tsou et al. 2013), and risk assessment and management for natural hazards and disastrous events (De Longueville et al. 2009; Wang et al. 2016), to name a few.

12.4 Research Opportunities

There are numerous research directions that researchers can take to investigate human dynamics utilizing social media and big data in the coming years. Here we present three examples: location-based social networks, location-based linguistic analysis, and dynamic spatial ontology. The first research direction involves bringing the spatial dimension to social network analysis (SNA) and integrating social networks (SNs) into GIS. Social networks are built on node-edge graph structures in which the distance between nodes is the geodesic distance, i.e., the length of the shortest path between two nodes. This distance is commonly expressed as degrees of separation and is well suited to modeling relationships and influences in graphs. However, it ignores the fact that human activities happen at specific locations in physical space, and the importance of physical distance is not considered in SNA. By bringing the spatial dimension to SNA, researchers can examine the spatial context and geometry alongside the graph characteristics (Brockmann and Helbing 2013; Doreian and Conti 2012; Hristova et al. 2016). Nevertheless, there have been few attempts to develop metrics that combine existing SNA with spatial analysis to quantify interactions among nodes in the spatial context. In terms of the convergence of SNs and GIS, challenges remain in representing complex multilevel SNs in GIS at the conceptual level (Sui and Goodchild 2011). At the application end, some efforts have been made to lay out guidelines for modeling various types of SNs in geographic space for understanding human behavior (Yuan and Nara 2015; Andris 2016). One application example is combining SNs and geovisual analytics to represent the spatially embedded SNs from social media. As shown in Fig. 12.1, adding spatial attributes to the social network from Twitter conversations can suggest how location and urban hierarchy might affect how metropolitan areas respond to information.

Fig. 12.1 An example of a spatial social network among the 30 most populous U.S. Metropolitan Statistical Areas (MSAs) (right) as compared to a regular social network (left). Directed edges between nodes indicate the frequencies of retweeting activities among different MSAs related to the California vaccine exemption conversations
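The contrast between graph distance and physical distance described above can be made concrete with a small sketch. It computes the geodesic (shortest-path) distance by breadth-first search and compares it with the great-circle distance between the same pair of nodes. The three-node network, its edges, and the approximate city coordinates are hypothetical and serve only to illustrate the idea; they are not the data behind Fig. 12.1.

```python
import math
from collections import deque


def hop_distance(adjacency, source, target):
    """Geodesic distance (number of edges on a shortest path) via BFS."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return None  # target is unreachable from source


def great_circle_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs, spherical Earth."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Hypothetical nodes with approximate (lat, lon) coordinates.
coords = {
    "San Diego": (32.72, -117.16),
    "Los Angeles": (34.05, -118.24),
    "New York": (40.71, -74.01),
}

# Hypothetical undirected ties: San Diego is linked to New York only.
adjacency = {
    "San Diego": ["New York"],
    "New York": ["San Diego", "Los Angeles"],
    "Los Angeles": ["New York"],
}

for other in ("Los Angeles", "New York"):
    hops = hop_distance(adjacency, "San Diego", other)
    km = great_circle_km(coords["San Diego"], coords[other])
    print(f"San Diego -> {other}: {hops} hop(s), about {km:.0f} km")
```

In this toy network the physically nearest node is the more distant one in graph terms, which is the kind of mismatch a spatially aware SNA metric would need to capture.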

Location-based linguistic analysis is another promising research direction. It uses text mining techniques to study human dynamics related to feelings, emotions, and opinions about places, extracted from the large volume of textual content in georeferenced social media and big data. For example, sentiment analysis, a text classification method, can be used to investigate how geographic places correlate with certain textual content such as levels of happiness (Mitchell et al. 2013). Topic modeling, such as Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Indexing (PLSI) (Aggarwal and Zhai 2012), allows researchers to explore the spatial patterns of themes discovered from geo-referenced text data; examples include the spatial patterns of health behavior topics like “childhood obesity and schools,” “obesity prevention,” and “obesity and food habits” (Ghosh and Guha 2013) and those of common topics on Twitter and their associations with demographic and socio-economic characteristics of Twitter users as well as places and local activities (Lansley and Longley 2016). Text clustering can be applied to group similar unstructured text documents into clusters, which allows researchers to investigate spatial clusters associated with built environments and place characteristics. For example, text documents including email correspondence, transcribed face-to-face interviews, and phone calls can provide new and important clues in a criminal investigation (Helbich et al. 2013).
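To make the idea of place-based sentiment concrete, here is a minimal sketch of one simple, lexicon-based scoring approach: each message receives the mean valence of the lexicon words it contains, and message scores are then averaged by place. The lexicon values, place labels, and posts are all made up for illustration, and this is not claimed to be the method used in the cited studies.

```python
from collections import defaultdict

# Hypothetical word-valence lexicon (higher values = more positive).
LEXICON = {
    "love": 8.0, "great": 7.5, "happy": 8.2,
    "traffic": 3.0, "sad": 2.0, "stuck": 3.2,
}


def message_score(text, lexicon=LEXICON):
    """Mean valence of lexicon words in a message, or None if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    values = [lexicon[w] for w in words if w in lexicon]
    return sum(values) / len(values) if values else None


def place_scores(posts):
    """Average message scores by place; `posts` holds (place, text) pairs."""
    by_place = defaultdict(list)
    for place, text in posts:
        score = message_score(text)
        if score is not None:  # skip messages with no lexicon words
            by_place[place].append(score)
    return {place: sum(v) / len(v) for place, v in by_place.items()}


# Hypothetical georeferenced posts, already assigned to place labels.
posts = [
    ("beach", "Love this great sunset!"),
    ("beach", "So happy today"),
    ("freeway", "Stuck in traffic again"),
    ("freeway", "No words"),
]
scores = place_scores(posts)
print(scores)
```

A real analysis would need a validated lexicon, proper tokenization, and enough messages per place for the averages to be meaningful.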

Social media and big data also provide research opportunities to establish a dynamic ontology for places (location names) and geographic regions. Traditional spatial ontology is defined by experts or gazetteer dictionaries, which are difficult to formalize and standardize. We can define a “place” ontology by aggregating hundreds of thousands of geo-tagged social media records (e.g., tweets) mentioning a specific place name (such as “SDSU”) and applying linguistic analysis (Fig. 12.2). Cartographic visualization methods (e.g., kernel density estimation) can be used to identify the spatial boundary of place names, whereas content analysis can be employed to reveal the meaning of places. These methods can be further applied to observe temporal changes in the boundaries associated with place names, for example between different seasons. Unlike the traditional definition of place names from gazetteers or experts, this new social media and big data-based ontology framework is human-centered and can provide useful information and operational meanings for places.

Fig. 12.2 The kernel density estimate hotspot of geotagged tweets containing the “SDSU” keyword (left) and the word cloud from the SDSU geotagged tweets (right)
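The boundary-finding step described above can be sketched with a plain Gaussian kernel density estimate evaluated on a regular grid. This is a simplified illustration rather than the workflow used to produce Fig. 12.2: the point coordinates, bandwidth, grid size, and the 50% threshold used to call a cell a "hotspot" are all assumptions.

```python
import math


def gaussian_kde_grid(points, bandwidth, xmin, ymin, cell, ncols, nrows):
    """Evaluate a 2-D Gaussian kernel density estimate at grid-cell centres.

    `points` are (x, y) pairs in projected units (e.g. metres).
    Returns a list of rows (row index follows y, column index follows x).
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    grid = []
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            total = sum(
                math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2.0 * bandwidth ** 2))
                for x, y in points
            )
            row.append(norm * total)
        grid.append(row)
    return grid


def hotspot_cells(grid, fraction=0.5):
    """Cells whose density is at least `fraction` of the grid maximum."""
    peak = max(max(row) for row in grid)
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value >= fraction * peak
    ]


# Hypothetical projected coordinates of posts mentioning one place name:
# a tight cluster plus one distant outlier.
points = [(120, 130), (140, 150), (150, 140), (160, 160), (850, 820)]
grid = gaussian_kde_grid(points, bandwidth=100, xmin=0, ymin=0,
                         cell=100, ncols=10, nrows=10)
cells = hotspot_cells(grid)
print(cells)
```

The set of cells above the threshold gives a rough, data-driven footprint for the place name; the result is sensitive to the bandwidth and threshold, so both would need justification in an actual study.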

12.5 Research Challenges

While social media data and big data provide new research opportunities, there exist notable challenges. Tsou (2015) listed seven research challenges related to mapping social media and big data: (1) the lack of demographic profiles, (2) data integration problems, (3) issues with user privacy and locational privacy, (4) the need for multidisciplinary collaboration, (5) the need for contextual analysis, (6) noise filtering, and (7) the difficulty of falsifying hypotheses and theories. In addition to these challenges, we identified two further key challenges. One relates to the fact that social media platforms/services and the Internet of Things (IoT) are dynamically evolving over time. The APIs of a social media platform are updated, and major revisions to the service and the data access policy can affect data collection and possibly lead to data inconsistency. At the time of writing (October 2016), APIs had been changed, for example, 14 times for Instagram since April 2014, 36 times for Flickr since November 2010, 68 times for Foursquare since November 2010, and 113 times for Twitter since December 2012. Some involve major changes; for instance, Twitter was originally designed as a text messaging service with a limit of 160 characters, including 20 characters for a user name and 140 characters for a message post. It now allows users to post text messages with emoji, images, and videos. These changes not only alter the data structure but likely influence user behavior, which makes human dynamics research using Twitter data more complex. Instagram also made significant changes to its API in June 2016, including the deprecation of real-time subscriptions for tags, locations, and geographies (an equivalent to the Twitter streaming API) and a mandatory requirement for a valid access token in order to fully access Instagram content through its APIs (Instagram 2016). To obtain a valid access token, researchers are required to develop a live application that has to be reviewed and approved by Instagram. The availability of social media data in a currently accessible format, therefore, will likely change in the near future, which makes longitudinal studies especially difficult for researchers to conduct.

Another key challenge is related to data and algorithm uncertainty. In spite of the emerging research opportunities to produce geographic knowledge by utilizing social media and big data, most of these data are not the output of instruments designed to produce valid and reliable data amenable to scientific analysis (Lazer et al. 2014). Regarding spatial data quality, location information in most social media and big data can be controlled by end users, and it is challenging for researchers to know the level of uncertainty. For example, the location of an Instagram post is selected by a user from a list of locations provided by Instagram; therefore, a user can easily manipulate his/her location. Furthermore, there exist quite a few web tools and mobile applications for faking location information. While users’ decisions to fake, or spoof, their location protect individuals’ geo-privacy, few studies in the existing GIScience literature have discussed and incorporated location spoofing (Zhao and Sui 2017).

Kwan (2016) also cautioned that big data-driven research ignores the potentially significant influence of algorithms on research results, and thus geographic knowledge generated with big data might be more an artifact of the algorithms used than of the data itself. For example, Fischer (2014) mapped six billion geo-tagged tweets and observed a banding phenomenon, in which tweet locations tend to align with the closest line of latitude or longitude, suggesting that tweet locations might have been fuzzed by Twitter by snapping them to the closest latitude or longitude to prevent people’s exact locations from being disclosed. Researchers often do not have access to, or do not even know about, such algorithms used by the social media providers who generate, process, and provide their data through APIs. Moreover, in order to deal with big data, algorithms are increasingly implemented as computerized procedures, and they become increasingly detached from and less visible to the researchers who use them (Kwan 2016). Consequently, such algorithms introduce greater uncertainty and can potentially result in significant differences in research findings. Hence, it is crucial to examine and evaluate the validity of data and algorithms in order to maximize the utility of social media and big data.
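The banding effect described above can be illustrated with a toy example. The sketch below snaps one coordinate to a regular grid and then measures what share of points sit exactly on a grid line, which is one simple way to screen a dataset for possible fuzzing. The actual procedure applied by any provider is not known here; the 0.01-degree grid step, the function names, and the sample points are purely hypothetical.

```python
def snap(value, step):
    """Round a coordinate to the nearest multiple of `step` degrees."""
    return round(value / step) * step


def on_grid_share(points, step, tol=1e-9):
    """Share of (lat, lon) points whose latitude or longitude lies on a grid line."""
    def on_line(v):
        return abs(v - snap(v, step)) <= tol

    hits = sum(1 for lat, lon in points if on_line(lat) or on_line(lon))
    return hits / len(points)


# Hypothetical original locations.
raw = [
    (32.77512, -117.07134),
    (32.78041, -117.06877),
    (32.76893, -117.07421),
]

# Simulated fuzzing: latitude snapped to a 0.01-degree grid, longitude kept.
STEP = 0.01
fuzzed = [(snap(lat, STEP), lon) for lat, lon in raw]

print("raw share on grid:   ", on_grid_share(raw, STEP))
print("fuzzed share on grid:", on_grid_share(fuzzed, STEP))
```

A high on-grid share at some step size would be a warning sign that coordinates were altered upstream, although it cannot by itself reveal which algorithm was applied.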

Addressing these nine challenges will be an ongoing endeavor as human dynamics research moves forward with new technologies, social media data, and big geospatial data. In the two series of AAG special sessions, a few papers took on some of these challenges. For example, two papers integrated more than two data sources; one utilized mobile phone location data, CDRs, and subway smartcard data to uncover dynamic urban population flow patterns, and the other combined Twitter, Flickr, and Instagram data to delineate dynamic place boundaries. One paper used interview data to explore the practices, potentials, and problems of using data produced through mobile communications for disease disaster management. While the majority of papers presented in the AAG sessions focused on exploratory data analysis revealing interesting patterns related to human dynamics, there is a need for human dynamics research that tackles these challenges and critically discusses the use of new forms of data.