1 Introduction

Mobility has been a steering force for much of human history. The movement of peoples has determined the dynamics of numerous social and biological processes from tribal mixing and population genetics to the creation of nation-states and the very definition of our living areas and identities. Urban and transportation planners, for example, have long been interested in the flow of vehicles, pedestrians, or goods from place to place.

With more than half of the world’s population now living in urban areas (Footnote 1), understanding how these systems work and how we can improve the lives of people using them is more important than ever. Insights from models informed by novel data sources can identify critical points in road infrastructure, optimize public services such as buses or subways, or reveal how urban form influences its function. Epidemiologists are also relying heavily on models of human movement to predict and prevent disease outbreaks [13, 66] as global air travel makes it possible for viruses to quickly jump continents and dense urban spaces facilitate human-to-human contagion. This has made understanding human movement a crucial part of controlling recent disease outbreaks (Footnote 2). Finally, social scientists are increasingly interested in understanding how mobility impacts a number of social processes such as how information spreads from person to person in offices and cafes across the world. These interactions have been theorized to impact crime rates, social mobility, and economic growth [6, 46], and understanding their dynamics may improve how we live, work, and play.

The growing need to understand and model human mobility has driven a large body of research seeking to answer basic questions. However, the lack of reliable and accessible data sources on individual mobility has greatly slowed progress in testing and verifying these theories and models. Data on human mobility has thus far been collected through pen-and-paper surveys that are prohibitively expensive to administer and are plagued by small and potentially biased samples. Digital surveys, though more convenient, still require active participation and often rely on self-reporting [14]. Despite the development of statistical methods to carefully treat this data [5, 26, 45], new, cheaper, and larger data sources are needed to push our understanding of human mobility further.

The evolution of technology over the past decade has given rise to ubiquitous mobile computing, a revolution that allows billions of individuals to access people, goods, and services through ‘smart’ devices such as cellular phones. The penetration of these devices is astounding. The six billion mobile phones currently in use are triple the number of internet users and boast penetration rates above 100% in the developed world, e.g. 104% in the United States and 128% in Europe (Footnote 3). Even in developing countries, penetration rates stand at 89% (Footnote 4) and are growing fast. These devices and the applications that run on them passively record the actions of their users, including social behavior and information on location (Footnote 5), with high spatial and temporal resolution. Cellular antennas, wifi access points, and GPS receivers are used to measure the geographic position of users to within a few hundred meters or less. While the collection, storage, and analysis of this data presents very real and important privacy concerns [15, 16], it also offers an unprecedented opportunity for researchers to quantify human behavior at large scale. With billions of data points captured on millions of users each day, new research into computational social science [37] has begun to augment and sometimes replace sparse, traditional data sources, helping to answer old questions and raise new ones.

In this chapter, we present an overview of mobility research in the current data rich environment. We describe a variety of new data sources and detail the new models and analytic techniques they have inspired. We start by exploring research on individuals that emphasizes important intrinsic and universal characteristics about our movement: we are slow to explore, we are relatively predictable, and we are mostly unique. We then discuss efforts to add context and semantic meaning to these movements. Finally, we review research that models aggregates of human movements such as the flow of people from place to place. Throughout and at the end of this chapter, we point out applications of this research to areas such as congestion management, economic growth, or the spreading of both information and disease.

2 New Data Sources

Traditional data sources for human mobility range from census estimates of daily commutes to travel diaries filled out by individuals. These surveys are generally expensive to administer and participate in as they require intensive manual data encoding. To extract high-resolution data, individuals are often asked to recall large amounts of information on when, where, and how they have traveled, making these surveys prone to mistakes and biases. These challenges make it hard for surveys to cover more than a day or week at a time or to include more than a small portion of the population (typically less than 1%).

Mobile phones, however, with their high penetration rates, represent a fantastic sensor for human behavior. A large fraction of location data from mobile phones is currently in the form of call detail records (CDRs) collected by carriers when users perform actions on their devices that make use of the telecommunications network. The location of each device at the time a call, text, or data request is registered (Fig. 2.1) is recorded by carriers for billing, network performance, and legal purposes. Locations are inferred either by observing the tower through which the phone is connected or by triangulation with nearby towers. With the increasing use of mobile phones, each individual generates tens to hundreds of these digital breadcrumbs on a daily basis and this number is only increasing. Through specific agreements or through open-data challenges [17], location data on millions of users is readily available to researchers and has been used extensively to augment and sometimes replace traditional travel surveys. This data now forms the core of numerous new mobility studies and models, some of which we describe below.

Fig. 2.1

Mobile phones are increasingly being used to collect high-resolution mobility data. This figure from de Montjoye et al. [15] depicts (a) a sequence of calling events made by a user at different locations. (b) These events are localized to the area served by the mobile phone tower closest to the user and (c) can be aggregated into individual-specific neighborhoods where a user is likely to be found at different times of the day or week

Though generally less common than CDRs, applications running on smartphones may access even more precise estimates of a user’s position. A variety of sensors, from GPS to wifi, can pinpoint the location of a device to within just a few meters and can record data every few minutes [1]. Similarly, protocols such as Bluetooth and NFC allow devices to discover and connect to one another within a radius of a few meters, creating ad hoc sensor and social proximity networks [21]. Some of these applications and their underlying social networks explicitly add crucial context to mobility data. Foursquare invites users to “check-in” at specific places and establishments, Twitter will automatically geotag tweets with precise coordinates from where they were sent, and the Future Mobility survey app passively maintains an activity diary [14] requiring little input from users.

Infrastructure and public services have also become much smarter and now collect data on their usage to improve and help plan operations. Toll booths automatically count and track cars and this data has helped create accurate and real-time traffic estimates used by mapping and navigation services to provide better routing information. Subways, streetcars, and buses use electronic fare systems that record when millions of users enter and exit transportation systems to help better predict demand. In addition to smarter public infrastructure, the ecosystem created by digital devices has given birth to entirely new transportation services such as Hubway, the Boston bike rental service, which collects data on every bike ride and has even released some of it publicly (Footnote 6), or Uber, an on-demand car service, which uses historical usage data to balance the time a user has to wait for a car to arrive against the time drivers spend without clients. Finally, on-board devices and real-time data feeds from automatic vehicle location (AVL) systems power applications such as NextBus that track the location of thousands of buses and subways across the world to display and predict when the next bus will arrive. While smart infrastructure comes with its own privacy challenges [35] (Footnote 7), vehicle and public transport data offer additional information to urban planners and mobility modelers to better understand these systems.

Finally, most practical mobility models need to properly account for geography such as mountains and rivers, transportation infrastructure such as bridges and highways, differences in density between urban and rural areas, and numerous other factors. Thankfully, the digitization of maps has led to an explosion of geographic data layers. Geographic information systems (GIS) have improved dramatically while falling data storage prices have made it possible for small and large cities to offer their public mapping data to citizens in an online, machine readable format. The U.S. Census Bureau’s TIGER/Line program, San Francisco’s OpenSF, and New York City’s PLUTO data warehouse are just a few sources that offer huge repositories of publicly accessible geographic data on everything from building footprints to the location of individual trees in a city. Open- and crowd-sourced initiatives like OpenStreetMap allow anyone in the world to contribute and download high-resolution digital maps of roads, buildings, subways, and more, even in developing areas that may not have institutional resources to create them. Private efforts such as Google Maps and MapBox offer high-resolution satellite imagery, route planning, or point of interest information through free or low cost APIs. Put together, these resources provide a digital map of the world that serves as a rich backdrop on which to study human mobility and the infrastructure built to facilitate it.

Put together, new sources from CDRs to public transport data, from mobile phone applications to AVLs, generate a dataset with size and richness prohibitively expensive to match via traditional methods. Collected passively and without any effort from the user, this data is often more robust to manipulation by conscious or unconscious biases and provides a signal that is difficult to fake. While we are convinced of the potential of this data, it is always important to remember that it is not without pitfalls. It would be illusory to think that all of the old biases or hidden variables would simply disappear because the data is large. In some cases, data is only recorded when an individual interacts with a device, which may bias when samples are taken [47]. Similarly, it is important to keep in mind that even if it covers a significant fraction of the population, this data might not be representative. Finally, these data generally come stripped of context. We do not know why an individual has chosen to move or what they will be doing at their destination. For these reasons, sampling and robust statistical methods are still, maybe more than ever (Footnote 8), needed to use this data to augment our current understanding of human mobility while still providing robust conclusions. We now discuss a number of studies that aim to do just this.

3 Individual Mobility Models

Understanding mobility at an individual level entails collecting and analyzing sets of times, places, and semantic attributes about how and why users travel between them. For example, on a typical morning one may wake up at home, then walk to a local coffee shop on the way to the bus that takes them to work. After work they may go to the grocery store or meet a friend for dinner before returning home only to repeat the process the next day. The goal of modeling this mobility is to understand the underlying patterns of individuals using new high resolution data. While models have been used to plan infrastructure or public transport, they have also uncovered insights into the underlying nature of human behavior: we are slow to explore, relatively predictable, and mostly unique.

Early modeling work draws a great amount of inspiration from statistical physics, with numerous efforts drawing parallels between human mobility and random walk or diffusion processes. One of the first such studies used data from the crowdsourced “Where’s George” project. Named after George Washington, whose head appears on the $1 bill, the project stamped bills asking volunteers to enter the geographic location and serial number of the bills in order to build a travel history of various banknotes. As bills are primarily carried by people when traveling from store to store, a note’s movement serves as a proxy for human movement. Modeling the bills’ trajectories as continuous random walks, Brockmann et al. found that their movement appears to follow a Levy flight process [8]. This process is characterized by successive steps whose angular direction is uniformly distributed, but whose step-lengths follow a fat-tailed distribution. While small jumps are most probable, bills have a significant probability of making long jumps from time to time. These findings are aligned with observations that humans tend to make many short trips in a familiar area, but also take longer journeys now and then.
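As a minimal sketch of the process just described, the following simulates a two-dimensional walk with uniformly distributed headings and power-law step lengths. The exponent `beta`, the minimum step `r_min`, and the function name `levy_flight` are illustrative assumptions, not values or code from the studies cited above.

```python
import math
import random


def levy_flight(n_steps, beta=1.6, r_min=1.0, seed=0):
    """Simulate a 2-D walk with uniform headings and fat-tailed step lengths.

    Step lengths follow P(r) ~ r^(-beta) for r >= r_min (a Pareto law),
    sampled by inverting the cumulative distribution. beta and r_min are
    illustrative assumptions, not fitted parameters.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)      # direction: uniform
        u = 1.0 - rng.random()                       # u lies in (0, 1]
        step = r_min * u ** (-1.0 / (beta - 1.0))    # length: power-law tail
        x += step * math.cos(angle)
        y += step * math.sin(angle)
        path.append((x, y))
    return path


path = levy_flight(1000)
lengths = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
print(f"median step {sorted(lengths)[len(lengths) // 2]:.2f}, longest {max(lengths):.2f}")
```

Comparing the median step with the longest one gives a quick feel for the fat tail: most steps stay close to `r_min`, while a few are orders of magnitude longer.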

In 2008, Gonzalez et al. [23] showed that the movement of these bills does not tell the whole story. Using a CDR dataset of more than 100,000 users over a 6 month period in a European country (Fig. 2.2a), they showed that the step-length distribution for the entire population was better approximated by a truncated power-law \(P(\Delta r) = (\Delta r + \Delta r_0)^{-\beta }\exp (-\Delta r / \kappa )\) with exponent β = 1.79 and cutoff distances between 80 and 400 km. This suggests that Levy flights are only a good approximation of individuals’ mobility for short distances. To understand the mechanism that gives rise to this distribution, the authors borrowed a quantity from polymer physics known as the “radius of gyration” \(r_g\):

$$\displaystyle \begin{aligned} r_g(t) = \sqrt{ \frac{1}{N(t)}\sum_{i=1}^{N(t)}(\mathbf{r}_i - \mathbf{r}_{cm})^2}, \end{aligned} $$
(2.1)

where N(t) is the number of observed locations and \(\mathbf{r}_{cm}\) is the mean location of the user during the observation period. In essence, the radius of gyration is a measurement of the characteristic distance an individual travels during an observation period t. The authors then showed that the distribution of \(r_g\) in the population is itself well approximated by a truncated power-law with \(r^0_g= 5.8\) km, \(\beta _{r_g}=1.65\), and a cutoff of κ = 350 km (Fig. 2.2b). Simulations suggest that the step-length distribution of the entire population is produced by the convolution of heterogeneous Levy flight processes, each with a different characteristic jump size determined by an individual’s radius of gyration. Put differently, each person’s mobility can be approximated by a Levy flight process up to trips of some individual characteristic distance \(r_g\). After this distance, however, the probability of long trips drops far faster than would be expected from a traditional Levy flight.
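Equation (2.1) translates directly into code. The sketch below assumes that observed locations have already been projected to planar coordinates (for example, kilometers east and north of a reference point); the function name and the toy trace are hypothetical.

```python
import math


def radius_of_gyration(points):
    """Return r_g for a list of (x, y) positions in a planar projection."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one observed location")
    # Mean location (center of mass) of all observations.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Root mean squared distance of the observations from that center.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# A toy commuter observed equally often at home (0, 0) and at work (6, 8).
commuter = [(0.0, 0.0)] * 5 + [(6.0, 8.0)] * 5
print(radius_of_gyration(commuter))  # 5.0
```

Because every observation counts once, a location that appears often in the record pulls the center of mass toward it, so \(r_g\) reflects where a user spends time and not only how far they occasionally travel.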

Fig. 2.2

(a) Individual mobility trajectories are passively collected from mobile devices [23]. (b) Measuring the distribution of radii of gyration, \(r_g\), within a population of 100,000 users in a European country reveals considerable heterogeneity in the typical travel distance of individuals. Moreover, this distribution cannot be explained by modeling each individual’s movement as realizations of a single Levy flight process [23]. (c and d) show the slower than linear growth in new locations visited over time S(t) and that the probability a location is visited next is inversely proportional to its rank by past visit frequency [54]. (e) This preferential return contributes to strikingly high predictability R(t) over time while (f) the number of unique locations visited in any given hour is highly periodic and corresponds to the sleep-wake cycles of individuals [55]

Further investigation by the authors revealed the source of this behavior: the idiosyncrasy of human movements. Unlike random processes, humans are creatures of habit and tend to return to previously visited locations such as home or work. The nature of these returns was also found to follow a very particular pattern. An individual returns to a previously visited location with a probability inversely proportional to that location’s rank L amongst all the places he or she visits, \(P(L) \sim 1/L\). These non-random, predictable return visits are unaccounted for in random walk and Levy flight models and have been shown to be at the heart of deviations of observed behavior from random processes. Additional studies [9] have found similar patterns in both other CDR datasets and Foursquare or Twitter check-ins.

Subsequent work by Song et al. [54] further studied how individual-specific locations need to be taken into account in mobility models. Using a similar CDR dataset, the authors showed three important characteristics of human behavior. First, the number of unique locations visited by individuals S(t) scales sub-linearly with time, \(S(t) \sim t^{\mu}\) where μ = 0.6 (Fig. 2.2c). Second, the probability of an individual returning to a previously visited location scales with the inverse of the rank of that location, \(P(L) \sim L^{-\zeta}\) where ζ = 1.2 (Fig. 2.2d), a phenomenon labeled ‘preferential return’. And third, the mean displacement (Δr) of an individual from a given starting point shows slower than logarithmic growth, demonstrating the extremely slow diffusion of humans in space. In essence, these findings pinpoint the dampening of explorative human movement over time. Long jumps are observed so infrequently that they do not affect the average displacement of individuals. The authors then propose a new model of human mobility to capture these three characteristics. The model is as follows: starting at time t, an individual will make a trip after a waiting time Δt drawn from a fat-tailed probability distribution measured from CDRs. With probability \(\rho S^{-\gamma}\), where S is the number of distinct locations visited so far, the individual travels to a new, never-before visited location some distance Δr away, where Δr is drawn from the fat-tailed distribution characterized in the previous model. With probability \(1 - \rho S^{-\gamma}\), the individual returns to a previously visited location according to the inverse rank equation.
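The exploration and return mechanism can be sketched in a few lines. This sketch makes several simplifying assumptions that are not in the text: waiting times and jump distances are ignored, locations are abstract integer ids, the values `rho=0.6` and `gamma=0.21` are illustrative, and a return visit is drawn with probability proportional to past visit counts, which is one way of favoring highly ranked locations.

```python
import random


def simulate_epr(n_trips, rho=0.6, gamma=0.21, seed=0):
    """Simulate exploration and preferential return over abstract locations.

    Returns (visits, distinct): a dict of location id -> visit count, and
    a list with the number of distinct locations S after each trip.
    """
    rng = random.Random(seed)
    visits = {0: 1}          # the individual starts at location 0
    distinct = [1]
    for _ in range(n_trips):
        s = len(visits)
        p_new = rho * s ** (-gamma)      # exploration probability
        if rng.random() < p_new:
            visits[s] = 1                # explore a never-before-visited place
        else:
            places = list(visits)
            weights = [visits[p] for p in places]
            chosen = rng.choices(places, weights=weights, k=1)[0]
            visits[chosen] += 1          # return, favoring frequent places
        distinct.append(len(visits))
    return visits, distinct


visits, distinct = simulate_epr(5000)
print("distinct locations after 5000 trips:", distinct[-1])
```

Since the exploration probability shrinks as S grows, the count of distinct locations grows more slowly than the number of trips, which is the qualitative behavior the model is meant to capture.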

These early models do not attempt to recover periodic aspects of movement (e.g. daily commuting) or the semantic meaning of visits (e.g. to visit a friend or go shopping). They do, however, emphasize important statistical and scaling properties of human mobility and often successfully reproduce them. Taken together, these models show how slow we humans are in our exploration, returning more often than not to known places and taking fewer long steps than predicted by a power-law distribution.

Approaching the problem from the perspective of machine and statistical learning, another set of models has uncovered and explored another facet of human mobility: how predictable we are. In [55], Song et al. applied information theory metrics to CDRs to bound predictability from above using three entropy measures: the entropy S, the random entropy \(S^{rand}\), and the uncorrelated entropy \(S^{unc}\). They then used the empirical distributions of these entropies to derive upper bounds on a user’s predictability (\(\Pi^{max}\), \(\Pi^{rand}\), and \(\Pi^{unc}\)). On average, the potential predictability of an individual’s movement is an astounding 93% and no user displayed a potential predictability of less than 80%. To further quantify predictability, the authors introduced two new metrics. They defined regularity R(t) as the probability a user is found at their most visited location during a given hour t, along with the number of unique locations visited during a typical hour of the week N(t) (Fig. 2.2e and f). Both show strong periodicity and regularity. These quantities have since been measured in different data sets in different cities and countries and have been shown to be consistent among them [9].
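A rough sketch of two of these entropies and of the resulting bound follows. It assumes that the random entropy is \(\log_2 N\) for N distinct locations, that the uncorrelated entropy is the Shannon entropy of visit frequencies, and that the predictability bound is obtained by solving Fano's inequality numerically; the text above does not spell out these definitions, so they are assumptions of this example. The order-aware entropy S, which needs a sequence-based estimator, is left out.

```python
import math
from collections import Counter


def random_entropy(seq):
    """log2 of the number of distinct locations (all equally likely)."""
    return math.log2(len(set(seq)))


def uncorrelated_entropy(seq):
    """Shannon entropy of visit frequencies, ignoring the order of visits."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def max_predictability(entropy, n_locations, tol=1e-9):
    """Solve entropy = H(p) + (1 - p) * log2(N - 1) for p in [1/N, 1]."""
    if n_locations < 2:
        return 1.0

    def fano(p):
        h = 0.0
        for q in (p, 1.0 - p):
            if q > 0.0:
                h -= q * math.log2(q)
        return h + (1.0 - p) * math.log2(n_locations - 1)

    # fano() decreases from log2(N) at p = 1/N to 0 at p = 1, so bisect.
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


day = ["home", "home", "work", "work", "home", "gym", "home", "home"]
n = len(set(day))
print("S_rand =", random_entropy(day))
print("S_unc  =", round(uncorrelated_entropy(day), 3))
print("Pi_unc =", round(max_predictability(uncorrelated_entropy(day), n), 3))
```

Lower entropy leaves less room for surprise, so the bound on predictability rises as the entropy estimate falls.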

While the previous study provided a theoretic upper bound on the predictability of an individual, a number of statistical learning techniques have been developed to make predictions of where an individual will be at a given time. Early work in the area, predating even analytic computations, used Markov models and information on underlying transportation networks to predict transitions between mobile phone towers within cities. These models have been used to improve quality of service of wireless networks through proper resource allocation [33, 36, 40, 58]. Later work incorporated various trajectory estimation and Kalman filtering algorithms to predict movements in small spaces such as college campuses [38, 43].
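To make the Markov approach concrete, the sketch below fits a first-order model to a sequence of tower ids and predicts the most frequent successor of the current tower. It is a deliberately small illustration with hypothetical function names and toy data; the cited systems also use information on the transportation network, which is omitted here.

```python
from collections import Counter, defaultdict


def fit_markov(sequence):
    """Count observed transitions between consecutive locations."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]


towers = ["A", "B", "C", "A", "B", "C", "A", "B", "D"]
model = fit_markov(towers)
print(predict_next(model, "B"))  # C
```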

Temporal periodicity was used by Cho et al. [12] in their Periodic Mobility Model, and social behavior was incorporated in the Periodic and Social Mobility Model. At their core, these models are mixture models in two-dimensional space that learn the probability distribution of a user to be at any given location at a given time from previous location data. The latter also accounts for the location history of social contacts. The authors used these models to estimate that as much as 30% of our trips may be taken for social purposes. Multivariate nonlinear time series forecasting produced similar results [19, 51], predicting where an individual will be either in the next few hours or at a given time of a typical day. These models, however, are all focused on predicting the geographic position of individuals at different times and do not attempt to understand what individuals may be doing there or any other semantics of place.

Though acquiring semantic information about mobility is more difficult than simply measuring geographic coordinates, it provides a much richer abstraction to study behavior. In one of the first studies to mine the behavior of college students using mobile phones, Eagle and Pentland [20] gave a few hundred students smart phones that not only recorded locations but also asked users to label each place with its function, such as home or work. Applying principal component analysis to these abstract movements from semantic place to semantic place (as opposed to geographic movements alone), the authors found that an individual’s behavior could be represented as a linear combination of just a few ‘eigenbehaviors’. These eigenbehaviors are temporal vectors whose components represent activities such as being at home or being at work. They can be used to predict future behaviors, perform long range forecasts of mobility, and label social interactions [21, 48]. The price paid for such detailed predictions, however, is the need for semantic information about locations. Geographic positions need to be tagged with attributes such as home or work in order for them to be grouped and compared across individuals.

Another approach to studying more abstract measurements of individual location information comes from recent work by Schneider et al. [52]. The authors introduced mobility motifs by examining abstract trip chains over the course of a day. A daily mobility motif is defined as a set of locations and the particular order in which a person visits them over the course of a day. More formally, these motifs constitute directed networks where nodes are locations and edges are trips from one location to another. For example, the motif of an individual whose only trips in a day are to and from work will consist of two nodes with two directed edges (one in each direction). Counting motifs in mobility data from both CDRs and traditional travel surveys, they find that on average individuals visit three different places in a given day. They then construct all possible daily motifs for a given number of locations n and compute the frequencies with which those motifs appear in human mobility data. Shockingly, while there exist over 1 million ways for a user to travel between 6 or fewer locations, 90% of people use one of just 17 motifs and nearly a quarter follow the simple two location commute motif introduced earlier (Fig. 2.3a). The authors found similar results in travel survey data and introduced a simple Markov model for daily mobility patterns which reproduces empirical results.
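The reduction from a day of visits to a motif can be sketched as follows. Locations are relabeled by order of first appearance so that geography drops out and only the directed structure remains. This relabeling is an assumption made here for simplicity: it is not a full graph-isomorphism check, so two structurally identical days visited in a different order could in principle receive different labels.

```python
from collections import Counter


def daily_motif(visits):
    """Map one day's ordered visits to (number of places, directed edges)."""
    labels = {}
    edges = set()
    previous = None
    for place in visits:
        # Relabel each place by the order in which it first appears.
        node = labels.setdefault(place, len(labels))
        if previous is not None and previous != node:
            edges.add((previous, node))   # a trip between two distinct places
        previous = node
    return len(labels), frozenset(edges)


days = [
    ["home", "work", "home"],
    ["flat", "office", "flat"],
    ["home", "work", "gym", "home"],
]
motif_counts = Counter(daily_motif(day) for day in days)
for (n_places, edges), count in motif_counts.most_common():
    print(n_places, sorted(edges), count)
```

The first two days come from different people and places yet map to the same two-node commute motif, which is what allows motif frequencies to be compared across a population.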

Fig. 2.3

(a) Removing geographic coordinates from locations and only focusing on a set of unique places and the directed travel between them, mobility motifs reveal that the daily routines of people are remarkably similar. Despite over 1 million unique ways to travel between 6 or fewer points, just 17 motifs are used by 90% of the population. Moreover, the frequency of their appearance in CDR data matches very closely with more traditional survey methods [52]. (b) Despite this similarity and predictability, our movement displays a high degree of unicity. Just four spatiotemporal points are enough to differentiate a user from 95% of all other individuals [15]

It is tempting to hypothesize that high theoretical and practical predictability results from high levels of similarity between individuals in a region. Perhaps the pace of life, pull of mono-centric downtowns, or the structure of transportation systems funnel users to the same places and route choices. de Montjoye et al. [15] explored this hypothesis and found that, while predictable, an individual’s movement patterns are also unique. The authors introduced unicity, \(\mathcal {E}_p\), as the fraction of traces uniquely defined by a random set of p spatiotemporal points, where a trace T is a set of spatiotemporal points, each containing a location and a timestamp. A trace is said to be uniquely defined by a set of points \(I_p\) if it is the only trace that matches \(I_p\) in the entire dataset. Applying this measure to a CDR dataset on 1.5 million users, the authors found that just four spatiotemporal points are enough to uniquely identify 95% of all users (Fig. 2.3b). The authors further studied unicity when the data is coarsened spatially or temporally. They found that unicity decreases as a power function of the spatial (v) and temporal (h) resolution of the data, \(\mathcal {E} \sim (v*h)^\beta \), with \(\beta \sim -p /100\). Taken together, these equations show that unicity decreases slowly with the spatial and temporal resolution of the data and that this decrease is easily compensated by the number of points p. High uniqueness in human mobility traces exists across many spatiotemporal scales. These results raise many questions about the privacy of massive, passively collected metadata datasets, but also highlight an interesting nuance of human mobility: though individuals are predictable, they are also unique.
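Unicity can be estimated by sampling, as in the sketch below. It assumes each trace is a set of (location, time) pairs that have already been discretized to the chosen spatial and temporal resolution; the function name, the trial count, and the toy traces are hypothetical.

```python
import random


def unicity(traces, p, n_trials=200, seed=0):
    """Estimate the fraction of traces uniquely identified by p random points.

    traces: dict mapping a user id to a set of (location, time) tuples.
    """
    rng = random.Random(seed)
    eligible = [u for u, pts in traces.items() if len(pts) >= p]
    if not eligible:
        raise ValueError("no trace has at least p points")
    unique = 0
    for _ in range(n_trials):
        user = rng.choice(eligible)
        # Draw p known points from this user's trace.
        sample = set(rng.sample(sorted(traces[user]), p))
        # Count every trace in the dataset that contains all p points.
        matches = sum(1 for pts in traces.values() if sample <= pts)
        if matches == 1:
            unique += 1
    return unique / n_trials


traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18)},
    "u2": {("A", 8), ("B", 9), ("D", 18)},
    "u3": {("E", 8), ("F", 12), ("G", 20)},
}
print(unicity(traces, p=1), unicity(traces, p=3))
```

In this toy dataset users u1 and u2 share their morning points, so a single known point often fails to separate them, while three points always do.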

Merging concepts of predictability and unicity, work by Sun et al. [57] used temporal encounter networks to study repeated co-locations between passengers using data from bus passengers in Singapore. Temporal encounter networks were constructed by connecting individuals if they rode the same bus at the same time. An average individual encountered roughly 50 people per trip and these trips were highly periodic, occurring at intervals associated with working hours as well as daily and weekly trips. A pair of individuals who encountered each other tended to meet an average of 2.5 times over the course of a week. The distribution of time between encounters reveals strong periodicity, with passengers riding the same bus to work in the morning riding the same bus home, or riding the same bus at the same time each morning. This finding illustrates the idiosyncrasies of human mobility. We not only visit just a few places during the day, we do so at the same times and by the same routes. Though both of these results suggest that our unicity should be low, the previous work shows us that this is not the case.
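A minimal sketch of how such an encounter network might be assembled is given below. It assumes, as a simplification, that "same bus at the same time" can be reduced to sharing a vehicle trip identifier; real fare data would require comparing boarding and alighting intervals. All identifiers are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def encounter_counts(rides):
    """Count how often each pair of passengers shared a vehicle trip.

    rides: iterable of (passenger, vehicle_trip_id) records.
    """
    aboard = defaultdict(set)
    for passenger, trip in rides:
        aboard[trip].add(passenger)
    pairs = Counter()
    for passengers in aboard.values():
        # Every unordered pair on the same trip counts as one encounter.
        for a, b in combinations(sorted(passengers), 2):
            pairs[(a, b)] += 1
    return pairs


rides = [
    ("ann", "bus7-mon-0800"), ("bob", "bus7-mon-0800"),
    ("ann", "bus7-tue-0800"), ("bob", "bus7-tue-0800"),
    ("cat", "bus7-tue-0800"),
]
pairs = encounter_counts(rides)
print(pairs[("ann", "bob")])  # 2
```

The resulting counter is a weighted edge list: pairs are the edges of the encounter network and the counts record repeated meetings.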

In summary, new data sources have allowed researchers to show that, over weeks and months, human movement is characterized by slow exploration, preferential return to previously visited places, a small set of recurring daily motifs, and predictable uniqueness. These regularities have been used to develop algorithms capable of predicting movement with high degrees of accuracy and have been shown to mediate other important processes such as social behavior and disease spread. Individual mobility patterns, however, are not the only level of granularity of interest to researchers, city planners, or epidemiologists. Aggregate movement can be either derived from individual-level models or modeled as an emergent phenomenon. In the next section, we discuss works and models which aim at describing and modeling aggregate movement and flows of many individuals from place to place.

4 Aggregate Mobility

Aggregated mobility is used for planning urban spaces, optimizing transportation networks, studying the spread of ideas or disease, and much more. Perhaps the largest component in these models is the origin-destination matrix, which stores the number of people traveling from any location to any other at different times or by different means. Like many complex systems, aggregate behavior is often more than the sum of individual parts and can be modeled separately. Additional layers of complexity are also needed to account for and sometimes explain individual choice of mode of transportation or route as described by the “four step model” [41, 45].

Like their individual-focused counterparts, many of these aggregate models are inspired by physical processes. Some of the earliest techniques for estimating origin-destination matrices are gravity models, which have been used to model flows on multiple scales, from intra-city to international [27, 45]. Borrowed directly from Newton’s law of gravitation, the number of trips \(T_{ij}\) taken from place i to place j is modeled as a function of the population of each place, \(m_i\) and \(m_j\), and some function of the distance between them, \(f(r_{ij})\). The intuition is that the population of a place, its mass, is responsible for generating and attracting trips and thus the total flux between the two places should be proportional to the product of the two masses while the distance between them mitigates the strength of this connection. In the fully parameterized version of this model, an exponent is applied to the population at the origin and destination, \(T_{ij} = a\frac {m_i^{\alpha } m_j^{\beta }}{f(r_{ij})}\), to account for hidden variables that may be specific to local regions or populations. While the classical gravity model from physics is recovered by setting α = β = 1 and \(f(r_{ij}) = r_{ij}^2\), these parameters are generally calibrated for specific applications using survey data.
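The parameterized gravity formula can be written as a small function. In this sketch the default deterrence function is \(r^2\), which recovers the Newtonian form; the function name, the argument names, and the example numbers are assumptions for illustration, and in practice the parameters would be calibrated against survey data.

```python
def gravity_flow(m_i, m_j, r_ij, a=1.0, alpha=1.0, beta=1.0, deterrence=None):
    """Trips from i to j: a * m_i^alpha * m_j^beta / f(r_ij)."""
    if deterrence is None:
        def deterrence(r):
            return r ** 2      # classical choice f(r) = r^2
    return a * (m_i ** alpha) * (m_j ** beta) / deterrence(r_ij)


# Two towns of 1000 and 500 people that are 10 distance units apart.
print(gravity_flow(1000, 500, 10))  # 5000.0

# The same pair with a hypothetical linear deterrence function.
print(gravity_flow(1000, 500, 10, deterrence=lambda r: r))  # 50000.0
```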

Gravity models, however, are not without limitation. First, they rely on a large number of parameters to be estimated from sparse survey data, which often leads to overfitting, and, second, they fail to account for opportunities that exist between the two masses of people. The latter fault results in the same flow of people being estimated between two locations whether there is an entire city or an empty desert between them. Intuitively, one would expect that trips between places would be affected by the intervening opportunities to complete a journey. These shortcomings led Simini et al. to develop the radiation model [53]. Again borrowing from physics (this time radiation and absorption), they imagined individuals being emitted from a place at a rate proportional to its population and absorbed by other locations at a rate proportional to the population there. In this model, the probability that an emitted person arrives at any particular place is a function of their probability of not being absorbed before getting there. The model is as follows: \(T_{ij} = T_i \frac {m_i m_j}{(m_i + s_{ij})(m_i + m_j + s_{ij})}\), where \(T_i\) is the total number of trips originating from location i and \(s_{ij}\) is the population within a disc centered on location i with a radius equal to the distance between i and j, excluding the populations of i and j themselves. The radiation model does not directly depend on the distance between the two places, taking instead into account the opportunities in between them (Fig. 2.4a). Unlike the gravity model, the radiation model is parameterless and requires only data on populations to estimate flow. The authors showed that despite its lack of parameters, the radiation model provides better estimates of origin-destination flows than the gravity model for areas the size of counties or larger.
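The radiation formula can be evaluated for one origin at a time, as sketched below. This sketch assumes planar coordinates, takes \(s_{ij}\) to be the population of all locations strictly closer to the origin than the destination (excluding the origin and destination themselves), and breaks distance ties by that strict comparison. These are choices made for the example, as are the function name and the toy data.

```python
import math


def radiation_flows(populations, coords, origin, total_trips):
    """Return {j: T_ij} from `origin` to every other location j."""
    m_i = populations[origin]
    xi, yi = coords[origin]
    dist = {
        j: math.hypot(x - xi, y - yi)
        for j, (x, y) in coords.items()
        if j != origin
    }
    flows = {}
    for j, r_ij in dist.items():
        m_j = populations[j]
        # Intervening population: places strictly closer to the origin than j.
        s_ij = sum(populations[k] for k, r_ik in dist.items()
                   if k != j and r_ik < r_ij)
        flows[j] = total_trips * m_i * m_j / ((m_i + s_ij) * (m_i + m_j + s_ij))
    return flows


populations = {"a": 1000, "b": 500, "c": 2000}
coords = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 0.0)}
flows = radiation_flows(populations, coords, origin="a", total_trips=100)
for place, trips in sorted(flows.items()):
    print(place, round(trips, 2))
```

Distances only enter through the ordering of destinations. Moving "c" farther away without changing which places lie in between leaves its estimated flow unchanged, in contrast with a gravity model.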

Fig. 2.4

(a) The radiation model accounts for intervening opportunities, producing more accurate estimates of flows between two places than more traditional gravity models [53]. (b) Routing millions of trips measured from CDR data to real road networks makes it possible to measure the importance of a road based on how many different locations contribute traffic to it, \(K_{road}\). Understanding how transportation systems perform under different loads presents new opportunities to solve problems related to congestion and make infrastructure more efficient [63]

Yang et al. adapted Simini’s radiation model to correct for distortions that arise at different scales [67]. They showed that the original radiation model’s lower accuracy in urban environments is due to the relatively uniform density and small distances that characterize cities. In dense urban areas, distances are all relatively short, and an individual may choose to visit a particular location because of its hedonic attributes regardless of whether it is convenient to get to. Yang et al. subsequently introduced a scaling parameter α in the function describing the conditional probability that an individual is absorbed at a location. This single parameter was enough to correct for these distortions and to provide a model that works on any length scale. Moreover, the authors suggested that for urban areas, the density of points of interest (POIs) such as restaurants and businesses is a better predictor of the absorption of a place than its population. Iqbal et al. [31] have demonstrated an improved way to extract valid, empirical OD matrices from call detail record (CDR) data to validate the model.

Finally, activity-based models [5] model user intent more explicitly. They hypothesize that all trips are made to fulfill certain needs or desires of an individual. Travel and survey diaries are used to identify those needs for different segments of the population and how they are typically fulfilled. This knowledge can then be used by the model given the demographics of individuals and environmental factors. These models are closely related to agent-based models simulating the behavior of city residents and rely heavily on the idea of economic utility.

From a practical perspective, city planners need to know not only how many people will go from point A to point B at a certain time of the day but also the mode of transportation and route choice of these individuals. For example, we would like to predict which route they will take so that we can properly estimate the stress placed on transportation systems and potentially optimize performance. Models of route choice typically assume that individuals rationally choose the path from A to B that minimizes some cost function such as total travel time or distance. Paths can be computed on a road network using shortest path algorithms such as the traditional Dijkstra algorithm or A-Star, an extension that enjoys better performance thanks to heuristics. Other information such as speed limits can also be taken into account to estimate free flow travel times.
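As an illustration of route choice by cost minimization, the following Python sketch runs Dijkstra's algorithm on a toy road network. The graph representation (a dictionary mapping each intersection to its outgoing roads and free flow travel times) and the example network are assumptions made for illustration only.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, cost).

    Returns (total_cost, path), or (inf, []) if the target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry for an already settled node
        visited.add(node)
        if node == target:
            break
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical network: edge costs are free flow travel times in minutes.
roads = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 3.0)],
    "C": [("B", 2.0), ("D", 7.0)],
    "D": [],
}
cost, path = shortest_path(roads, "A", "D")
print(cost, path)
```

A-Star follows the same structure but orders the queue by the cost so far plus a heuristic estimate of the remaining cost, such as straight-line distance divided by the maximum speed.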

More advanced models are needed to account for the impact of congestion, as drivers rarely encounter completely empty freeways. Incremental traffic assignment algorithms model congestion endogenously [56]. Trips are first split into increments, each containing only a fraction of the total flow between two points. Trips in each increment are then routed along shortest paths independently of all other trips in that increment, keeping count of how many trips were assigned to each road. The travel times are then adjusted according to a volume delay function that accounts for the current congestion on a road, where congestion is computed as the ratio between the volume of traffic assigned to the segment and the capacity of the road (referred to as volume-over-capacity). Trips in the next increment are then routed using the updated costs, and the process repeats until all flow has been accounted for. In this way, as roads become more congested and travel times increase, drivers in later increments are assigned to different, less congested routes. Values of total volume on each road, congestion, and travel times can then be validated against traffic counters, speed sensors, or data from vehicle fleets like taxis and buses, as well as smartphones, as in the Mobile Millennium project [28, 30, 32, 49].
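The incremental procedure can be sketched in Python as follows. Two choices here are assumptions rather than details given in the text: the volume delay function is the widely used Bureau of Public Roads (BPR) form \(t = t_0 (1 + 0.15 (v/c)^4)\), and the flow is split into increments of 40%, 30%, 20%, and 10%. The network and demand are hypothetical.

```python
import heapq


def bpr_time(free_time, volume, capacity, a=0.15, b=4.0):
    """BPR volume delay function (an assumed choice of delay function)."""
    return free_time * (1.0 + a * (volume / capacity) ** b)


def shortest_edges(times, source, target):
    """Dijkstra over {(u, v): travel_time}; returns the edges on the best path."""
    adjacency = {}
    for (u, v), t in times.items():
        adjacency.setdefault(u, []).append((v, t))
    best, previous, done = {source: 0.0}, {}, set()
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in done:
            continue
        done.add(node)
        for nxt, t in adjacency.get(node, []):
            if cost + t < best.get(nxt, float("inf")):
                best[nxt] = cost + t
                previous[nxt] = node
                heapq.heappush(queue, (cost + t, nxt))
    if target not in best:
        return []
    edges, node = [], target
    while node != source:
        edges.append((previous[node], node))
        node = previous[node]
    return edges[::-1]


def incremental_assignment(roads, demand, fractions=(0.4, 0.3, 0.2, 0.1)):
    """roads: {(u, v): (free_time, capacity)}; demand: {(origin, dest): trips}."""
    volumes = {edge: 0.0 for edge in roads}
    times = {edge: free for edge, (free, _) in roads.items()}
    for fraction in fractions:
        # Route the whole increment on travel times fixed at its start.
        added = {edge: 0.0 for edge in roads}
        for (origin, dest), trips in demand.items():
            for edge in shortest_edges(times, origin, dest):
                added[edge] += fraction * trips
        # Update volumes, then travel times via volume-over-capacity.
        for edge, (free, capacity) in roads.items():
            volumes[edge] += added[edge]
            times[edge] = bpr_time(free, volumes[edge], capacity)
    return volumes, times


# Hypothetical network: a fast, low-capacity direct road and a detour via C.
roads = {
    ("A", "B"): (10.0, 100.0),
    ("A", "C"): (6.0, 200.0),
    ("C", "B"): (6.0, 200.0),
}
demand = {("A", "B"): 400.0}
volumes, times = incremental_assignment(roads, demand)
print(volumes)
print(times)
```

In this toy case the first increment fills the direct road, its travel time rises above that of the detour, and the remaining increments are routed through C.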

Wang et al. [63] further explored the use of CDRs as input for these incremental algorithms to estimate traffic volume and congestion. After correcting for differences in market share and vehicle usage rates, they measured trips by tracking consecutive phone calls of individuals as they moved through the city, generating flow estimates that were then routed. Using this approach, Wang et al. showed the distribution of traffic volume and congestion to be well approximated by an exponential mixture model. This model depends on the number of major and minor roadways in a city’s network. Using the same approach, the authors described the usage patterns of drivers by a bipartite usage graph connecting locations in the city to the roads used by travelers from those locations (Fig. 2.4b). Roads can be described by the number of locations that contribute traffic to them, and places can be described by the roads used to visit them. The “function” of a road can then be classified by comparing its topological importance to its behavioral importance. For example, a bridge may be topologically important because it is the only way to cross a river, but a main street may be behaviorally important because it attracts motorists from many different neighborhoods. Using these measures, researchers were able to devise congestion reduction strategies that target the 2% of neighborhoods where trip reduction will have the largest network-wide effect. They found this smart reduction strategy to be three to six times as effective as a random trip reduction strategy. Further work used this analysis to predict traffic jams [62, 64].
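The bipartite usage graph can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of [63]: it counts every location that sends at least one trip over a road, whereas a real analysis would likely weight or filter sources by how much traffic they contribute. The trip records and road names are hypothetical.

```python
from collections import defaultdict


def usage_graph_degrees(routed_trips):
    """Build a location-road bipartite graph from routed trips.

    routed_trips: iterable of (origin_location, [road_id, ...]) pairs.
    Returns (k_road, k_source): the number of distinct locations feeding each
    road, and the number of distinct roads used from each location.
    """
    sources_by_road = defaultdict(set)
    roads_by_source = defaultdict(set)
    for origin, roads in routed_trips:
        for road in roads:
            sources_by_road[road].add(origin)
            roads_by_source[origin].add(road)
    k_road = {road: len(sources) for road, sources in sources_by_road.items()}
    k_source = {origin: len(roads) for origin, roads in roads_by_source.items()}
    return k_road, k_source


# Hypothetical routed trips: (origin neighborhood, roads along the route).
routed_trips = [
    ("north", ["main_st", "bridge"]),
    ("south", ["main_st"]),
    ("east", ["main_st", "ring_rd"]),
    ("north", ["bridge"]),
]
k_road, k_source = usage_graph_degrees(routed_trips)
print(k_road)
print(k_source)
```

In this toy data the main street draws traffic from three neighborhoods while the bridge serves only one, the kind of behavioral contrast that can be set against a road's topological importance.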

Private cars, however, are not the only mode of transportation studied. Using smartphones and AVL data, researchers have been mapping the routes followed by public transport and even privately owned minibuses in developing countries [11, 18, 50]. Similarly, data on air travel have become increasingly available for studying aggregate mobility between cities, with applications in epidemiology (see below).

5 Human Behavior and Mobility

While of obvious interest to travelers, urban planners and transportation engineers, people’s movement strongly impacts other areas. Though by no means an exhaustive list, we highlight three areas here: social behavior, disease and information spread, and economic outcomes. Many of these dynamics are discussed in greater detail in further sections of this volume.

5.1 Mobility and Disease Spread

Human movement via cars, trains, or planes has always been a major vector in the propagation of diseases. Consequently, the human mobility data and models discussed so far have increasingly been used to study how diseases spread. For example, CDR data has been used to map mobility patterns in Kenya, helping researchers in their fight against malaria [65, 66]. More recently, CDR and other data from West Africa have been used to model regional transportation patterns to help control the spread of Ebola (Footnote 9). Finally, air travel data has become central to the study of global epidemics, as planes allow an individual to travel between nearly any two points on the globe in a matter of hours. The global airline network therefore often determines how potent an epidemic could be and its likely path across the globe [3, 4, 13, 42, 44] (Fig. 2.5a).

Fig. 2.5

(a) Global air travel has dramatically increased the speed at which diseases can spread from city to city and continent to continent [44]. (b) Mobility also adds context to social networks. When two individuals visit the same locations, the timing of those visits can suggest the nature of their social relationship [60]. (c) Mobility and the access it provides have strong correlations with economic outcomes. Children have dramatically different chances at upward economic mobility in certain places of the United States than in others [10]

5.2 Mobility and Social Behavior

Intent is a crucial element of human mobility, and movement is often a means to a social end. Despite new communications technologies making it easier than ever to connect across vast distances, face-to-face interactions still play an important role in social behavior, whether it is the employees of a company commuting to a central workplace or friends meeting at a restaurant on a weekend. The link between social contacts and mobility has become increasingly prominent in research, as mobility data is often collected through mobile phones or location-based social networks.

Using data from an online social network, Liben-Nowell showed that the probability of being friends with another individual decreases in inverse proportion to the distance between them, suggesting a gravity model of the form discussed above [39]. Subsequent work verified Liben-Nowell’s findings in other social networks [2, 24], while Toole et al. [59] showed the importance of taking geography into account when studying social networks and how information spreads through them. Moreover, geographic characteristics can be used to predict the social fluxes between places [29]. Conversely, social contacts are very useful in predicting where an individual will travel next [12, 19, 61], and Cho et al. find that while 50–70% of mobility can be explained as periodic behavior, another 10–30% is related to social interactions.

Models such as those proposed by Grabowicz et al. [24] or Toole et al. [60] have subsequently been developed to incorporate this dynamic and evolve both social networks and mobility simultaneously. For example, Grabowicz et al. incorporate social interactions by having individuals travel in a continuous 2D space, where an individual’s travel is determined by the location of their contacts and location is used as a determinant of new social tie creation. The model is as follows: with probability \(p_v\), an individual moves to the location of a friend, and, with probability \(1 - p_v\), they choose a random point to visit some distance Δr away. But while social ties impact mobility, mobility can also impact social ties. Upon arriving at a new location, the individual can thus choose to form social ties with other individuals within a radius with probability p or with random individuals anywhere in the space with probability \(p_c\), a free parameter. Here too, a simple model is able to reproduce many empirical relationships found in social and mobility data.
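One update step of a model of this kind might be sketched in Python as below. This is a loose illustration under stated assumptions, not the implementation of [24]: the jump length Δr is held fixed, the neighborhood radius and all parameter values are arbitrary, and tie formation is read as "each nearby non-friend with probability p, each distant non-friend with probability \(p_c\)". All function and variable names are hypothetical.

```python
import math
import random


def step(agent, positions, friends, rng, p_v=0.5, delta_r=1.0,
         radius=0.5, p=0.1, p_c=0.01):
    """Move one agent, then let it form new ties (illustrative assumptions)."""
    # Mobility: visit a friend with probability p_v, otherwise jump delta_r
    # in a uniformly random direction.
    if friends[agent] and rng.random() < p_v:
        friend = rng.choice(sorted(friends[agent]))
        positions[agent] = positions[friend]
    else:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x, y = positions[agent]
        positions[agent] = (x + delta_r * math.cos(angle),
                            y + delta_r * math.sin(angle))
    # Tie formation: nearby strangers with probability p, others with p_c.
    for other in positions:
        if other == agent or other in friends[agent]:
            continue
        near = math.dist(positions[agent], positions[other]) <= radius
        if rng.random() < (p if near else p_c):
            friends[agent].add(other)
            friends[other].add(agent)  # ties are undirected


# Hypothetical run: 20 agents scattered in a 10 x 10 square, no initial ties.
rng = random.Random(0)
positions = {i: (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
             for i in range(20)}
friends = {i: set() for i in positions}
for _ in range(200):
    step(rng.randrange(20), positions, friends, rng)
print(sum(len(ties) for ties in friends.values()) // 2, "ties formed")
```

Because visits to friends place agents at the same coordinates, ties tend to form among friends of friends, which couples the growth of the social network to mobility.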

5.3 Mobility and Economic Outcomes

Mobility not only provides people with social opportunities, it also provides economic ones. Economists and other social scientists have developed numerous theories on the role of face-to-face interactions in socio-economic outcomes and economic growth. In-person meetings are thought to unlock human capital, making us more productive [22, 34]. For example, jobs in dense cities tend to pay higher wages than the same jobs in more rural areas, even after controlling for factors such as age and education [68], in part due to productivity and creativity gains made possible by the rich face-to-face interactions that close spatial proximity facilitates. Universal urban scaling laws have been found repeatedly, showing that societal attributes from the number of patents to average walking speed scale with population, and theoretical models have been proposed that suggest density is at the heart of these relationships [6, 7, 46]. While density is one way to propagate these benefits, increased mobility is another. Poorer residents of cities, for example, have been shown to have better job prospects and higher chances of retaining jobs when given a personal car instead of being constrained by public transit [25]. Finally, Chetty et al. [10] found strong correlations between intergenerational economic mobility and variables related to the commuting times and spatial segregation of people (Fig. 2.5c). While we are only beginning to explore these relationships, early results suggest that mobility is a critical component of many economic systems.

6 Conclusion

In this chapter, we reviewed a number of ways new data sources are expanding our understanding of human mobility. Methods from statistical physics, machine learning, and traditional transportation modeling make reproducible characteristics of human movement visible. We explore slowly [23, 54], we are highly predictable [19, 55], and we are mostly unique [15]. Models of aggregate flows of people from place to place have also found success with analogies to statistical physics validated by new data sources [53]. More accurate measurements of city-wide traffic have made it easier than ever to assess the performance of transportation systems and devise strategies to improve them [63]. Valuable in their own right, these insights have informed our understanding of other social phenomena as well, leading to more accurate models of disease spread, social interactions, and economic outcomes. As cities become home to millions more people each year, the insights gained from these new data are critical for making them more sustainable, safer, and better places to live.