
1 Scope of the Workshop and the Book

The chapters in this book were first presented at a two-day workshop on Big Data and Urban Informatics held at the University of Illinois at Chicago in 2014. The workshop, sponsored by the National Science Foundation, brought together approximately 150 educators, practitioners and students from 91 institutions in 11 countries. Participants represented a variety of academic disciplines, including Urban Planning, Computer Science, Civil Engineering, Economics, Statistics, and Geography, and the workshop provided a unique opportunity for discussion between urban social scientists and data scientists interested in using Big Data to address urban challenges. The papers in this volume are a selected subset of those presented at the workshop and have gone through a peer-review process.

Our main motivation for the workshop was to convene researchers and professionals working in the emerging interdisciplinary research area around urban Big Data. We sought to organize a community with interests in theoretical developments, in applications demonstrating the use of urban Big Data, and in the next generation of Big Data services, tools and technologies for Urban Informatics. We were interested in research results as well as in idea pieces and works in progress that highlighted research needs and data limitations. We sought papers that clearly create or use novel, emerging sources of Big Data for urban and regional analysis in transportation, environment, public health, land use, housing, economic development, labor markets, criminal justice, population demographics, urban ecology, energy, community development, and public participation. A background paper documenting the major motivations for the workshop, titled Big Data and Urban Informatics: Innovations and Challenges to Urban Planning and Knowledge Discovery (Thakuriah et al. 2016b), is included as a chapter in this book.

2 Topics on Big Data and Urban Informatics

The chapters in this book are organized around eight broad categories: (1) Analytics of user-generated content; (2) Challenges and opportunities of urban Big Data; (3) Changing organizational and educational perspectives with urban Big Data; (4) Urban data management; (5) Urban knowledge discovery applied to a variety of urban contexts; (6) Emergencies and Crisis; (7) Health and well-being; and (8) Social equity and data democracy.

2.1 Analytics of User-Generated Content

The first set of papers focuses on how to analyze user-generated content. Understanding urban dynamics or urban environmental problems is hampered by the paucity of public data. The ability to collect and analyze geo-tagged social media is emerging as a way to address this shortage or to supplement existing data for use by planners, businesses and citizens. New platforms to integrate these forms of data have been proposed (Tasse and Hong 2016), but they are not without limitations. In particular, an evaluation of GIS platforms (Tang et al. 2016) suggests that committed users play a critical role in ensuring the successful and reliable use of these tools, and points to the consequent need to integrate online and offline activities and to transfer information effectively to individuals’ mobile devices.

Other GIS-enabled frameworks have been proposed (Yin et al. 2016a) to support citizen sensing of urban environmental pollution such as noise. Such a participatory computing architecture supports scalable user participation as well as data-intensive processing, analysis and visualization.

2.2 Challenges and Opportunities of Urban Big Data

The second set of papers considers the challenges and opportunities of urban Big Data, particularly as an auxiliary data source that can be combined with more traditional survey data, or even as a substitute for large survey-based public datasets. Big Data exists within a broader data economy that has changed in recent years (e.g., in the data quality of the American Community Survey (ACS)). Spielman (2016) argues that carefully considered Big Data sources hold the potential to increase confidence in the estimates provided by data sources such as the ACS. Recognizing that Big Data appears to be an attractive alternative to design-based survey data, Johnson and Smith (2016) caution that it may carry serious methodological costs, and call for efforts to find ways of integrating these data sources, whose different qualities make each of them valuable for understanding cities.

In addition to cost savings, the potential of data fusion strategies lies in integrating a rich diversity of data sources that shed light on complex urban phenomena from different angles and cover different gaps. There are, however, major barriers to doing so, stemming from the difficulty of controlling the quality and quantity of the data, and from privacy issues (Spielman 2016). The proliferation of Big Data sources also demands new approaches to computation and analysis. Gunturi and Shekhar (2016) explore the computational challenges posed by spatio-temporal Big Data generated from location-aware sensors, and how these may be addressed through scalable analytics. In another application, Antunes et al. (2016) discuss how explicitly addressing heteroscedasticity in regression analysis using Big Data greatly improves the quality of model predictions and the confidence associated with those predictions.

2.3 Changing Organizational and Educational Perspectives with Urban Big Data

The third set of papers focuses on the organizational and educational perspectives that change with Big and Open Urban Data. Cities are investing in technologies to enhance human and automated decision-making. For cities to become smarter, however, urban systems and subsystems require connectivity through data and information management. Conceptualizing cities as platforms, Krishnamurthy et al. (2016) discuss why data and technology management are critical for cities to become agile, adaptable and scalable, while also raising considerations for ensuring that such goals are achieved. Thakuriah et al. (2016a) review organizations in the urban data sector with the aim of understanding their role in producing data and in delivering services using data. They identify nine organizational types in this dynamic and rapidly evolving sector, which they align along five dimensions that account for their mission, major interest, products and activities: techno-managerial, scientific, business and commercial, urban engagement, and openness and transparency.

Despite the rapid emergence of this data-rich world, French et al. (2016) ask whether the urban planners of tomorrow are being trained to leverage these emerging resources to create better urban spaces. They argue that urban planners are still being educated to work in a data-poor environment, taking courses in statistics, survey research, and projection and estimation that are designed to fill the gaps in such an environment. With the advent of Big Data, visualization, simulation, data mining and machine learning may become the appropriate tools for planners, and planning education and practice need to reflect this new reality (French et al. 2016). In line with this argument, Estiri (2016) proposes new frameworks for planning for urban energy demand, based on the improvements that non-linear modeling approaches offer over mainstream linear modeling.

2.4 Urban Data Management

The book also includes examples of online platforms and software tools that support urban data management, and of applications that use such data to measure urban indicators. The AURIN (Australian Urban Research Infrastructure Network) workbench (Pettit et al. 2016), for example, provides machine-to-machine online access to large-scale, distributed and heterogeneous data resources from across Australia, which can be used to understand, among other things, housing affordability in Australia. AURIN allows users to systematically access existing data and run spatial-statistical analyses, but a number of additional software tools are required to undertake data extraction and manipulation. In another application, to measure the performance of transit systems in San Francisco, researchers have developed software tools to support the fusion and analysis of large, passively collected data sources such as automated vehicle location (AVL) and automated passenger count (APC) data (Erhardt et al. 2016). The tools include methods to expand the data from a sample of buses, and are able to report and track performance on several key metrics over several years. Queries and comparisons support the analysis of change over time.

Owen and Levinson (2016) also showcase a national evaluation of job accessibility by public transit at the Census block level. This involved assembling and processing a comprehensive national database of public transit network topology and travel times, allowing users to calculate accessibility continuously, for every minute within a departure time window of interest. The increased computational complexity is offset by a robust representation of the interaction between transit service frequency and accessibility at multiple departure times.
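To make the minute-by-minute idea concrete, the following is a minimal Python sketch of a cumulative-opportunity accessibility measure averaged over every departure minute in a window. It is not Owen and Levinson's implementation: the toy single-route schedule (`toy_schedule`, a 10-minute headway, fixed in-vehicle times), the destination job counts in `JOBS` and the 30-minute threshold are all hypothetical values chosen for illustration.

```python
from statistics import mean

# Hypothetical toy inputs (assumptions, not data from the chapter).
JOBS = {"A": 100, "B": 250}        # jobs located at each destination
IN_VEHICLE = {"A": 15, "B": 28}    # in-vehicle minutes from the origin
HEADWAY = 10                       # a vehicle departs every 10 minutes


def toy_schedule(minute):
    """Door-to-door travel time to each destination for a given departure minute.

    Travel time is the wait for the next vehicle plus the in-vehicle time.
    """
    wait = (-minute) % HEADWAY
    return {dest: wait + t for dest, t in IN_VEHICLE.items()}


def jobs_reachable(travel_times, jobs, threshold):
    """Cumulative-opportunity measure: jobs reachable within `threshold` minutes."""
    return sum(jobs[d] for d, t in travel_times.items() if t <= threshold)


def average_accessibility(schedule, jobs, window, threshold):
    """Average accessibility over every departure minute in [start, end)."""
    start, end = window
    per_minute = [jobs_reachable(schedule(m), jobs, threshold)
                  for m in range(start, end)]
    return mean(per_minute)


if __name__ == "__main__":
    print(jobs_reachable(toy_schedule(420), JOBS, 30))  # 7:00 departure
    print(jobs_reachable(toy_schedule(423), JOBS, 30))  # 7:03 departure
    print(average_accessibility(toy_schedule, JOBS, (420, 480), 30))
```

With these assumed numbers, a single 7:00 departure (minute 420) reaches 350 jobs, while one at 7:03 reaches only 100; averaging over all 60 departure minutes from 7:00 to 8:00 gives 175. This sensitivity to the exact departure time is what a continuous, per-minute calculation captures and a single-snapshot calculation misses.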

Yet the data infrastructure needed to support Urban Informatics does not materialize overnight. Wu and Zhang (2016) demonstrate how resources at the scale of an entire country are needed to establish the basic processes required to develop comprehensive citizen-oriented services. Focusing on China’s emerging smart cities program, they demonstrate the need for a proactive, data-driven approach to meet the challenges posed by China’s urbanization. The approach requires not only a number of technological and data-oriented solutions, but also a cultural shift towards statistical thinking, quality management, and data integration. New investments in smart cities offer the opportunity to design systems in which data can lead to much-needed and impactful governmental innovation.

2.5 Urban Knowledge Discovery

Big Data is playing a major role in urban knowledge discovery and planning support. For example, a high-resolution digital surface model (DSM) derived from Light Detection and Ranging (LiDAR) data has supported the dynamic simulation of flooding due to sea level rise in California (Ju et al. 2016). This study provides more detailed information than static mapping, and serves as a rich database for better planning, management, and governance in anticipation of future scenarios. In another example, Khan and Machemehl (2016) study how land use and various social and policy variables affect free-floating carsharing vehicle choice and parking duration, topics for which there is very little empirical data. The authors use two approaches, logistic regression and a duration model, and find that land-use-level socio-demographic attributes are important factors in explaining usage patterns of carsharing services. This has implications for carsharing parking policy and for the availability of transit in the context of intermodal transportation.

A third example, by Grinberger et al. (2016), shows that synthetic Big Data can also be generated from standard administrative small data for applications in urban disaster scenarios. The data decomposition process moves from a database describing only hundreds or thousands of spatial units to one containing records of millions of buildings and individuals (agents) over time, which then populate an agent-based simulation of responses to a hypothetical earthquake in downtown Jerusalem. The simulations show that temporary shocks to movement and traffic patterns can generate longer-term lock-in effects that reduce commercial activity. The issue arising here is the ability to identify when this fossilization takes place and when a temporary shock has passed the point of no return. A high level of household turnover and ‘churning’ through the built fabric of the city in the aftermath of an earthquake was also observed, which points to a waste of material, human and emotional resources. Less vulnerable socio-economic groups ‘weather the storm’ by dispersing and then re-clustering over time.

A suite of studies focuses on new methods for applying Big Data to transportation planning and management, particularly with the help of GIS tools. Benenson et al. (2016) use already available big urban GIS data to measure accessibility from the viewpoint of an individual traveler going door-to-door. In their work, a computational application based on intensive querying of relational database management systems was developed to construct high-resolution accessibility maps for an entire metropolitan area in order to evaluate new infrastructure projects. High-resolution representations of trips enabled unbiased accessibility estimates, providing more realistic assessments of such infrastructure investments and a platform for transportation planning. Similarly, Yang and Gonzales (2016) show that Big Data derived from taxicabs’ Global Positioning System (GPS) devices can be used to refine travel demand and supply models and street network assessments, by processing these data and integrating them with GIS. Such evaluations can help identify service mismatches and support fleet regulation and management. Hwang et al. (2016) demonstrate a case in which GPS trajectory data are used to study travel behavior and to estimate carbon emissions from vehicles. They propose a reliable method for partitioning GPS trajectories into meaningful elements, detecting stay points (locations where an individual remains for a while) with a density-based spatial clustering algorithm.
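The stay-point idea can be illustrated with a minimal Python sketch that runs a textbook DBSCAN (density-based spatial clustering) over GPS fixes and treats the centroid of each dense cluster as a candidate stay point. This is an assumed, simplified stand-in rather than Hwang et al.'s method: it uses planar coordinates in metres, ignores timestamps (practical stay-point detection usually also applies a minimum-duration rule), and the `eps` and `min_pts` values and the toy trajectory are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over (x, y) points; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Brute-force range query: every point within eps of point i (including i).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point that starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels


def stay_points(points, eps=50.0, min_pts=4):
    """Centroid of each density-based cluster, taken as a candidate stay point."""
    groups = {}
    for p, label in zip(points, dbscan(points, eps, min_pts)):
        if label >= 0:
            groups.setdefault(label, []).append(p)
    return [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            for g in groups.values()]


if __name__ == "__main__":
    # Hypothetical trajectory in metres: a stop, four fixes while moving, another stop.
    home = [(0, 0), (5, 0), (0, 5), (5, 5), (2, 3)]
    moving = [(200, 0), (400, 0), (600, 0), (800, 0)]
    office = [(1000, 0), (1005, 0), (1000, 5), (1005, 5), (1002, 3)]
    track = home + moving + office
    print(dbscan(track, eps=50.0, min_pts=4))
    print(stay_points(track))
```

In this toy trajectory the five fixes at each end of the trip form two clusters, with centroids near (2.4, 2.6) and (1002.4, 2.6), while the widely spaced fixes recorded in motion are labelled noise (-1). The brute-force neighbour search is O(n²), so a spatial index would likely be needed for trajectory datasets of realistic size.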

2.6 Emergencies and Crisis

Big Data has particular potential to help deal with emergencies and urban crises in real time. Cervone et al. (2016) propose a new method that uses real-time social media data (e.g., Twitter posts and photos) to augment remote sensing observations of transportation infrastructure conditions in response to emergencies. Challenges remain, however, associated with producer anonymity and geolocation accuracy, as well as with differing levels of confidence in the data.

2.7 Health and Well-Being

Health and well-being is another major area in which Big Data is making significant contributions. However, data on pedestrian movement have proven difficult and costly to collect and analyze. Yin et al. (2016b) propose and test a new image-based machine learning method that processes panoramic street images from Google Street View to detect pedestrians. Initial results from this method resemble pedestrian field counts, suggesting that it can be used for planning and design. Another paper, by Hipp et al. (2016), draws on the Archive of Many Outdoor Scenes (AMOS) project, which aims to geolocate, annotate, archive, and visualize outdoor cameras and images to serve as a resource for a wide variety of scientific applications. The AMOS image dataset, crowdsourcing, and eventually machine learning can be used to develop valid, reliable, real-time and non-labor-intensive tools that improve physical activity assessment by capturing global physical activity patterns and urban built environment characteristics through online outdoor webcams.

A third paper (Park 2016) describes research conducted under the Citygram project umbrella and illustrates how a cost-effective prototype sensor network, remote sensing hardware and software, database interaction APIs, soundscape analysis software, and visualization formats can help characterize and address urban noise pollution in New York City. This work embraces the idea of time-variant, poly-sensory cartography, and reports on how scalable infrastructural technologies can capture urban soundscapes to create dynamic soundmaps.

2.8 Social Equity and Data Democracy

Last, but not least, Nguyen and Boundy (2016) discuss issues surrounding Big Data and social equity by focusing on three dimensions: data democratization, digital access and literacy, and the promotion of equitable outcomes. The authors examine how Big Data has changed local government decision-making, and how it is being used to address social equity in New York, Chicago, Boston, Philadelphia, and Louisville. Big Data is changing decision-making by supplying more data sources, integrating cross-agency data, and enabling predictive rather than reactive analytics. Still, no study has examined the cost-effectiveness of these programs to determine their return on investment. Moreover, local governments have largely focused on tame problems and gains in efficiency. The technologies remain largely accessible to groups that are already advantaged, and may exacerbate social inequalities and inhibit democratic processes. Carr and Lassiter (2016) question the effectiveness of civic apps as an interface between urban data and urban residents, and ask who is represented by, and who participates in, the solutions that such apps offer. They find that the transparency, collaboration and innovation that hackathons aim to achieve are not yet fully realized, and suggest that a first step towards improving the outcomes of civic hackathons is to subject these processes to the same kinds of scrutiny as any other urban practice.

3 Conclusions

The urban data landscape is changing rapidly. There has been a tremendous amount of interest in the use of emerging forms of data to address complex urban problems. It is therefore an opportune time for an interdisciplinary research community to have a discussion on the range of issues relating to the objectives of Urban Informatics, the research approaches used, the research applications that are emerging, and finally, the many challenges involved in using Big Data for Urban Informatics.

We hope this volume familiarizes the reader with both the potential and the technological and methodological challenges of Big Data, with the complexities and institutional factors involved, and with the educational needs for adopting these emerging data sources into practice and adapting to the new world of urban Big Data. We have also sought to include papers that highlight the challenges that need to be addressed if the promise of Big Data is to be fulfilled. The challenges of representativeness and equity, both in the production of such data and in applications that use Big Data, are also areas needing continued attention. We have aimed to make the volume comprehensive, but we also recognize that a single volume cannot completely cover the broad range of applications using Big Data in urban contexts. We hope this collection proves an important starting point.