
1.1 Introduction

The past 5 years have witnessed a profound transformation of how geographic data, information, and, more broadly, knowledge have been produced and disseminated due to the phenomenal growth of a plethora of related technologies loosely known as Web 2.0, cloud computing, and cyberinfrastructure. Although different lexicons have surfaced to describe this new trend by different communities, ranging from crowdsourcing to user-generated content, from Geoweb to the semantic Web, from volunteered geographic information to neogeography, PostGIS, citizen science, and eScience, the general idea coalesces around the use of the Internet to create, share, and analyze geographic information via multiple computing devices/platforms (traditional desktops, iPads, or smart phones).

Ever since the term volunteered geographic information (VGI) officially appeared in the literature (Goodchild 2007), there have been meetings and workshops devoted to the topic, including (to our knowledge) the 2007 NCGIA VGI workshop (Footnote 1), the AutoCarto 2008 workshop (Footnote 2), the USGS 2010 VGI workshop (Footnote 3), the GIScience 2010 VGI workshop (Footnote 4), and the 2011 VGI Pre-Conference at AAG (Footnote 5). Scholarly literature has also grown significantly, as evidenced by special issues devoted exclusively to the theme of VGI in GeoJournal (Elwood 2008a, b), the Journal of Location-Based Services (Rana and Joliveau 2009), and Geomatica (Feick and Roche 2010). Beyond these special issues, research related to VGI has also been reported by an interdisciplinary group of researchers (Bennett 2010; Hall et al. 2010; Newman et al. 2010; Newsam 2010; Ramm and Topf 2010; Warf and Sui 2010; Kessler 2011; Obe and Hsu 2011; Roche et al. 2011).

The goal of this edited volume is to take stock of recent advances in VGI research, with particular emphasis on the role of VGI as crowdsourced data for geographic knowledge production. By doing so, we plan not only to present a state-of-the-art view of VGI as a research area but also to discuss the prospects and directions of VGI research in the near future. More than half of the chapters in this volume were based upon papers originally presented during the pre-AAG conference we organized on “Volunteered Geographic Information (VGI): Research progress and new developments” (Footnote 5) on April 11, 2011, in Seattle, Washington. We also solicited additional contributions to cover topics not adequately addressed at the Seattle conference but crucial for future VGI research.

The rest of this introductory chapter is organized as follows. We first situate the phenomenon of VGI in the broader context of the big-data wave, also known as the exaflood. We then discuss the increasing digital divide and uneven practices of VGI across the world, followed by a synoptic overview of other chapters in this book. We end this chapter by discussing the role of crowdsourcing in geographic knowledge production and the evolving role of GIScience and geography in the era of big data in achieving a better understanding of the world.

1.2 VGI and the Exaflood of Big Data

Until recently, the geospatial community has had a rather narrow definition of what is considered geographic data or information, often heavily influenced by the legacy of traditional cartography. But rapid advances in a plethora of technologies – GPS, smart phones, sensor networks, cloud computing, etc., especially all of the technologies loosely called Web 2.0 – have radically transformed how geographic data are collected, stored, disseminated, analyzed, visualized, and used. This trend is best reflected in Google’s mantra that “Google Maps = Google in Maps” (Ron 2008). The insertion of an “in” between Google and Maps perhaps signifies one of the most fundamental changes in the history of human mapping efforts. Nowadays, users can search through Google Maps not only for traditional spatial/map information but also for almost any kind of digital information (such as Wikipedia entries, Flickr photos, YouTube videos, and Facebook/Twitter postings) as long as it is geotagged. Furthermore, in contrast to the traditional top-down authoritative process of geographic data production by government agencies, citizens have played an increasingly important role in producing geographic data of all kinds through a bottom-up crowdsourcing process. As a result, we now have massive amounts of geocoded data growing on a daily basis from genetic to global levels covering almost everything we can think of on or near the Earth’s surface, at an average rate of 1 exabyte per day (Swanson 2007). For the first time in human history, we now have the capability to keep track of where everything is in real time.

Due to the ubiquity of information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, RFID (radio-frequency identification) readers, wireless sensor networks, and other types of data-gathering devices, 1–5 exabytes (1 exabyte = 10^18 bytes) of data are created daily, and 90% of the data in the world today were created within the past 2 years (MacIve 2010). The amount of data humanity creates is doubling every 2 years; 2010 was the first year in which we reached 1 zettabyte (10^21 bytes) (Footnote 6), and 2011 alone generated approximately 1.8 zettabytes of data. The explosive growth of big data is rapidly transforming all aspects of governments, businesses, education, and science. By 2020, the volume of the world’s data will increase by 50 times from today’s volume (Gantz and Reinsel 2011). We will need 75 times more IT-related infrastructure in general and ten times more servers to handle the new data. Metaphors of data storage have evolved from bank to warehouse, to portal, and now to cloud. Data storage cost has dropped dramatically during the past two decades. Between 2005 and 2011 alone, the cost of storage dropped by five-sixths. Not surprisingly, how to deal with the new reality of big data is at the top of the agenda of government, industry, and multiple disciplines in the academy (IWGDD 2009; CORDIS 2010; Manyika et al. 2011).
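The units and growth rates quoted above are easier to compare once they are put on a common footing. The following is a minimal sketch of that arithmetic only, not a reproduction of any calculation in the cited reports. The function names are hypothetical, and two simplifying assumptions are made: growth is treated as smoothly exponential, and "today" in the 50-fold projection is taken to be 2011, so the horizon to 2020 is 9 years.

```python
import math

# Decimal (SI) byte units as used in the text.
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21


def growth_factor(years, doubling_time=2.0):
    """Fold increase over `years` if volume doubles every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


def implied_doubling_time(factor, years):
    """Doubling time (years) implied by a `factor`-fold increase over `years`."""
    return years * math.log(2.0) / math.log(factor)


def annual_multiplier(remaining_fraction, years):
    """Constant yearly multiplier that leaves `remaining_fraction` after `years`."""
    return remaining_fraction ** (1.0 / years)


# 1 exabyte per day, accumulated over a 365-day year, in zettabytes.
daily_eb_as_yearly_zb = 365 * EXABYTE / ZETTABYTE

# Doubling every 2 years, sustained for a decade (2**5).
decade_factor = growth_factor(10)

# Assumption: the 50-fold projection spans 2011 to 2020 (9 years).
doubling_for_50x = implied_doubling_time(50, 9)

# Storage cost falling by five-sixths over 2005-2011 leaves one-sixth after 6 years.
storage_multiplier = annual_multiplier(1 / 6, 6)

print(f"1 EB/day over a year: {daily_eb_as_yearly_zb:.3f} ZB")
print(f"Growth over 10 years at a 2-year doubling time: {decade_factor:.0f}x")
print(f"Doubling time implied by 50x in 9 years: {doubling_for_50x:.2f} years")
print(f"Implied yearly storage-cost multiplier: {storage_multiplier:.2f}")
```

Under these assumptions, a 2-year doubling time yields a 32-fold increase per decade, so a 50-fold increase by 2020 would imply a somewhat shorter doubling time of roughly 1.6 years. The figures come from different sources and need not agree exactly.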

Although it is a challenging task to estimate the precise volume of geospatial data out there, we can safely say geospatial data is becoming an important part of the big-data torrent. Geospatial information in general and VGI in particular should be understood in the context of big data, and indeed, crowdsourcing, the Internet of things, and big data are rapidly converging in the domain of geospatial technologies (Ball 2011). Of course, due to rapid technological advances, what is considered big versus small is a moving target. In the McKinsey report (Manyika et al. 2011), “personal location data” has been singled out as one of the five primary big-data streams. With approximately 600 billion transactions per day, various mobile devices are creating approximately one petabyte (10^15 bytes) of data per year globally. Personal location data alone is a $100-billion business for service providers and worth $700 billion to end users (Manyika et al. 2011). The other four streams of big data identified by the McKinsey Global Institute – health care, public-sector administration, retail, and manufacturing – also have a significant amount of data either geocoded or geotagged. So geospatial data are not only an important component of big data but are actually, to a large extent, big data themselves. For the geospatial community, big data presents not only bigger opportunities for the business community (Francica 2011; Killpack 2011) but also new challenges for the scientific and scholarly communities to conduct ground-breaking studies related to people (at both individual and collective levels) and environment (from local to global scale) (Elkus 2011; Meek 2011; Hayes 2012).

In fact, the geospatial community was tackling big-data issues even before “big data” became a trend (Miller 2010). From very early on, geospatial technologies were at the forefront of big-data challenges, primarily due to the large volumes of raster (remote-sensing imagery) and vector (detailed property surveys) data that need to be stored and managed. Back in 1997, when Microsoft Research initiated a pilot project to demonstrate database scalability, they used aerial imagery as the primary data (Ball 2011). The TerraServer that Microsoft developed then is still in use today and set the standard and protocol for today’s other remote-sensing image serving sites such as OpenTopography.org (LiDAR data). Furthermore, implementing Al Gore’s (1999) vision of a “digital earth” requires big data. Although the concept of digital earth did not evolve quite as Gore envisioned during the past decade, the growing popularity of Google Earth, Microsoft’s Virtual Earth (now Bing Maps), and NASA’s World Wind is an indication that geospatial and mapping tools are crucial for users to navigate through the big-data torrent.

1.3 VGI in a Shrinking and Divided World

Concomitant with the growth of this ever-expanding digital universe filled with big data, the world (people, made objects and things, and environment) is increasingly being recorded, referenced, and connected by vast digital networks. Geographers, along with scholars in multiple other disciplines, have noted the acceleration of our temporal experience and the reduction of the role of distance for quite some time, as evidenced by the scholarly literature on time-space compression, time-space distanciation, or space-time convergence (Warf 2008). The growing popularity of social media on the global scene has pushed time-space compression to new levels.

To use a more popular term, the world is rapidly becoming smaller as a result of space-time convergence. When social psychologist Stanley Milgram (1967) ran his experiment in the 1960s on how many meaningful steps are needed to connect two strangers on the surface of the Earth, he and his team (Travers and Milgram 1969) concluded that on average it takes six steps to make a meaningful connection between two randomly selected individuals, a result later popularized as six degrees of separation by American playwright John Guare (in his 1990 play “Six Degrees of Separation”) and the game Six Degrees of Kevin Bacon (to connect people with Hollywood stars). In December 2011, Facebook and Yahoo! conducted a new analysis using the massive amount of data harvested from social media, and they concluded that six degrees of separation had been reduced to 4.7 by the end of 2011, largely as a result of people being increasingly connected on-line (Footnote 7).
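The notion of "degrees of separation" can be made concrete as the average shortest-path length between pairs of people in an acquaintance network. The sketch below illustrates that quantity on a toy network using breadth-first search. It is not the procedure used by Milgram or in the 2011 social media analysis, and the network, the names in it, and the function names are all hypothetical.

```python
from collections import deque
from itertools import combinations


def degrees_of_separation(graph, source, target):
    """Fewest acquaintance links between two people (BFS); None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


def average_separation(graph):
    """Mean shortest-path length over all connected pairs of people."""
    pairs = combinations(sorted(graph), 2)
    dists = [degrees_of_separation(graph, a, b) for a, b in pairs]
    reachable = [d for d in dists if d is not None]
    return sum(reachable) / len(reachable)


# Hypothetical toy network: five people connected in a simple chain.
chain = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat", "eve"},
    "eve": {"dan"},
}

# The same network after one new on-line tie links the two ends of the chain.
connected = {person: set(friends) for person, friends in chain.items()}
connected["ann"].add("eve")
connected["eve"].add("ann")

print(average_separation(chain))
print(average_separation(connected))
```

In this toy case a single additional tie lowers the average separation from 2.0 to 1.5, which mirrors, in miniature, the argument that denser on-line connections shrink the social world.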

Almost paradoxically, as some parts of the world are flooded by big data and people are increasingly connected in a shrinking world, we must also be keenly aware that this world remains a deeply divided one – both physically and digitally (Fig. 1.1). While a large majority of people in North America and Europe have access to the Internet (with Internet penetration rates at 78.3% and 58.3%, respectively, by the end of 2011), two-thirds of humanity do not have access to the rapidly expanding digital world; the world average Internet penetration rate is 30.2% with Asia (23.8%) and Africa (11.4%) trailing at the bottom (Footnote 8). The geographical distribution of new digital data stored in 2010 reflects both the digital divide and uneven development levels across the globe, with the developed world or global north (North America and Europe) having 10–70 times more data than the developing world or global south (Africa, Latin America, and Asia) (Manyika et al. 2011). A third of humanity (about two billion people) still lives on under $2 a day (Footnote 9). We should also be mindful that sometimes simply having access to gadgets themselves is not enough. Many iPhone users in the developed world have enjoyed using one of multiple versions of restroom locators (e.g., have2p), but for a country like India, where there are more cell phones than toilets, simply having have2p installed on one’s iPhone would not help much in rural areas due to the severe lack of sanitary infrastructure (Footnote 10).

Fig. 1.1 World Internet penetration rate by geographic region, 2011 (http://www.internetworldstats.com/stats.htm)

In the context of geographic information (and to some extent other types of data as well), the biggest irony remains that Murphy’s law is still at work – information is usually the least available where it is most needed. We have witnessed this paradox unfolding painfully in front of our eyes in the Darfur crisis in western Sudan (2006), the aftermath of the Haiti earthquake (2010), and the BP explosion in the Gulf of Mexico (2010). Undoubtedly, how to deal with big data in a shrinking and divided world will be a major challenge for GIS and geography in the years ahead. The strengths, weaknesses, opportunities, and threats (SWOT) of VGI for improving the spatial data infrastructure (SDI) are quite different in the two global contexts of north and south (Genovese and Roche 2010). Furthermore, as Gilbert and Masucci (2011) show so clearly in their recent work on uneven information and communication geographies, we must move away from the traditional, linear conceptualization of a digital divide, concerned primarily with physical access to computers and the Internet. Instead, we must consider the multiple divides within cyberspace (or digital apartheid) by taking into account the hybrid, scattered, ordered, and individualized nature of cyberspaces (Graham 2011). Indeed, multiple hidden social and political factors are at play in determining what is or is not available on-line (Engler and Hall 2007). Internet censorship (Warf 2011; MacKinnon 2012), power laws (or the so-called 80/20 rule) (Shirkey 2006), homophilous tendencies in human interactions (de Laat 2010; Merrifield 2011), and fears of colonial and imperial dominance (Bryan 2010) are also important factors to consider in explaining the complex patterns of digital divide and uneven practices of VGI at multiple scales on the global scene.

1.4 Overview of Chapters in This Book

This book is organized into 20 chapters. Chapter 1 by the editors situates VGI in the broader context of big data and the growing global digital divide. The substantive chapters (2–19) are grouped in three parts.

The six chapters in Part I focus on VGI, public participation, and citizen science. In Chap. 2, Rob Feick and Stéphane Roche extend our conceptualization of the value of VGI and geographic information (GI) more generally. The authors recognize that the proliferation of VGI has complicated our assessments of GI’s value. They review these complications in their specificity and propose several new metaphors, such as unexpected discoveries, Debord’s “dérives,” or Lego blocks, which might be used as a guide for future valuation of VGI. Chapter 3 by Francis Harvey questions whether “volunteered” completely captures the character of crowdsourced data. The author suggests that crowdsourced data can be disaggregated into categories of “volunteered” and “contributed.” The distinction between contributed geographic information (CGI) and VGI is argued to be important for assessing particular crowdsourced data’s fitness for use and for identifying biases or inaccuracies. In Chap. 4, Barbara Poore and Eric Wolf track the changing discourse on geospatial metadata and – through two case studies – point to ongoing transformations in popular and academic engagement with metadata in the Geoweb. The authors suggest that we are in the midst of, and indeed promote, a shift from traditional unidirectional construction of metadata to more interactive, user-friendly production of metadata. Chapter 5 by Peter Johnson and Renee Sieber contextualizes VGI somewhat differently, focusing on the vicissitudes of its adoption by government for interaction with citizens. Through reflection on their work with government agencies in Québec, the authors identify the different ways that VGI has been incorporated and discuss key obstacles to and constraints on further incorporation. In Chap. 6, Wen Lin examines the politics of citizen participation and processes of subjectification that are now emerging from the encounter of Web 2.0 and public participation GIS (PPGIS).
The author works with reference to a case study in China, with three specific examples of VGI mapping drawn from ethnographic fieldwork. Out of a meeting of Web 2.0 and PPGIS have emerged transformations in Chinese citizenship and new spaces of citizen participation. Finally, as a contribution to the disaggregation of the umbrella term “VGI,” and also as a challenge to the exclusivity of a professionalized science that would marginalize voluntary (nonprofessional) practitioners, Chap. 7 by Muki Haklay discusses the specificity, historical trajectory, social context, power relations within, and promise of citizen science.

The six chapters in Part II concentrate on geographic knowledge production and place inference. With an interest in the opportunities generated by the flood of publicly available VGI, Chap. 8 by Bin Jiang contextualizes computational geography and reviews recent work in the field to demonstrate the promise of research that engages with big data. Jiang’s discussion of topological thinking also clarifies the challenge that contemporary computational geography presents to conventional views of space. In Chap. 9, Marcus Goetz and Alexander Zipf attend to the transformation of VGI from its early basis in two-dimensional geographical information to its contemporary inclusion of three-dimensional (3D) data. Through specific attention to OpenStreetMap, Goetz and Zipf emphasize emerging applications of 3D data for city modeling and building modeling. Chapter 10 by Jim Thatcher engages with “volunteered geographic services” (VGS), the term he uses to describe discrete actions made possible through spatially aware mobile devices like smart phones. Thatcher suggests that VGS pushes beyond the limits of VGI by linking users through time and space and facilitating coordination of actions on the ground. Thatcher reviews possible uses of VGS in crisis response through the example of PSUMobile.org. Chapter 11 by Darren Hardy examines the geography of VGI authorship, focusing specifically on the case of Wikipedia and its geographic articles. Exemplifying the analysis of big data, the author describes a study of 32 million contributions to those articles over 7 years. Contradicting assertions of the Internet’s placelessness, Hardy finds that authorship of Wikipedia articles demonstrates distance decay. In Chap. 12, Benjamin Adams and Grant McKenzie draw together geographical insights on sense of place and techniques of computational representation, specifically latent Dirichlet allocation (LDA).
The authors discuss topic modeling with VGI on travel blogs with an eye to identifying places with topics, calculating similarity between places, and evaluating changes in sense of place using computational methods. Chapter 13 is also concerned with the social relations of scientific practice: Jon Corbett writes of VGI in the context of a collaborative mapping project. Discussing his work with an aboriginal community, Corbett indicates that participatory mapping can cultivate a sense of place but that such collaborative projects demand reflexivity on the part of researchers.

The six chapters in Part III cover emerging applications and new challenges. In Chap. 14, David J. Coleman engages with underlying assumptions about VGI through consideration of conventional digital topographic mapping programs. It is argued that the updating and maintenance of maps cannot rely on VGI alone, even if VGI does indeed represent an important alternative and complementary source for data, which must be given further attention. Chapter 15 by T. Edwin Chow situates VGI in the field of Web demographics. Within this field are a whole host of Web-based systems that acquire, sort, and utilize personal data. At issue for Chow is how the field of Web demographics complicates generalizations about VGI – for instance, about the degree of voluntarism attributable to donors of geographical and personal information or the accuracy of such information. In Chap. 16, Mark H. Palmer and Scott Kraushaar employ actor-network theory (ANT) to describe a storm-tracking network that relies to a great extent on VGI. ANT presents itself as especially useful for their analysis by facilitating consideration of co-constitutive relations between society and technology. Specific to this case, ANT provides adequate flexibility for narration of both the centralized and decentralized processes on which storm reporting relies. Chapter 17 by Michael W. Dobson examines the gathering and compilation of VGI for mapping databases that, in some cases, also rely on traditional map database compilation techniques. The author reviews the promise and pitfalls of compilation systems and considers how the latter – pitfalls – might be overcome. In Chap. 18, Christopher Goranson, Sayone Thihalolipavan, and Nicolás di Tada consider the potential utility and possible pitfalls of recent advances in bringing together VGI and (public) health research. The updatability and time sensitivity of VGI are cited as central to the potential contributions. 
The authors also acknowledge that the use of VGI introduces new challenges – ethical and practical – particularly with regard to privacy. Chapter 19 by Thomas Bartoschek and Carsten Keßler deals with the heretofore largely neglected role of VGI in education and how it does or may transform curricula at a variety of levels, from primary education to graduate studies. The authors discuss how VGI has been introduced into classrooms and – through analysis of survey data – examine the motivations for and impediments to continued use of different VGI platforms.

In the last chapter (Chap. 20), we discuss the prospects for VGI research and its implications for GIScience and geography in the context of the fourth paradigm – data-intensive scientific inquiry.

1.5 Crowdsourcing Geographic Knowledge: From the Death of Distance to the Revenge of Geography

Back in 1995, The Economist magazine made (in retrospect) a premature announcement of the death of distance (Cairncross 1995) – the idea that distance (by implication location and more broadly geography) plays a less important role in the operation of an increasingly globally connected society in the age of rapid advances and innovations in information and communications technologies. However, it took less than 10 years before The Economist published another cover story on the theme of the revenge of geography (The Economist 2003). It turned out that, in an increasingly connected world, the precise role of distance in many societal functions may have changed, but location, and more broadly geography, has assumed a more crucial role in economic and business activities as well as in social and cultural affairs. More than ever, wireline and wireless technologies have bound the virtual and physical worlds closer (Gordon and de Souza e Silva 2011).

To us, the revenge of geography suggests not only the growing importance of location and geocoding or geotagging in the ocean of big data but also a heightened sense of and deeper appreciation for the growing divide and uneven development of an increasingly interconnected world (Hecht and Moxley 2009; Warf 2010). Situating VGI in the context of big data is only the first step toward realizing VGI’s much broader potential and impacts down the road. Furthermore, VGI must also be placed in the context of crowdsourcing geographic knowledge about the world. Gould (1999) anticipated the arrival of a spatial century and further argued that “there is a geographer in most people” (p. 314). Multiple new technological advances during the past two decades have indeed unleashed the potential of a geographer within everybody. The phenomenon of VGI that emerged during the first decade of the twenty-first century is one of the many manifestations of a spatial century. VGI represents an unprecedented shift in the content, characteristics, and modes of geographic information creation, sharing, dissemination, and use. To us, this is the essence of the revenge of geography in the age of Web 2.0.

Big data obviously demands big machines (in terms of both speed and storage) for us to succeed in the number crunching needed to make use of them. But more importantly, big data also demands big ideas to address the world’s big problems effectively. With the support of new cyberinfrastructure, new creative partnerships among government agencies, NGOs, industry and businesses, the academy, and citizens can be formed. It is gratifying to read the stories about Water Hackathon (waterhackathon.org). The World Bank has sponsored over 2,500 projects like this in more than 30,000 locations all over the world, and geospatial technologies have played crucial roles in all of these projects. Most recently, the World Bank has partnered with Google to make Google Map Maker’s global mapping platform available in over 150 countries and 60 different languages, which has enabled citizen cartographers to help those in dire need (Footnote 11). We all should do no less. In the chapters that follow in the book, the reader will experience an interdisciplinary perspective on how we can rely on VGI to engage in a new mode of geographic knowledge production through crowdsourcing for a more efficient, more equitable, and sustainable world – to us that will be the most gratifying result of the revenge of geography.