
Introduction

Machines are good at handling huge amounts of data, but they lack the flexibility and sensitivity of human perception when making decisions or observations. To understand human perception, we look toward what defines being human. To sense, observe, and make sense of the world around us, we combine our biological receptors (eyes, ears, etc.) with our cognitive faculties (memory, emotion, etc.). But the memory banks that we draw upon to make comparative judgments differ from individual to individual. Thus, we each see things in slightly different ways; what is beautiful to one person may not be to another. However, trends do emerge across our collective human consciousness, and efforts to tap a consensus of human perception, i.e. crowdsourcing, depend upon these trends to scale up analytical tasks through massively parallel networks of eyes and minds. This concept of crowd-based computing has become an important approach to the inevitable “data avalanches” we face.

Fig. 1

Ultra-high resolution imagery of Mongolia displayed on the HiperSpace visualization facility at UC San Diego

The Modern Age of Human Information Processing: More than one quarter of the world’s population has access to the Internet (Internet World Stats 2009), and these individuals now enjoy unprecedented access to data. For example, there are over one trillion unique URLs indexed by Google (Google Blog 2008), three billion photographs on Flickr, over six billion videos viewed every month on YouTube (comScore 2009), and one billion users of Facebook, the most popular social networking site. This explosion in digital data and connectivity presents a new source of massive-scale human information processing capital. User generated content fills blogs, classifieds (www.craigslist.org), and encyclopedias (www.wikipedia.org). Human users moderate the most popular news (www.reddit.com), technology (www.slashdot.org), and dating (www.plentyoffish.com) sites. The power of the Internet is the power of the people who compose it, and through it we are finding new ways to organize and connect networks of people to create increasingly powerful analytical engines.

Breaking up the Problem: To combine the large-scale strength of online data collection with the precision and reliability of human annotation, we take a creative approach that brings the data collection process close to humans, in a scalable way that can motivate the generation of high quality data. Human computation has emerged to leverage the vast human connectivity offered by the Internet to solve problems that are too large for individuals or too challenging for automatic methods. Human computation harnesses this online resource and motivates participants to contribute to a solution by creating enjoyable experiences, appealing to scientific altruism, or offering incentives such as payment or recognition. These systems have been applied to tackle problems such as image annotation (von Ahn and Dabbish 2004), galaxy classification (www.galaxyzoo.org), protein folding (Cooper et al. 2010), and text transcription (von Ahn et al. 2008). They have demonstrated that reliable analytics can be produced at large scale through incremental contributions from parallel frameworks of human participation.

One approach to human computation motivates participants by creating enjoyable, compelling, engaging games to produce reliable annotations of multimedia data. Markus Krause’s chapter (in this book) on gamification provides a brilliant investigation of this specific topic. These “games with a purpose” (von Ahn 2006) have been applied to classify images (von Ahn and Dabbish 2004; von Ahn 2006), text (von Ahn et al. 2006), and music (Mandel and Ellis 2007; Barrington et al. 2012b; Law and von Ahn 2009). In general, these games reward players when they agree on labels for the data and, in turn, collect information that the consensus deems reliable. The goal of these games has been to collect data on such a massive scale that all the available image, text, or music content could be manually annotated by humans. Although simple and approachable online games – “casual games” – have broadened the video gaming demographic (International Game Developers Association 2006), designing a human computation game that meets these data collection goals while being attractive enough to draw players in massive volumes remains a challenge.

In this chapter we describe several efforts to produce game-like frameworks that take on needle-in-a-haystack problems, often when the needle is undefined. Specifically, we explore innovative networks of human computation to take on the ever expanding data challenges of satellite imagery analytics in search and discovery. We describe frameworks designed to facilitate peer-directed training, security through the partitioning and randomization of data, and statistical validation through parallel consensus. In each case it is clear that careful architecture of information piping is a determinant of the success of parallel human computation. We begin with an overview of our initial efforts in satellite remote sensing for archaeology, followed by subsequent experiences in disaster assessment, and search and rescue.

Case Study: Archaeological Remote Sensing

In 2010 we launched “Expedition: Mongolia” as the satellite imagery analytics solution for the Valley of the Khans Project (VOTK), an international collaboration between UC San Diego, the National Geographic Society, and the International Association for Mongol Studies to perform a multidisciplinary, non-invasive search for the tomb of Genghis Khan (Chinggis Khaan). We turned to massively parallel human computation out of frustration with our inability to effectively survey the vast quantity of imagery data through automated or individual means.

Since the invention of photography, aerial images have been utilized in archaeological research to provide greater understanding of the spatial context of ground features and a perspective that accentuates features which are not otherwise apparent (Riley 1987; Bewley 2003; Deuel 1969; Lyons 1977). Buried features can produce small changes in surface conditions such as slight differences in ground level, soil density and water retention, which in turn induce vegetation patterns (cropmarks), create variability in soil color (soilmarks) or even shadows (shadowmarks) that can be seen from above.

The introduction of earth sensing satellites has further contributed to the integration of remote sensing in archaeology (Fowler 1996; Parcak 2009). The ability to detect features on the ground from space depends largely upon the ratio of feature size to data resolution. As sensor technologies have improved, the potential to utilize satellite imagery for landscape surveys has also improved (Wilkinson et al. 2006; Lasaponara and Masini 2006; Blom et al. 2000). In September 2008 the GeoEye-1 ultra-high resolution earth observation satellite was launched by GeoEye Inc., producing the world’s highest resolution commercial earth imaging at the time of launch (Madden 2009). Generating 41 cm panchromatic and 1.65 m multispectral data, this sensor further expanded the potential of satellite based archaeological landscape surveys. However, the massive amount of data that is collected each day by these sensors has far exceeded the capacity of traditional analytical processes. Thus, we turn to the crowds to scale human computation towards a new age of exploration.
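As a rough illustration of this feature-size-to-resolution ratio (a minimal sketch; the 10 m feature width is a hypothetical value, not a measurement from the survey), the number of pixels a ground feature spans is simply its width divided by the ground sample distance:

```python
# Rough feature-size vs. resolution check (illustrative values only).
def pixels_across(feature_width_m: float, gsd_m: float) -> float:
    """Approximate number of pixels a feature spans at a given ground sample distance."""
    return feature_width_m / gsd_m

# A hypothetical 10 m burial mound in GeoEye-1 data:
print(pixels_across(10.0, 0.41))   # ~24 pixels in 41 cm panchromatic imagery
print(pixels_across(10.0, 1.65))   # ~6 pixels in 1.65 m multispectral imagery
```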

We construct a massively parallel sampling of human perception to seek and survey the undefined. Specifically, we aim to identify anomalies in vast quantities of ultra-high resolution satellite imagery that represent archaeological features on the ground. Because these features are unknown, we are searching for something we cannot predefine. Our internet-based collaborative system is constructed such that individual impact is determined by independent agreement from the “crowd” (the pool of other participants who have observed the same data). Furthermore, the only direction provided to a given participant comes from feedback, in the form of crowd-generated data, shown upon the completion of each input. Thus, a collective perception emerges around the definition of an “anomaly”.

The Framework

Ultra-high resolution satellite imagery covering approximately 6,000 km² of landscape was tiled and presented to the public on a National Geographic website through a platform that enabled detailed labeling of anomalies.

Within the data interface, participants were asked to annotate features within five categories: “roads”, “rivers”, “modern structures”, “ancient structures”, and “other”. For each image tile, participants were allowed to create no more than five separate annotations. This cap was designed to bound the influence that any single individual could have on a given section of imagery (see Fig. 2).

Fig. 2

User interface for online participants to identify anomalies within randomly presented sub-sectioned satellite imagery (Presented on http://exploration.nationalgeographic.com/mongolia)

Image tiles (with georeference metadata removed) were distributed to participants in random order. By providing segmented data in random order, a collection of participants (or a participant with multiple registrations) could not coordinate a directed manipulation of any given location. This was designed both to secure the system against malicious data manipulation and to protect the location of potential sites from archaeological looters.
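A minimal sketch of how such a serving scheme might look (the tile store, category names, and five-annotation cap are taken from the description above, but the class and its interface are illustrative assumptions, not the production VOTK code):

```python
import random
from collections import defaultdict

MAX_ANNOTATIONS_PER_TILE = 5  # per-participant cap described above

class TileServer:
    """Serves image tiles in random order, with georeference metadata stripped."""

    def __init__(self, tiles):
        # tiles: mapping of tile_id -> {"pixels": ..., "geo": ...}
        self.tiles = tiles
        self.annotations = defaultdict(list)  # (user_id, tile_id) -> [tags]

    def next_tile(self, user_id):
        # Random order prevents any participant from targeting a chosen location.
        tile_id = random.choice(list(self.tiles))
        tile = dict(self.tiles[tile_id])
        tile.pop("geo", None)  # never expose coordinates to participants
        return tile_id, tile

    def add_annotation(self, user_id, tile_id, category, xy):
        key = (user_id, tile_id)
        if len(self.annotations[key]) >= MAX_ANNOTATIONS_PER_TILE:
            raise ValueError("annotation limit reached for this tile")
        if category not in {"road", "river", "modern structure", "ancient structure", "other"}:
            raise ValueError("unknown category")
        self.annotations[key].append((category, xy))
```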

At the onset of the analysis, ground truth information did not exist to provide participants with an authoritative source of feedback on the accuracy of their analysis. Thus we depended upon peer feedback, drawn from data previously collected by other random, independent observers of the same image tile, to provide a consensus-based reference that positions one’s input in relation to the “crowd” (see Fig. 3).

Fig. 3

Peer based feedback loop (Presented on http://exploration.nationalgeographic.com/mongolia)

The semi-transparent feedback tags provide a reference against which to gauge one’s input relative to the perceptive consensus of the crowd. This reference information cannot be used to change the input already provided for that particular image tile; rather, it is designed to influence the participant on subsequent image tiles. By basing training on an evolving, peer-generated data set, we allow a form of emergent collective reasoning to determine the classifications, an important design element when searching for something that cannot be predefined.

The emergence of “hotspots” of human agreement also provides a form of validation through agreement among independent observers (in effect, a massively parallel blind test). The mathematical quantification of agreement is the basis for extracting insight from the noisy human data. A detailed investigation of this framework and the role of collective reasoning will be reported in a forthcoming manuscript (Lin et al. 2013).
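As one concrete, simplified way to quantify such agreement (kernel density estimation is used for this purpose later in the project; the coordinates and grid below are hypothetical, not the production pipeline), a density estimate over the pooled tag positions of a category highlights where independent observers converge:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical tag coordinates (x, y) for one category, pooled across participants.
tags = np.array([[102.0, 98.5, 101.2, 340.7, 99.8],    # x
                 [210.3, 207.9, 212.4,  55.1, 209.0]])  # y

kde = gaussian_kde(tags)                 # smooth density over tag locations
grid_x, grid_y = np.mgrid[0:512:64j, 0:512:64j]
density = kde(np.vstack([grid_x.ravel(), grid_y.ravel()])).reshape(grid_x.shape)

# Peaks of `density` mark "hotspots" where independent tags converge.
peak = np.unravel_index(density.argmax(), density.shape)
print("strongest agreement near:", grid_x[peak], grid_y[peak])
```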

Opening the Flood Gates

Since its launch, over 2.3 million annotations were collected from tens of thousands of registered participants. Recruitment was facilitated through public media highlights, e.g. news articles and blogs. These media events produced observable spikes in registration and participation, as seen in Fig. 4. We show this trend to illustrate the importance of external communities in driving participation in crowdsourced initiatives.

Fig. 4

Registration (blue) and image view (red) statistics across the duration of the experiment

Overlaying this huge volume of human inputs on top of satellite imagery creates a complex visualization challenge (Huynh et al. 2013), a subset of which is depicted in Fig. 5. While independently generated human inputs are inherently noisy, clusters of non-random organization do emerge. Categorical filtering highlights road networks, rivers, and archaeological anomalies, respectively.

Fig. 5

Human generated tags overlaid on satellite imagery showing emergent agreement around features. Tag categories “road” and “ancient” are represented in red and yellow, respectively. We have explored methods of clustering to define linear features, such as roads and rivers, through tags (Huynh and Lin 2012)
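One simple, generic way to recover such clusters from noisy tag positions (a sketch only, not the Huynh and Lin (2012) method; the coordinates and parameters are hypothetical) is density-based clustering, which separates agreed-upon features from scattered, unsupported tags:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical (x, y) tag positions for the "road" category.
tags = np.array([[10, 12], [11, 13], [12, 14], [13, 15],   # tags tracing a linear feature
                 [200, 40], [202, 41],                       # a second small cluster
                 [400, 300]])                                # an isolated, unsupported tag

labels = DBSCAN(eps=5.0, min_samples=2).fit_predict(tags)
# Label -1 marks noise; each non-negative label groups tags that agree on one feature.
for cluster_id in set(labels) - {-1}:
    members = tags[labels == cluster_id]
    print(f"cluster {cluster_id}: {len(members)} tags, centroid {members.mean(axis=0)}")
```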

Guided by this global knowledge of public consensus, we launched an expedition to Mongolia to explore and ground-truth the locations of greatest convergence (defined mathematically through kernel density estimations). From three base camp locations along Mongolia’s Onon River Valley we were restricted to a proximity boundary based upon one day’s travel range and limitations associated with extreme inaccessibility. This created an available coverage area of approximately 400 square miles. Within these physical boundaries we created and explored a priority list of the 100 highest crowd-rated locations of archaeological anomalies. The team applied a combination of surface, subsurface geophysical (ground penetrating radar and magnetometry), and aerial (UAV based) surveys to ground truth identified anomalies (Lin et al. 2011). Of those 100 locations, over 50 archaeological anomalies were confirmed, ranging in origin from the Bronze Age to the Mongol period (see example in Fig. 6).

Fig. 6

Rectangular burial mound (identified through our human computation network) from early to late Bronze Age origins (Allard and Erdenebaatar 2005; Jacobson-Tepfer et al. 2010)

Case Study: Christchurch Earthquake Damage Mapping

Born out of the success of “Expedition: Mongolia”, Tomnod Inc. was formed in 2011 to explore broader applications of human computation in remote sensing. While search targets varied, the computation challenge was consistent. The methodology of large scale human collaboration for earth satellite imagery analytics was quickly applied in the aftermath of the magnitude 6.3 earthquake that devastated the city of Christchurch, New Zealand in February 2011.

Once again, a website was developed to solicit the public’s help in analyzing large amounts of high-resolution imagery: in this case 10 cm aerial imagery (Barrington et al. 2012a). Users were asked to compare imagery taken before and after the quake and to delineate building footprints of collapsed or very heavily damaged buildings. The interface was designed to be simple and intuitive to use, building on widespread public familiarity with web-mapping platforms (Google Maps, Google Earth, Bing Maps, etc.), so that more of the user’s time is spent analyzing data rather than learning how to use the interface. Using a simple interface that runs in a web browser, rather than an ‘experts-only’ geographic information system (GIS) platform, opens the initiative to a larger group of untrained analysts drawn from the general Internet public (Fig. 7).

Fig. 7

Tomnod Disaster Mapper Interface in the Christchurch GEOCAN effort

After just a few days, thousands of polygons outlining areas of damage had been contributed by hundreds of users. The results are visualized in Fig. 8 below, where areas of crowd consensus can be clearly identified by densely overlapping polygons. The crowd’s results were validated by comparison to ground-truth field surveys conducted in the days immediately following the earthquake. The field surveys marked buildings with red (condemned), yellow (dangerous) or green (intact) tags, indicating the level of damage. Ninety-four percent of the buildings tagged by the crowd were actually reported as damaged (red or yellow) by the field survey (Foulser-Piggott et al. 2012).
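The validation amounts to a simple precision calculation against the field survey. The following sketch assumes hypothetical crowd tags and survey records keyed by building identifier (not the actual GEOCAN data):

```python
# Hypothetical crowd tags and field-survey results, keyed by building ID.
crowd_tagged = {"b001", "b002", "b003", "b004"}           # buildings outlined by the crowd
field_survey = {"b001": "red", "b002": "yellow",
                "b003": "green", "b004": "red"}            # red/yellow/green placards

damaged = {b for b, tag in field_survey.items() if tag in {"red", "yellow"}}
precision = len(crowd_tagged & damaged) / len(crowd_tagged)
print(f"fraction of crowd-tagged buildings confirmed damaged: {precision:.0%}")  # 75% on this toy data
```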

Fig. 8

Results of the crowd-contributed damage outlines and highlights of two destroyed buildings. Red = completely destroyed, orange = heavy damage, yellow = light damage

Case Study: Peru Mountain Search & Rescue

The previous case studies demonstrated the capability of large networks of distributed human analysts to identify undefined features and apply visual analytics to remote sensing datasets on a massive scale. The final application of crowdsourced remote sensing we discuss highlights the timeliness that can be achieved when hundreds of humans help search through imagery and rapidly identify features of interest. On July 25, 2012, two climbers were reported to be lost in the Peruvian Andes. Missing in a remote, inaccessible region, the fastest way for their friends in the US to help find them was to search through satellite images. DigitalGlobe’s WorldView-2 satellite captured a 50 cm resolution image and, once again, Tomnod launched a crowdsourcing website to facilitate large scale human collaboration. Friends, family and fellow climbers scoured the mountain that the climbers were believed to have been ascending. The crowd tagged features that looked like campsites, people, or footprints and, within hours, every pixel of the entire mountainside had been viewed by multiple people (Fig. 9).

Fig. 9

Comprehensive crowdsourcing maps an entire mountain in just a few hours. The crowd identified possible footsteps (orange), people (green), campsites (blue) and avalanche regions (red)

One of the first features identified within just 15 min of launching the website showed the 3-man rescue team making their way up the glacier in search of the climbers. Over the next 8 h, consensus locations were validated by experienced mountaineers and priority locations were sent to the rescue team on the ground (e.g., footprints in the snow, Fig. 10).

Fig. 10

Fresh foot tracks in the snow outlined through crowdsourced analytics of near real time ultra-high resolution satellite imagery

The search ended the next morning when the climbers’ bodies were discovered where they had fallen, immediately below the footprints identified by the crowd. While this case study has a tragic ending, the story highlights the power of human collaboration networks to search a huge area for subtle clues and, in just a few hours, go from image acquisition to insight. Furthermore, we observe that in times of need humans want to help, and, when channeled through appropriate collaborative pipelines, they can do so through computation.

Next Step: Collaborating with the Machine

While we have shown three examples of scalable human analytics, it would be a challenge for human computation alone to analyze every image on the web, every galaxy in the sky or every cell in the human body. However, human computation systems can produce well-labeled examples in sufficient volume to develop machine learning methods that can tackle such massive problems autonomously (Barrington et al. 2012b; Snow et al. 2008; Novotney and Callison-Burch 2010). By integrating machine intelligence systems with human computation, it is possible both to focus the human effort on areas of the problem that cannot yet be understood by machines and to optimize the machine’s learning by actively querying humans for labels of examples that currently confound the machine.
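A minimal sketch of this “query the humans where the machine is uncertain” idea, framed as uncertainty sampling over an arbitrary classifier (the data, model choice, and query budget are assumptions for illustration, not the authors’ system):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_round(model, X_labeled, y_labeled, X_pool, n_queries=10):
    """Train on current labels, then pick the pool items the model is least sure about."""
    model.fit(X_labeled, y_labeled)
    proba = model.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)          # low max-probability = confounding example
    return np.argsort(uncertainty)[-n_queries:]     # indices to route to human annotators

# Toy usage with random features; real inputs would be image-derived feature vectors.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(20, 5)), rng.integers(0, 2, 20)
X_pool = rng.normal(size=(100, 5))
query_idx = active_learning_round(LogisticRegression(), X_lab, y_lab, X_pool)
print("ask humans to label pool items:", query_idx)
```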

The detection of anomalies within an image is a difficult problem: we know that they may be located in regions of the image, but we don’t know exactly where. We believe the application of multiple instance learning (Babenko et al. 2006; Maron and Lozano-Pérez 1998; Maron and Ratan 1998; Zhang et al. 2002) would be best suited for the problem at hand. Unlike the classical approach to learning, which is based on strict sets of positive and negative examples, multiple instance learning uses the concept of positive and negative bags to address the nature of fuzzy data. Each bag may contain many instances, but while a negative bag contains only negative instances, a positive bag is composed of many instances whose individual labels are undetermined. While there may be negative examples in a positive bag due to noisy human input, the majority of the positive examples will tend to lie in the same feature space, with negative examples spread all over. Multiple instance learning relies on this insight to extrapolate a set of features that describes the positive bag. This is very appropriate for our data since a single image patch may contain many alternative feature vectors that describe it, and yet only some of those feature vectors may be responsible for the observed classification of the patch. A schematic of a proposed workflow for combining human computation and multiple instance learning (a machine based method) is outlined in Fig. 11.

Fig. 11

Three phase approach to combine machine learning with search and discovery human computation: consensus convergence; feature extraction; and multiple instance learning

If we are able to pool human perception to identify and categorize hard-to-define anomalies, we can begin applying this approach. From each of the many instances in a given category bag (e.g. ancient structure) we extract a set of image feature vectors. Since not every instance in the bag truly represents the labeled concept, some of these features will describe random image details, while others may be drawn from an actual ancient structure and will, for example, exhibit a certain rectangular shape. As we iterate through all the instances in multiple bags, the aim is that the features that describe an anomaly will become statistically significant. As the signal from multiple positive instances emerges from the uniformly distributed background noise, we can identify the features that best describe certain classes of anomaly. Thus even with multiple, noisy, weakly-labeled instances from our training set, applying multiple-instance learning will result in a set of features that describe each anomaly and which we can apply to new data to find anomalies therein.
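A highly simplified sketch of this idea (not the authors’ implementation, and not a full multiple instance learner such as diverse density or MI-SVM): for each feature dimension, compare its distribution inside positive bags against negative bags and keep the dimensions that separate the two, letting occasional negative instances inside positive bags wash out statistically:

```python
import numpy as np

def discriminative_features(positive_bags, negative_bags, top_k=3):
    """Rank feature dimensions by how strongly positive-bag instances differ from negatives.

    positive_bags, negative_bags: lists of arrays of shape (n_instances, n_features).
    Instances in positive bags are only weakly labeled; the signal emerges statistically.
    """
    pos = np.vstack(positive_bags)
    neg = np.vstack(negative_bags)
    # Standardized mean difference per feature; noisy instances average out across many bags.
    spread = np.sqrt(pos.var(axis=0) + neg.var(axis=0)) + 1e-9
    score = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / spread
    return np.argsort(score)[::-1][:top_k]

# Toy example: feature 0 carries the "rectangular structure" signal in positive bags.
rng = np.random.default_rng(1)
pos_bags = [rng.normal(0, 1, (8, 4)) + np.array([2.5, 0, 0, 0]) for _ in range(5)]
neg_bags = [rng.normal(0, 1, (8, 4)) for _ in range(5)]
print("most discriminative feature dimensions:", discriminative_features(pos_bags, neg_bags))
```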

Conclusions

The idea of collecting distributed inputs to tap the consensus of the crowd for decision making is as old as the democratic function of voting, but in this digital age, networks of individuals can be formed to perform increasingly complicated computational tasks. Here, we have described how the combined contribution of parallel human micro-inputs can quickly and accurately map landscapes and features through collective satellite imagery analytics.

In “Expedition: Mongolia” we designed a system of peer-based feedback to define archaeological anomalies that had not been previously characterized, leveraging collective human perception to distinguish normal from abnormal. Participants without prior remote sensing training were able to independently agree upon image features based on human intuition, an approach that draws on the flexibility and sensitivity of human perception, which remains beyond the capability of automated systems. This was critical in our search for the “undefined needle in a haystack”.

While this initial effort focused on an archaeological survey, applications of crowdsourced remote sensing exist across domains including search & rescue and disaster assessment. This was demonstrated through the efforts of Tomnod Inc., a group born out of the experiences in Mongolia to tackle the data challenges of the commercial satellite imaging industry through crowdsourced human computation. In the Christchurch disaster mapper effort we observed a remarkable 94 % agreement with ground truth. This result opens new possibilities for human computation and remote sensing in the assessment of, and ultimately recovery from, disaster events. The Peruvian mountain search & rescue operation demonstrated the remarkable speed with which insight could be gained from pooling human effort for large scale data analytics, suggesting that a combination of networked human minds and fast data pipelines could actually save lives.

Each example demonstrates the potential of online communities to mine unbounded volumes of digital data and catalyze discovery through consensus-based analytics. We have shown how human perception can play a powerful role when seeking unexpected answers in noisy unbounded data.

However, while our approach depends upon emergent trends of agreement as the validating principle of actionable information, we observe that this inherently does not capture the value of outliers (independent thinkers). Future work may identify mechanisms to reward “out of the box” thinking, possibly through a more detailed understanding and utilization of the individual human variables that contribute to a distributed human computation engine.

Finally, we observe that the natural next step in the evolution of human centered computation will be the collaboration between human and automated systems. This synergy will likely be required as we face the increasingly overwhelming data avalanches of the digital world.