Introduction

The seeds of a data revolution in the study of interactions between human society and the environment were sown in the 1960s and 1970s as Earth observation (EO) satellites orbiting the planet began to amass large volumes of global land cover data. Researchers began to use these new data to answer new sets of research questions about human-induced changes in the landscape, such as urbanization, deforestation, and crop production. The data were also used to gain insight into the environmental drivers of problems for human societies, such as drought, famine, and disasters. In the early 1990s, as a research community began to coalesce around human–environment interactions, a Human Dimensions of Global Change (HDGC) Committee was established under the National Research Council. At the same time, NASA was working to expand the community of Landsat users to demonstrate the utility of EO data. NASA sponsored a workshop through the HDGC Committee to explore the current state of the connection between remote sensing and the social sciences and to illustrate applications of Landsat data that could advance empirical and theoretical understanding in social science domains. The primary legacy of that workshop was the publication of the classic volume, People and Pixels: Linking Remote Sensing and Social Science (Liverman et al. 1998), which compiled, synthesized, and reflected on the early stages of this data revolution.

Twenty years later, we are several steps into a second data revolution marked by the wide availability of environmental and social data at finer spatial and temporal resolutions, analysis-ready data products, and new analytical and computational tools. Researchers from a wide range of disciplines are leveraging these technical advances to pursue a new generation of research on coupled human–environment interactions, linking people and pixels to address new matters of global concern. In the early stages of People and Pixels’ second generation, we see value in revisiting and updating the original hypotheses, exploring continuing limitations, and considering the implications of such progress for the future. The purpose of this article is to identify and document advances in data, software, coding, and processing that continue to facilitate the connection between people and pixels and to highlight empirical work at the intersection of people and pixels that advances our understanding of human–environment interactions. This article is primarily targeted to demographers, sociologists, economists, and other social scientists who are interested in incorporating environmental data into their work to explore human–environment interactions.

We organize this article into three main sections. In the first section, we highlight changes in the data landscape from 1998 to the contemporary period. We discuss advances in remote sensing, the emergence of large-scale in situ data gathered by mobile devices, and the growing class of derived data products for a range of topics of interest to social scientists. In the second section, we discuss tools and software that are making this profusion of data more readily accessible and applicable. In the third section, we present examples and case studies of social science research areas demonstrating applications of remotely sensed data. Within these examples, we first explore how remotely sensed data are used to measure the context of social phenomena, and then we address how remotely sensed data are being used to measure social phenomena and their effects.

Changes in the data landscape

Advances in remote sensing

Over the course of the past 20 years, space-based remote sensing has evolved dramatically, and innovation continues to accelerate (Mathieu and Aubrecht 2018). Systematically collected, geographically comprehensive, and rigorously calibrated EO data serve as a crucial backbone for analyses of population–environment interactions.

The Landsat imagery that prompted People and Pixels now represents the beginning of a long-term archive of moderate-resolution imagery. Landsat data now encompass more than 40 years of relatively consistent data at 30 to 60 m resolution, enabling investigations of change stretching over multiple decades. Examples of studies taking advantage of this long-term archive include annual tracking of forest cover (Hansen et al. 2013) and urban development (Li et al. 2015; Sexton et al. 2013), assessing land use impacts on water quality (Olmanson et al. 2008; Zheng et al. 2015), tracking the types of land cover that are converted to agricultural use (Gibbs et al. 2010), and characterizing fire events and post-fire succession (Röder et al. 2008). Landsat data are complemented by data from the Advanced Very High Resolution Radiometer (AVHRR) and Moderate Resolution Imaging Spectroradiometer (MODIS) instruments, whose records also span multiple decades. Whereas Landsat revisits a given location every 8 to 16 days, AVHRR and MODIS provide coarser spatial resolution (250 m to 1 km) but revisit periods of a day or less. These higher-frequency data more readily capture rapidly changing phenomena such as wildfires, floods, and other hazards.

As satellite and sensor technology has advanced, much higher spatial resolution imagery has become commonplace. Commercial satellites such as SPOT, QuickBird, IKONOS, and the WorldView sensors have been capturing imagery with 3-m down to 30-cm resolution and 1- to 3-day revisit periods since the turn of the millennium. In the public realm, the European Copernicus program’s Sentinel satellite constellation, launched between 2014 and 2017, provides both radar and optical data at 10-m resolution with approximately 5-day revisit periods. All Copernicus data are free and open under the joint European Union/European Space Agency data policy. The combination of high spatial resolution and frequent revisit times, as well as availability of both passive and active sensor technology, enables monitoring of local- to global-scale trends and dynamics in land cover, supporting the identification of anthropogenic impact on the environment and contextual features that influence humans. For example, such imagery is being used to examine changes in the landscape in conflict zones to assess the direct and indirect consequences of civil wars (see “Measuring social phenomena and their effects” section).

Since the mid-2010s, microsatellite constellations have gained considerable commercial attention. By deploying swarms of small satellites on a regular basis, data providers (e.g., Planet Labs) maintain networks of many dozens of high-resolution (0.5–5 m) sensors that together can provide daily, global coverage. Because microsatellites are less expensive to produce and launch than traditional satellites, many new operators continue to come online, including some with hyperspectral (e.g., Satellogic) and radar (e.g., Iceye) capabilities. The competition may serve to reduce the cost barrier for research applications of imagery from these commercial sources. Belward and Skøien (2015) provide a comprehensive listing of civilian EO satellite launches and operating systems, and Denis et al. (2017) describe new innovations and disruptive technology in EO systems and associated markets.

Innovations on the horizon include space-based video and high-altitude pseudo-satellites (HAPS). Organizations such as Skybox Imaging, UrtheCast, and Earth-i have begun providing full-motion HD video from space (Earth-i provides 2-min color video clips). HAPS are large drones able to stay in the stratosphere (~20 km altitude) over a fixed point on Earth for several months, providing platforms for telecommunication, navigation, and remote sensing. HAPS could therefore provide very high-frequency observations of an area of interest. While the concept of HAPS has been around for decades, operational systems appear to have come within reach only recently (d’Oliveira et al. 2016). Both space-based video and HAPS are generating interest in commercial and operational domains, for applications such as real-time traffic and pedestrian monitoring (Esch et al. 2017). The potential for these technologies to contribute to research on human–environment interactions, however, remains largely unexplored.

With these advances in remote sensing, the ultimate goal of a “digital Earth,” that is, real-time continuous monitoring at very high resolutions, seems almost within reach and may prove beneficial in a range of research domains involving populations and their activities across various spatial and temporal scales (Aubrecht et al. 2015). The very high spatial and temporal resolution of these data, potentially combined with data from in situ sensor networks, may eventually “even [allow] for the monitoring of living species” (Association of European Space Research Establishments 2017, p. 9). Social scientists have not yet fully incorporated very high-resolution data into their work, in part because of remaining barriers: the cost of commercial data, complex data processing, and the difficulty of extracting information useful to social scientists from such data. However, the increasing availability of free or low-cost high-resolution imagery, accessible distributed and high-performance computing, new analysis toolkits, and derived data products may well enable very high-resolution data to drive a second phase in the people and pixels revolution.

Derived/modeled data products

Derived and modeled data products address the challenge of extracting information of use to social scientists from raw EO data products. While remotely sensed data capture physical properties of the Earth, they do not directly capture phenomena of interest to social scientists, such as land cover change, vegetation condition, or the distribution and characteristics of human populations. Derived and modeled products use remotely sensed data, typically in combination with other data sources such as agricultural or population censuses, as inputs to a model, classification process, or other workflow. The outputs are analysis-ready gridded representations of social or biophysical properties of interest. Below we describe key advances in the development of data products notable for their potential to advance knowledge about human–environment interactions. We also discuss the uncertainty inherent in modeled data, which researchers using these data products must keep in mind when drawing conclusions.

Modeled environmental products

Over the past 20 years, a variety of modeled environmental products have emerged. These products use remotely sensed data as a model input, potentially integrated with other data, to provide pixel-based measures of environmental features such as forest cover, surface water, and air quality that are important links in the human–environment interface. General land cover classification products such as Global Land Cover 2000 (Bartholome and Belward 2005) and GLOBCOVER (Arino et al. 2007) form the foundation for much of this work, and these products continue to be refined and produced at higher resolutions (Jun et al. 2014). A more recent generation of products focuses on specific types of land cover or land use, such as forest cover (Hansen et al. 2013; Pengra et al. 2015), agricultural land use (Ramankutty et al. 2008), and urban land use (Esch et al. 2014; Pesaresi et al. 2013). Another class of products draws on additional human activity data to map human alterations or pressures on the landscape (Ellis et al. 2013; Sanderson et al. 2002; Venter et al. 2016). Finally, several recent products consider nonterrestrial conditions such as air quality (Geddes et al. 2017; van Donkelaar et al. 2016, 2018) and surface water dynamics (Pekel et al. 2016).

While a comprehensive review of these data products is beyond the scope of this paper, it is important to recognize that such products provide significant new data streams for linking people and pixels and for studies at the human–environment interface. The fact that many of these products are GIS-ready also promotes their use by a wider community of researchers; this represents an important development since the publication of People and Pixels, when remotely sensed data processing was the province of a select group of experts. Modeling, however, brings the risk that uncertainties in the data may not be fully understood by those seeking “plug and play” data. Users should read dataset documentation in full to understand a product’s fitness for use.

Gridded population data

As noted in the original People and Pixels volume, one contribution of remote sensing to social science is to extend the reach of social science data. Social science data based on censuses and surveys are limited in their spatial and temporal resolution because the spatial representation typically relies on administrative units and data are gathered only periodically, such as the decadal time frame common for censuses. They may also exclude hard-to-reach populations or locations near potential conflict zones where the safety of enumerators might be at risk. Countries where population data are not collected via censuses or registries, or where data are not made available, are especially challenging to incorporate into quantitative social science research. The wall-to-wall coverage, high spatial resolution, and relatively frequent observations of remotely sensed data can be leveraged to fill some of these gaps.

Gridded population data are typically created by starting with national census data linked to administrative boundaries, then distributing the population in each unit across grid cells falling within the unit. Techniques used to spatially distribute the population within administrative units span a spectrum from simple areal weighting methods to more complex techniques that utilize ancillary data to allocate population in more “realistic” patterns (Leyk et al. 2019). The areal weighting approach equally distributes the total count tied to a census unit across all grid cells within the boundary of the unit (Doxsey-Whitfield et al. 2015). Slightly more sophisticated is the pycnophylactic approach, which redistributes census population data smoothly to avoid abrupt discontinuities at boundaries between census units while preserving each unit’s total (Tobler 1979). Dasymetric mapping approaches rely on ancillary data, such as land cover, to disaggregate census counts (Balk et al. 2006; Eicher and Brewer 2001; Leyk et al. 2019). Early efforts drew on data such as satellite-derived urban/rural land cover distributions (Balk et al. 2006); many current techniques use a variety of ancillary data, mainly derived via remote sensing (Sorichetta et al. 2015; Stevens et al. 2015).
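As a minimal illustration of the areal weighting approach, the Python sketch below spreads each census unit’s count evenly across the grid cells assigned to it. It assumes, for simplicity, that every cell falls entirely within a single unit (operational products must also handle cells split by unit boundaries); the function name, unit IDs, and counts are hypothetical.

```python
from collections import Counter


def areal_weighting(unit_totals, cell_units):
    """Spread each census unit's total evenly over the grid cells inside it.

    unit_totals: dict mapping a unit ID to its census population count
    cell_units:  list giving, for each grid cell, the ID of the unit it lies in
    Returns a list of per-cell population estimates in the same order.
    """
    # Number of grid cells falling within each census unit
    cells_per_unit = Counter(cell_units)
    # Every cell receives an equal share of its unit's total
    return [unit_totals[u] / cells_per_unit[u] for u in cell_units]


# Hypothetical example: two census units covering a strip of six cells
totals = {"A": 900, "B": 300}
cells = ["A", "A", "A", "B", "B", "B"]
print(areal_weighting(totals, cells))
# [300.0, 300.0, 300.0, 100.0, 100.0, 100.0]
```

Because each unit’s count is divided among its own cells, the unit totals are preserved exactly, which is the property the more elaborate methods also aim to keep.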

Dasymetric mapping approaches can generally be categorized as binary, weighted or modeled, and hybrid. Binary dasymetric techniques rely on ancillary data to divide pixels into those that are populated and those that are not (Mennis and Hultgren 2006). Typically, settlement data, derived from remotely sensed land cover and/or nighttime lights data, are used to identify grid cells that are likely to be populated. The census-based population counts are distributed evenly among those cells, with no population allocated to cells that are unlikely to be populated. A variety of spatially detailed (10–50-m resolution) settlement layers with global coverage are now available (Esch et al. 2013; Pesaresi et al. 2013). Weighted and modeled techniques define a range of population density classes, often based on multiple sources of ancillary data (e.g., elevation, land cover, nighttime lights). Statistical models are used to assign weights to each population density class, and the census-based population is distributed according to these weights. Hybrid approaches may rely on a statistical weighting layer combined with a binary constraint (Stevens et al. 2015). For example, an analyst could assign population only to built areas and create a weighting layer based on a statistical model to allocate population within those areas (Schroeder 2017).
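The binary and hybrid variants differ only in how cell weights are formed, so both can be sketched with one function. In this hypothetical Python example, a per-cell boolean stands in for a settlement layer, and optional per-cell weights stand in for the output of a statistical model. The fallback to areal weighting for units with no populated cells is our own assumption, added so that no census population is silently dropped; it is not taken from any particular product.

```python
from collections import Counter


def dasymetric(unit_totals, cell_units, populated, weights=None):
    """Allocate census totals to grid cells using ancillary data.

    populated: per-cell booleans from a settlement layer; False means the cell
               is treated as unpopulated (the binary constraint)
    weights:   optional per-cell relative densities, e.g. predicted by a
               statistical model; without them, populated cells share equally
               (binary dasymetric), and with them the result is a hybrid
    """
    if weights is None:
        weights = [1.0] * len(cell_units)
    # Apply the binary mask: unpopulated cells get zero weight
    effective = [w if p else 0.0 for w, p in zip(weights, populated)]
    weight_per_unit = Counter()
    for u, w in zip(cell_units, effective):
        weight_per_unit[u] += w
    cells_per_unit = Counter(cell_units)
    estimates = []
    for u, w in zip(cell_units, effective):
        if weight_per_unit[u] > 0:
            # Share of the unit total proportional to the cell's weight
            estimates.append(unit_totals[u] * w / weight_per_unit[u])
        else:
            # No populated cells in this unit: fall back to areal weighting
            estimates.append(unit_totals[u] / cells_per_unit[u])
    return estimates


# Hypothetical unit of four cells, one of which the settlement layer excludes
totals = {"A": 900}
cells = ["A", "A", "A", "A"]
settled = [True, True, False, True]
print(dasymetric(totals, cells, settled))  # binary: [300.0, 300.0, 0.0, 300.0]
print(dasymetric(totals, cells, settled, weights=[1.0, 2.0, 5.0, 3.0]))
# hybrid: [150.0, 300.0, 0.0, 450.0]
```

Note that in the hybrid case the excluded cell receives nothing even though its model weight is the largest, which is exactly the effect of combining a binary constraint with a weighting layer.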

Several recent developments have returned to binary dasymetric techniques, emphasizing improvements in the creation of settlement data rather than complex statistical approaches. Previously, imagery-based settlement data had limitations, especially in sparsely settled areas in regions where buildings are made of the same materials as the surrounding landscape and are therefore difficult to detect, or where nighttime light emissions are too sparse or of too low intensity to detect. Increased image resolution combined with machine learning techniques has addressed these limitations, enabling the creation of settlement data that identify individual buildings (Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN - Columbia University 2016). Other approaches leverage both optical and radar-based data to create more accurate settlement layers (e.g., the World Settlement Footprint, WSF).

Some gridded population products go beyond population at place of residence to represent temporal dynamics of human mobility, including diurnal, weekly, and seasonal population fluctuations due to commuting patterns and travel and tourism flows. Daily movement patterns have been addressed in population models since the early 2000s (McPherson et al. 2004). The first large-scale, country-wide model was produced for the USA (LandScan USA), containing both a nighttime residential and a baseline daytime population distribution incorporating the movement of workers and students (Bhaduri et al. 2007). Daytime–nighttime maps have proven particularly important in disaster risk applications for addressing exposure dynamics and evacuation needs for rapid-onset hazardous events such as earthquakes, tsunamis, and flash floods (Aubrecht et al. 2014; Freire et al. 2013) or terrorist incidents (Ahola et al. 2007). Moving beyond case-study–specific applications to a full conceptual modeling and processing framework for spatiotemporal population mapping, Martin et al. (2015) developed SurfaceBuilder247, which incorporates both daily and seasonal movements. Most recently, daytime/nighttime grids have been developed under the European Commission’s ENACT project for all 28 countries of the European Union, considering the presence of residents, workers by sector, students, and tourists, along with the locations of residence and activity (Batista e Silva et al. 2018).
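The core bookkeeping behind a daytime distribution can be sketched as moving commuters from their home cells to their activity cells. The Python example below is a deliberately simplified, hypothetical accounting with invented cell names and counts; it is not the method used by LandScan USA, SurfaceBuilder247, or ENACT, and it ignores tourists, seasonal movements, and variation across hours of the day.

```python
def day_night_population(residents, commutes):
    """Derive simple nighttime and daytime populations per cell.

    residents: dict mapping cell ID -> residential (nighttime) population
    commutes:  list of (home_cell, activity_cell, count) tuples for workers
               and students who spend the day away from home
    Returns (nighttime, daytime) dicts; the total population is unchanged.
    """
    nighttime = dict(residents)
    daytime = dict(residents)
    for home, activity, count in commutes:
        daytime[home] = daytime.get(home, 0) - count          # commuters leave
        daytime[activity] = daytime.get(activity, 0) + count  # and arrive here
    return nighttime, daytime


# Hypothetical example: a residential suburb and a business district
night, day = day_night_population(
    {"suburb": 1000, "center": 200},
    [("suburb", "center", 600)],
)
print(night)  # {'suburb': 1000, 'center': 200}
print(day)    # {'suburb': 400, 'center': 800}
```

The contrast between the two dictionaries is what matters for exposure analysis: an event striking the business district at midday would affect four times as many people as the residential count alone suggests.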

Given the variety of gridded population products available, it is increasingly important for users to carefully consider which product best suits their requirements, which in turn requires extensive source and metadata information (Footnote 1). Metadata should help researchers assess the quality (e.g., recency and granularity) of the census data and the attribute accuracy of other ancillary data. The metadata should also provide transparent information on the locational accuracy of ancillary data sources to help researchers determine whether data products are commensurate with their scale of interest. Another significant challenge for producers of gridded population data is establishing an appropriate means of validating final model outputs. Because these products are designed to fill gaps between coarse census data and detailed remotely sensed data, reliable, accurate reference data against which to assess accuracy and model fit are seldom available.
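Where finer-level reference counts do exist for some area, one possible validation strategy is to build a product from coarser census units and then compare its output, summed to the finer units, against the held-out counts. The Python sketch below shows only that comparison step; the cell estimates, zone IDs, and reference counts are hypothetical, and the choice of mean absolute error and root-mean-square error is ours.

```python
import math


def aggregate_to_zones(cell_estimates, cell_zones):
    """Sum per-cell population estimates within each reference zone."""
    totals = {}
    for value, zone in zip(cell_estimates, cell_zones):
        totals[zone] = totals.get(zone, 0.0) + value
    return totals


def error_metrics(predicted, reference):
    """Mean absolute error and root-mean-square error over reference zones."""
    diffs = [predicted.get(zone, 0.0) - count for zone, count in reference.items()]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Hypothetical grid of four cells and held-out counts for two finer zones
predicted = aggregate_to_zones([100, 150, 50, 200], ["z1", "z1", "z2", "z2"])
reference = {"z1": 300, "z2": 200}
print(error_metrics(predicted, reference))  # (50.0, 50.0)
```

Such a test only checks the product at the scale of the reference zones; errors in how population is arranged within each zone remain invisible to it.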

Poverty mapping

Gridded population data provide spatial distributions of the human population as a whole, but generally do not provide information on population characteristics beyond age and sex distributions (e.g., Center for International Earth Science Information Network 2017). Recent efforts have begun exploring the use of remotely sensed data to map population characteristics or particular segments of the population (Bosco et al. 2017; Doxsey-Whitfield et al. 2015). Poverty mapping, in particular, is of interest because poverty data from national censuses and household surveys are often of poor quality, lack spatial detail, or are altogether absent. Between 2002 and 2011, as many as 57 developing countries failed to conduct more than a single survey capable of producing poverty statistics, and data are the most scarce in the poorest countries (Serajuddin et al. 2015).

The global coverage and high temporal frequency of remotely sensed data make them attractive as a potential supplement to fill the gaps in spatial poverty data. While remotely sensed data cannot measure poverty directly, they can provide indicators of variations in the level of development over space. Nighttime lights (NTL) imagery has been used to describe variations in poverty at national and continental scales (Ghosh et al. 2013; Henderson et al. 2012; Pinkovskiy and Sala-i-Martin 2016). However, the coarse spatial resolution of NTL data (750 m for VIIRS DNB; 1 km for DMSP-OLS data) limits their ability to assess variability at local spatial scales.
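To make the proxy idea concrete, the Python sketch below fits an ordinary least-squares line relating survey-based poverty rates to log-transformed nighttime radiance in surveyed districts, then predicts a rate for an unsurveyed district. The radiance and poverty values are invented for illustration, and the log-linear form is an assumption; the studies cited above use more elaborate models and validation.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical surveyed districts: mean NTL radiance and survey poverty rate
radiance = [0.5, 2.0, 8.0, 30.0]
poverty_rate = [0.62, 0.48, 0.35, 0.20]
# Log-transform radiance (log1p keeps unlit areas, radiance 0, defined)
x = [math.log1p(r) for r in radiance]
intercept, slope = fit_line(x, poverty_rate)

# Predict the poverty rate for an unsurveyed district with radiance 4.0
predicted = intercept + slope * math.log1p(4.0)
print(round(slope, 3), round(predicted, 3))
```

A negative slope encodes the assumption that brighter areas tend to be less poor; at the coarse resolutions noted above, however, one such prediction averages over a great deal of local variation.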

Researchers are now exploring the use of high-resolution imagery (spatial resolution finer than 5 m) to map poverty, both within urban areas, by distinguishing slum from nonslum areas, and for entire countries. This research is based on the assumption that spatial patterns of buildings and roads discernible from satellite imagery can characterize slums and poverty levels. A number of approaches to mapping slums have been tested, including simple visual interpretation, object-oriented approaches, and machine learning based on spectral and spatial features, among others (Engstrom et al. 2015; Kohli et al. 2016; Kuffer et al. 2016; Taubenböck and Kraff 2014). Jean et al. (2016) used high spatial resolution data to map poverty across multiple sub-Saharan African countries at the village level using a convolutional neural network approach. Methods currently under development incorporate spatial and spectral features from the field of computer vision to detect areas with potential indicators of poverty (Engstrom et al. 2016; Graesser et al. 2012; Sandborn and Engstrom 2016). Rather than mapping objects (e.g., buildings, roads, cars), the spatial features approach characterizes variability in the image related to spatial pattern, structure, orientation, texture, and irregularity (Herold et al. 2003) that can be related to poverty (Engstrom et al. 2017a, 2017b).
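The spatial features idea can be illustrated with one of the simplest texture measures, the standard deviation of pixel values in a moving window. The Python sketch below is a stand-in chosen for brevity, not one of the features used in the cited studies, which rely on richer measures. The image patches are hypothetical grayscale values.

```python
import statistics


def local_std(image, radius=1):
    """Standard deviation of pixel values in a square moving window.

    image: 2-D list of grayscale values; windows are truncated at the edges.
    Higher values indicate more local variability (a rough texture measure).
    """
    rows, cols = len(image), len(image[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            result[r][c] = statistics.pstdev(window)
    return result


# Hypothetical 4 x 4 patches: a uniform surface and a dense, irregular pattern
uniform = [[100] * 4 for _ in range(4)]
checker = [[0 if (i + j) % 2 else 255 for j in range(4)] for i in range(4)]
print(local_std(uniform)[1][1], round(local_std(checker)[1][1], 1))
```

The output is a texture layer rather than a map of objects: no building or road is identified, yet densely packed, irregular settlement patterns stand out from homogeneous surfaces.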

Caveats for modeled data

Modeled and derived data products are attractive because they provide ready access to variables of interest, such as land cover, precipitation, and human population distribution, with complete spatial coverage. However, because these data products are not based on direct measurements of these variables at every point in space, they are inherently uncertain. Uncertainties are introduced when combining multiple sources of data, interpolating between locations or times, modeling relationships, applying machine learning, and any other approaches that essentially make educated guesses about variables of interest based on observable variables.

Uncertainty is present in all types of modeled data products, and different products that nominally describe the same variables inevitably differ from each other. For example, in classified land cover maps, the classification of a pixel at a given location may not be consistent from one product to another, particularly in transitional zones or for mixed-type classes (Congalton et al. 2014). Among the many gridded population products, the population density estimated for a given location may vary as a result of the particular input data, ancillary variables considered, and the estimation methods applied (Leyk et al. 2019).

Due to these uncertainties, users of modeled data products are well-advised to treat them with an appropriate degree of caution. Users should be sure to review documentation and published discussion of uncertainty for any modeled data products and consider the implications of such uncertainty for their analysis. When multiple products representing the same variable are available, users should carefully consider which product is best fit for their use case (Leyk et al. 2019). Ideally, research incorporating modeled data products should include some form of uncertainty or sensitivity analysis to evaluate the validity of conclusions in light of data uncertainties. At a minimum, researchers should be careful not to consider modeled data products as absolute truths.
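One simple form of sensitivity analysis is to repeat the same calculation with every available product and report the spread of results. The Python sketch below estimates the population inside a hazard zone using two population grids; the product names, cell values, and flood mask are all hypothetical.

```python
def exposed_population(population_grid, hazard_mask):
    """Total population in cells flagged as exposed to a hazard."""
    return sum(p for p, exposed in zip(population_grid, hazard_mask) if exposed)


def sensitivity_across_products(products, hazard_mask):
    """Repeat the same calculation with each product and report the spread."""
    results = {name: exposed_population(grid, hazard_mask)
               for name, grid in products.items()}
    return results, min(results.values()), max(results.values())


# Two hypothetical population products for the same four cells
products = {
    "product_A": [100, 50, 0, 20],
    "product_B": [80, 40, 20, 10],
}
flood_zone = [True, True, False, False]
results, low, high = sensitivity_across_products(products, flood_zone)
print(results, low, high)  # {'product_A': 150, 'product_B': 120} 120 150
```

Reporting the range (here 120 to 150) rather than a single figure makes the dependence of the conclusion on the choice of product explicit.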

In situ data from mobile devices

While advances in satellite and airborne sensors are increasing the spatial and temporal resolution of remotely sensed data, another data revolution is underway on the ground. The introduction and rise to near-ubiquity of mobile devices over the past 20 years has generated entirely new streams of in situ data. In 2000, there were fewer than 700 million mobile phone subscriptions, and now there are more cell phones in the world than people. The World Bank (2016) has estimated that by 2020 there will be over 50 billion Internet-connected devices with the most significant growth and impact in the developing world. According to a survey by the Pew Research Center in the USA, while 65% of respondents had cell phones in 2004, this figure increased to 92% by 2015 (Anderson 2015). While mobile devices are most prevalent in the developed world, the developing world is rapidly adopting mobile technologies, and there is global recognition of the opportunity afforded by data produced by these devices.

Mobile devices can record the time and their location, either via GPS sensors or by triangulation from the cell towers or Wi-Fi networks with which they communicate. Observations based on mobile device activity are therefore easily tagged with timestamps and spatial coordinates. Data may be gathered from mobile devices either intentionally by their users, for example through organized citizen science projects, or as a by-product of other mobile device usage, such as social media posts and cell phone calls. Such data have proven useful for assessing post-disaster situations (Cervone et al. 2016; Horanont et al. 2013; Lu et al. 2016a) and are increasingly being used to understand population flows such as daily commuting, travel, and mobility patterns (Mocanu et al. 2013; Wood et al. 2011).
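Because each observation carries a timestamp and coordinates, a common first step is to aggregate observations into space-time bins. The Python sketch below counts records per grid cell and hour of day, assuming a simple latitude/longitude grid with a hypothetical cell size of 0.1 degrees; the records are invented.

```python
import math
from collections import Counter
from datetime import datetime


def grid_hour_counts(records, cell_size_deg=0.1):
    """Count geotagged observations per (grid row, grid column, hour of day).

    records: iterable of (latitude, longitude, ISO 8601 timestamp) tuples
    """
    counts = Counter()
    for lat, lon, timestamp in records:
        row = math.floor(lat / cell_size_deg)
        col = math.floor(lon / cell_size_deg)
        hour = datetime.fromisoformat(timestamp).hour
        counts[(row, col, hour)] += 1
    return counts


# Hypothetical observations: two morning records in one cell, one evening record
records = [
    (44.97, -93.26, "2019-06-01T08:15:00"),
    (44.98, -93.27, "2019-06-01T08:45:00"),
    (44.97, -93.26, "2019-06-01T17:05:00"),
]
for key, n in sorted(grid_hour_counts(records).items()):
    print(key, n)
```

Degree-based cells shrink in area toward the poles, so an analysis spanning a wide latitude range might prefer a projected, equal-area grid; the binning logic stays the same.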

Citizen science projects for environmental monitoring make use of “citizens as sensors” to intentionally collect data through crowdsourced observations (Hultquist and Cervone 2018). The data are stored and quickly disseminated using social networks or dedicated services and made available to study specific phenomena. Such data can be particularly valuable during and after natural disasters, when official sensors are not available or are unable to provide detail commensurate with rapid changes over time and space (Tidball and Krasny 2012). Ongoing work is developing techniques to incorporate these data into spatial analysis frameworks and integrate them with other types of spatial data (de Albuquerque et al. 2015; Schade et al. 2013). For example, images of flooded areas collected by citizens can be combined with remotely sensed data to determine accurate and detailed inundation boundaries (Panteras and Cervone 2018; Sava et al. 2017).

Data derived from mobile device usage also have the potential to increase the spatial and temporal resolution of data on human activities. Established sources of population data such as censuses and surveys have limited spatial and temporal resolution, typically describing resident populations in terms of administrative units and, at best, annual time steps. In contrast, data gleaned from traces of mobile device activity, such as locations of cell phone calls, text messages, or social media posts, can record specific movements and engagements with precise geographic coordinates and timestamps. These data therefore have the potential to extend population dynamics mapping toward near real-time representation of human mobility (Aubrecht et al. 2018; Deville et al. 2014). Unlike citizen science observations and other explicitly volunteered geographic information, mobile device users have not intentionally created these data points for the primary purpose of generating observations on users’ location through time. Such data are nonetheless captured by service providers and can be made available to researchers (Lu et al. 2016a). Some social media data sources, such as Twitter and Foursquare, include publicly available text or image content, which is also tagged with time and location (Patel et al. 2017; van Zanten et al. 2016).

However, using these incidental data for research carries challenges. While very widespread, access to and use of mobile devices are not universal. Connectivity can be limited by environmental and situational conditions (Lu et al. 2016b), and social media usage may be concentrated among particular population subgroups (Cesare et al. 2018). In addition, because data derived from mobile devices are incidental to other activities and not originally designed for research use, the data are unlikely to follow consistent sampling patterns, which presents challenges for removing noise and appropriately analyzing the data (Aubrecht et al. 2017). Another challenge for researchers with limited budgets is that access to mobile device data often requires proprietary data licensing agreements. Finally, ethical and legal standards on the use of data from mobile devices are still evolving. Even if the data are collected with users’ explicit consent, researchers should evaluate ethical implications and ensure that individuals are not identified (Mikal et al. 2016).
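One basic disclosure-control step when publishing aggregated mobile device data is to withhold cells observed by too few distinct users. The Python sketch below applies a minimum-user threshold `k`; the threshold value and the records are hypothetical. Suppression alone does not guarantee anonymity, since movement trajectories can re-identify individuals, so it complements rather than replaces an ethical review.

```python
from collections import defaultdict


def suppress_small_cells(records, k=10):
    """Count distinct users per cell, dropping cells with fewer than k users.

    records: iterable of (user_id, cell_id) pairs
    """
    users_per_cell = defaultdict(set)
    for user_id, cell_id in records:
        users_per_cell[cell_id].add(user_id)
    return {cell: len(users) for cell, users in users_per_cell.items()
            if len(users) >= k}


# Hypothetical records; with k = 3, the cell seen by only one user is withheld
records = [("u1", "c1"), ("u2", "c1"), ("u3", "c1"), ("u1", "c1"), ("u4", "c2")]
print(suppress_small_cells(records, k=3))  # {'c1': 3}
```

Counting distinct users rather than records matters here: a single active user posting many times from one cell would otherwise pass the threshold.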

New tools to facilitate data integration

Another important evolution in the data landscape has been the development of tools to facilitate the integration of data across people and pixels. These new tools include both custom-built platforms designed specifically for integrating these types of data and general purpose analytical software and programming languages that are becoming both more powerful and easier to use. These tools reduce the effort required to assemble and process large swaths of remotely sensed data, and facilitate derivation of information from remotely sensed data in formats and data structures more familiar to social scientists. In this section, we briefly describe three data integration platforms that provide data and functionality that are particularly useful for cross-disciplinary work combining people and pixels: IPUMS Terra, which was developed out of the social science community to facilitate the integration of physical variables derived from remotely sensed data with census-based population data; Giovanni and AppEEARS, both of which were developed primarily for the Earth science research communities but which offer advantages for social scientists; and Google Earth Engine, which provides a range of datasets and a powerful processing platform.

The Appendix provides more details about each of these platforms (sections “IPUMS Terra,” “Giovanni and AppEEARS,” and “Google Earth Engine”), as well as related information that is otherwise beyond the scope of this paper. The “Software for handling spatial data” section of the Appendix briefly surveys the state of spatial analysis software and developments in support for spatial data in more general-purpose data analysis tools. The “Spatial analysis methods” section in the Appendix provides key references addressing statistical and mathematical spatial analysis methods.

IPUMS Terra is a product of the IPUMS Data Center, which focuses primarily on population data from censuses and surveys. Building on IPUMS’ large body of population data, IPUMS Terra is designed to integrate population and environmental data. IPUMS Terra utilizes location-based integration to transform among three data structures: individual-level population microdata, area-level data describing places defined by geographic boundaries, and pixel-based raster data (Kugler et al. 2015; Ruggles et al. 2015). IPUMS Terra currently includes population microdata and associated area-level data from IPUMS International (Minnesota Population Center 2018); basic area-level population data from countries not participating in IPUMS International; and environmental data on climate (Harris et al. 2014; Hijmans et al. 2005), land cover (European Commission, Joint Research Centre 2003; Friedl et al. 2010), and agricultural land use (Monfreda et al. 2008; Ramankutty et al. 2008).
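The two most common transformations in location-based integration can be sketched in a few lines: summarizing raster cells within each area (raster to area-level data) and attaching the area-level value to each person record (area-level to microdata). The Python example below is a hypothetical illustration of the general idea, not IPUMS Terra’s implementation; the function names, field names, and values are invented, and it assumes each raster cell has already been assigned to an area.

```python
def zonal_mean(raster_values, cell_zones):
    """Average raster cell values within each area (raster -> area-level)."""
    sums, counts = {}, {}
    for value, zone in zip(raster_values, cell_zones):
        if value is None:  # skip nodata cells
            continue
        sums[zone] = sums.get(zone, 0.0) + value
        counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def attach_area_values(persons, area_values, area_key, new_field):
    """Copy an area-level value onto each person record (area -> microdata)."""
    return [{**person, new_field: area_values.get(person[area_key])}
            for person in persons]


# Hypothetical raster of mean temperature (deg C) overlaid on two districts
temps = zonal_mean([20.0, 22.0, None, 30.0], ["d1", "d1", "d2", "d2"])
people = [{"person_id": 1, "district": "d1"}, {"person_id": 2, "district": "d2"}]
print(attach_area_values(people, temps, "district", "mean_temp_c"))
```

Once attached, the environmental variable can enter a standard individual-level analysis alongside the census characteristics of each person.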

Giovanni and AppEEARS are tools developed by NASA to facilitate remote sensing data discovery, extraction, processing, and visualization (Adamo and de Sherbinin 2018). Giovanni’s focus is largely on remotely sensed atmospheric data of relatively low spatial resolution and very high temporal resolution (including hourly datasets), concentrated primarily in the areas of atmospheric composition, atmospheric dynamics, global precipitation, hydrology, and solar irradiance. Giovanni enables online mapping and visualization, as well as data downloads. AppEEARS’ focus is largely on remotely sensed terrestrial data of moderate spatial and temporal resolution (30 m to 1 km; daily time steps or monthly composites). Datasets available in AppEEARS include Terra and Aqua MODIS, the Shuttle Radar Topography Mission (SRTM v3), Web Enabled Landsat Data (WELD), Gridded Population of the World (GPW), and NASA data products derived from the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument. AppEEARS allows users to subset large geospatial datasets based on user-defined polygons or points.
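Subsetting by polygons or points can be pictured with a small sketch, assuming a north-up grid with a known top-left origin and square cells. The hypothetical functions below keep the pixels whose centers fall inside a polygon (using the even-odd ray-casting test) and look up the pixel containing a single point. They illustrate the idea behind area and point requests only; they are not the AppEEARS interface, and the grid, origin, and cell size are made up.

```python
import math


def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_grid(grid, origin_x, origin_y, cell_size, polygon):
    """Area request: keep (row, col, value) for pixels whose centers fall
    inside the polygon. Rows increase downward from the top-left origin."""
    kept = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx = origin_x + (c + 0.5) * cell_size
            cy = origin_y - (r + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                kept.append((r, c, value))
    return kept


def sample_point(grid, origin_x, origin_y, cell_size, x, y):
    """Point request: return the value of the pixel containing (x, y)."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point falls outside the grid")
    return grid[row][col]


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 x 3 cells, top-left at (0, 3)
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(subset_grid(grid, 0.0, 3.0, 1.0, square))
print(sample_point(grid, 0.0, 3.0, 1.0, 1.2, 0.3))
```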

In addition to tools developed by the research community, Google is leveraging its extensive hardware platform and computing resources to manage and process increasingly large volumes of remotely sensed data. Google Earth Engine (GEE) is a platform, free for noncommercial use, where users can easily visualize and download a relatively wide range of spatial data (Gorelick et al. 2017). GEE hosts satellite and other geospatial data from a range of sources, including Landsat, Sentinel, some MODIS data, Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), and many other datasets. GEE also supports simple geospatial analysis through its own Explorer product and more complex analyses through an application programming interface (API).

People and pixels in practice

People and Pixels outlined several ways in which remote sensing could support social science research (Rindfuss and Stern 1998). Of the modes they discussed, we have so far described advances in one: using remotely sensed data to provide additional measures for social science, both by extending the spatial and temporal detail of population data and by measuring features and phenomena such as poverty, forest cover change, and climate variables. We now describe examples of social science research that leverage remote sensing’s capabilities to measure social phenomena, their contexts, and their effects, particularly using time-series data. These examples demonstrate how researchers use remotely sensed data to refine understanding of how people interact with the environment and to extend perspectives from individual locations and events to broader-scale patterns.

Measuring the environmental context of social phenomena

Many social science disciplines have foundational theories that consider how individuals or groups respond to changes in context over space and time. The increasing availability and accessibility of remotely sensed data have enabled researchers to more readily incorporate measures of agricultural production, surface water conditions, urbanization, rates of deforestation, and other factors to test these theories. Remotely sensed data have proven particularly valuable in the poorest countries, where land use and agricultural censuses and climate monitoring systems are otherwise lacking. The ability to measure changes in context over time with remotely sensed data allows researchers to account for those changes and thereby isolate and investigate the social phenomena of interest.

Researchers have taken advantage of detailed contextual data from remote sensing in combination with individual-level health data to investigate interactions between food security, climate extremes, and health outcomes. In investigations related to food security, indices derived from remotely sensed data, such as the normalized difference vegetation index (NDVI) or the water requirement satisfaction index (WRSI), can be used to estimate cultivated area, vegetation condition, or drought at fine spatial and temporal scales. In some settings, namely rainfed and subsistence farming communities, these indices may be used to estimate agricultural production, livestock fodder, or food availability in a given community (Bakhtsiyarava et al. 2018; Brown et al. 2014). As such, remotely sensed data complement sources of health data on individuals, such as the Demographic and Health Surveys or the World Bank’s Living Standards Measurement Study, by providing estimates of agricultural extent and growing season conditions in the area surrounding the community where each individual lives (Johnson and Brown 2014; Shively et al. 2015). Researchers can then examine how individual health outcomes vary with growing season conditions or agricultural production across time and space. Similar approaches can be used to link climate extremes such as heat waves and drought to human health outcomes (Davenport et al. 2017; Grace et al. 2012; Isen et al. 2017), or climatic conditions to vector-borne disease outbreaks (Chretien et al. 2007). These approaches can be extended to examine multilevel interactions incorporating individual, community, and environmental characteristics. Disciplines within public health, including epidemiology and environmental health sciences, have also used remotely sensed data to explore linkages between health and context, in investigations of infectious disease outbreaks, cancer occurrence, and the health effects of air pollution (Buczak et al. 2012; Dey et al. 2012; Maxwell et al. 2010; Stefani et al. 2013).
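NDVI is computed per pixel from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red), with values near 1 indicating dense green vegetation. A minimal sketch of the community-level linkage described above, assuming planar coordinates in meters and a hypothetical 10 km buffer around a survey cluster, averages NDVI within the buffer and then expresses one season's value as a standardized anomaly relative to earlier seasons. The pixel values, coordinates, buffer radius, and historical series are all illustrative; real analyses would also handle clouds, masking to cropland, and temporal compositing.

```python
import math
from statistics import mean, stdev


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    if nir + red == 0:
        return float("nan")
    return (nir - red) / (nir + red)


def buffer_mean_ndvi(pixels, cx, cy, radius_m):
    """Mean NDVI of pixels whose centers lie within radius_m of (cx, cy).
    pixels: iterable of (x, y, nir, red) with x, y in projected meters."""
    values = [ndvi(nir, red) for x, y, nir, red in pixels
              if math.hypot(x - cx, y - cy) <= radius_m]
    return mean(values) if values else float("nan")


def standardized_anomaly(current, history):
    """z-score of this season's value against earlier seasons' values."""
    return (current - mean(history)) / stdev(history)


# Hypothetical pixels around a survey cluster located at (0, 0).
pixels = [
    (0, 0, 0.5, 0.1),         # inside the buffer
    (3000, 4000, 0.4, 0.2),   # 5 km away, inside the buffer
    (20000, 0, 0.9, 0.1),     # 20 km away, outside the buffer
]
season_ndvi = buffer_mean_ndvi(pixels, 0, 0, radius_m=10_000)
z = standardized_anomaly(season_ndvi, [0.55, 0.60, 0.65])
print(f"buffer NDVI = {season_ndvi:.3f}, anomaly z = {z:.2f}")
```

A negative anomaly of this kind could then be attached to each respondent in the cluster and used as a covariate for health outcomes, as in the studies cited above.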

Remotely sensed data have also been applied to better understand the environmental contexts of historical populations. For example, archeologists have used high-resolution synthetic aperture radar (SAR) data to map ancient irrigation systems that now lie underground (Wiig et al. 2018). Anthropologists have also employed viewshed analysis using a SAR-based digital elevation model to gain insight into settlement and activity locations on San Clemente Island (Comer et al. 2013). These analyses revealed that settlements were situated within view of each other and of expanses of ocean frequented by marine mammals, facilitating hunting, and that ritual sites were within view of sites on other islands, reinforcing cultural and social relationships among islands. Remotely sensed data have also helped explain the paradox of Petra, a great ancient city built by the Nabateans. For centuries before the construction of Petra, the Nabateans were nomadic, amassing great wealth by transporting precious goods through the vast deserts of the Arabian Peninsula (Comer 2013). Overlaying Nabatean temples and terraces on NDVI images derived from satellite data reveals that they appeared only in areas where agriculture is possible, suggesting a transition from a nomadic to an agricultural lifestyle, probably driven by Roman actions that disrupted the Nabateans’ control of trade in the region. Using environmental information derived from multispectral data, Sever explored the relationship between Mayan settlements and wetlands (Sever and Irwin 2003) and argued convincingly that overexploitation of the environment caused the collapse of the Mayan civilization (Sever 1998).
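Viewshed analysis builds on a simple line-of-sight test: a target cell is visible if no terrain along the straight line from the observer rises above the sight line between them. The sketch below applies that test to a small hypothetical digital elevation model, sampling the nearest cell at evenly spaced points along each line; the 1.7 m observer height, the number of samples, and the function names are assumptions for illustration. Production viewshed tools also account for Earth curvature, atmospheric refraction, and elevation interpolation, which this sketch omits.

```python
def visible(dem, obs, tgt, obs_height=1.7, tgt_height=0.0, steps=100):
    """True if cell tgt is visible from cell obs on a gridded DEM.
    dem: 2D list of elevations; obs, tgt: (row, col) tuples of ints."""
    r0, c0 = obs
    r1, c1 = tgt
    z0 = dem[r0][c0] + obs_height
    z1 = dem[r1][c1] + tgt_height
    for i in range(1, steps):
        t = i / steps
        # Nearest cell to the point a fraction t of the way along the line.
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        if (r, c) in (obs, tgt):
            continue
        # Height of the sight line at the same fraction t.
        sight_z = z0 + t * (z1 - z0)
        if dem[r][c] > sight_z:
            return False  # intervening terrain blocks the view
    return True


def viewshed(dem, obs, **kwargs):
    """Boolean grid marking every cell visible from the observer cell."""
    return [[visible(dem, obs, (r, c), **kwargs) for c in range(len(row))]
            for r, row in enumerate(dem)]


# A single ridge of height 10 between an observer at the west end and
# the cells beyond it.
ridge = [[0, 0, 10, 0, 0]]
print(viewshed(ridge, (0, 0)))
```

Summing such visibility grids over many observer locations is one common way to characterize how exposed or concealed a site is.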

Measuring social phenomena and their effects

A major thrust of People and Pixels was understanding the human drivers of land cover change, especially in rapidly changing regions like the Amazon. At the time, the available spatial and temporal resolution of remotely sensed data enabled identification of land cover classes and terrain features, and researchers used this information to link changes in land cover to demographic changes. This kind of research still features prominently (de Sherbinin 2016), including studies relating population change to land cover change (Aide et al. 2013), biodiversity loss (Williams 2013), and fire activity (Hantson et al. 2015). Yet, in recent years, the range of application areas has expanded. With increased spatial and temporal resolution, researchers are moving beyond land cover and beginning to infer land use and other human activities through changes detectable in remotely sensed data. Researchers have used remotely sensed data to infer agricultural management practices (Gómez Giménez et al. 2017; Jain et al. 2016) and to evaluate the effect of governance policies on forest protection (Blackman et al. 2017).

Researchers are also beginning to use remotely sensed data to investigate phenomena, such as narcotics trafficking and population responses to civil war, that are difficult to detect in traditional census and survey data. High-resolution, high-frequency remotely sensed data enable social scientists to detect the landscape signature of these activities, and thereby use human–environment interactions as a means of developing and testing theories about clandestine and otherwise difficult to measure phenomena. For example, over the past decade, social scientists working in Central American contexts began to notice new patterns of large, rapid forest clearing in remote regions, including protected and indigenous areas (Grandia 2013; McSweeney et al. 2014, 2017). They hypothesized that these patterns may be linked to narcotics traffickers acquiring land over which to move cocaine and for money laundering activities such as cattle and oil palm production (McSweeney et al. 2017). This hypothesis was tested by using the analysis-ready Hansen forest change data (Hansen et al. 2013) to develop spatial and temporal pattern metrics for patches of forest loss and to identify statistically “unique” groups of deforestation (Sesnie et al. 2017). Anomalous deforestation in several departments of Nicaragua, Honduras, and Guatemala was found to be significantly correlated with cocaine flow data available at the departmental level, and it accounted for between 15 and 30% of total forest loss in the region between 2000 and 2014.
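Patch-level metrics of the kind used in this work can be illustrated with a much simpler stand-in. Assuming a binary grid in which 1 marks a forest-loss pixel, the hypothetical sketch below groups loss pixels into 4-connected patches, computes each patch's area (0.09 ha per 30 m cell), and flags patches whose area is unusually large relative to the others. The published analysis used several spatial and temporal metrics and a statistical grouping procedure; the single size metric, the z-score threshold, and the toy grid here are our simplifying assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def loss_patches(loss, cell_area_ha=0.09):
    """Label 4-connected patches of loss cells (value 1) and return the
    area of each patch in hectares (0.09 ha = one 30 m x 30 m cell)."""
    rows, cols = len(loss), len(loss[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if loss[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited loss cell.
            queue, size = deque([(r, c)]), 0
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and loss[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(size * cell_area_ha)
    return areas


def flag_anomalous(areas, z_threshold=2.0):
    """Indices of patches whose area exceeds the mean by z_threshold SDs."""
    mu, sd = mean(areas), pstdev(areas)
    if sd == 0:
        return []
    return [i for i, a in enumerate(areas) if (a - mu) / sd > z_threshold]


loss = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
areas = loss_patches(loss)
print(areas, flag_anomalous(areas, z_threshold=1.0))
```

Aggregating flagged patch areas by administrative unit and year would yield the kind of departmental series that can then be compared against other departmental data.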

The time-series data available from repeat satellite imaging also hold promise for understanding otherwise difficult to detect consequences of civil wars. Groups such as UNOSAT, Human Rights Watch, Amnesty International, and satellite data-savvy journalists have used sequences of very high-resolution imagery to measure urban and infrastructural damage following armed conflict in a range of contexts and countries. While these satellite mapping efforts have been essential for raising awareness and supporting humanitarian relief, acute damage is only one component of the war economy, whose taxonomy includes “combat,” “shadow,” and “coping” economies (Goodhand 2004). The combat economy directly supports military objectives, including territorial capture and control over natural resource exploitation; the shadow economy involves commodity smuggling and high-value resource extraction; and the coping economy encompasses adaptations by the noncombatant population, such as labor migration and changes in agricultural production (Auty 2001; Collier and Hoeffler 2002; Fearon and Laitin 2003; Hegre 2004; Ross 2004). Of the three, the coping economy has been the least studied (Ballentine and Nitzschke 2005), yet it is well suited for investigation by remote sensing. Moderate-resolution imagery with subweekly repeat coverage and high-resolution weekly imagery now being made available through the Copernicus program support examination across seasons, years, and war periods (Witmer 2015). Land cover changes, including urban construction, deterioration, and damage, as well as fluctuations in agricultural production, can therefore be detected across broad spatial extents (Eklund et al. 2017; Klaasse et al. 2018). These changes may, in turn, signal population relocation or changes in agricultural practices that reflect the coping economy.

Conclusion

When People and Pixels was published 20 years ago, it showcased the work of a relatively small group of social scientists beginning to use newly available remote sensing data. These researchers glimpsed the possibilities that connecting people to pixels offered for understanding patterns and processes of human–environment interactions and invested significant effort in making connections across scientific domains and data structures. Key institutions also recognized the possibilities and supported these efforts with funding, including NASA’s Land Use Land Cover Change Program and NIH/NIMH funding for social and population scientists to use GIS and remotely sensed data in their work (Entwisle and Stern 2005).

Over the past two decades, technological and cultural transformations have continued. Remote sensing data continue to be collected at ever-finer spatial and temporal resolutions. Processing methods to extract useful information from these high-volume data continue to develop, and new software and tools make the data more accessible. These technical advances drive, and are driven by, increasing spatial awareness throughout the social sciences and allied fields. Economists, public health researchers, anthropologists, demographers, and political scientists are all increasingly interested not just in the locations and times of specific events but in the spatiotemporal patterns and processes that underlie series of related events and interactions.

The technical advances and the increased interest in spatial approaches throughout the social sciences are undoubtedly improving the connection between people and pixels. However, important challenges remain. First, we must not lose sight of ethical issues surrounding the collection and use of spatial data in social science applications. Interest in linking traditional social science data with remotely sensed data has driven increased demand for more precise spatial locations in census and survey data. However, such spatially explicit data raise concerns and introduce tensions surrounding confidentiality, data preservation, and data sharing for reproducibility (VanWey et al. 2005). Current approaches, such as that employed by the Demographic and Health Surveys, address these concerns by introducing uncertainty into spatial locations, for example by randomly displacing the published coordinates of survey clusters; analyses must then account for this uncertainty.
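To make the displacement approach concrete, the sketch below moves a cluster location by a uniformly drawn bearing and distance, using caps of 2 km for urban clusters and 5 km for rural clusters (10 km for a random 1% of rural clusters), which is our understanding of the DHS scheme. It then shows one simple way to account for the resulting uncertainty: draw many plausible locations around the published point and record the spread of an environmental exposure. The exposure surface, the planar meter coordinates, and the function names are hypothetical, and the sketch ignores any administrative-boundary constraints on displacement.

```python
import math
import random

# Displacement caps in meters (assumed, following our reading of DHS practice).
URBAN_MAX_M = 2_000
RURAL_MAX_M = 5_000
RURAL_RARE_MAX_M = 10_000  # applied to a random ~1% of rural clusters


def displace(x, y, urban, rng):
    """Shift (x, y) by a random bearing and a uniform random distance."""
    if urban:
        max_d = URBAN_MAX_M
    else:
        max_d = RURAL_RARE_MAX_M if rng.random() < 0.01 else RURAL_MAX_M
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = rng.uniform(0.0, max_d)
    return x + dist * math.cos(angle), y + dist * math.sin(angle)


def exposure_spread(x, y, urban, exposure, n=1_000, seed=0):
    """Min, mean, and max exposure over n plausible true locations.
    Approximation: treats the true location as another random
    displacement of the published one."""
    rng = random.Random(seed)
    draws = [exposure(*displace(x, y, urban, rng)) for _ in range(n)]
    return min(draws), sum(draws) / n, max(draws)


def gradient_exposure(x, y):
    """Hypothetical exposure surface: a west-east gradient per kilometer."""
    return x / 1_000.0


lo, avg, hi = exposure_spread(0.0, 0.0, urban=True, exposure=gradient_exposure)
print(f"exposure for an urban cluster ranges from {lo:.2f} to {hi:.2f}")
```

When the spread is wide relative to the effect being studied, analysts may prefer buffer-based averages or explicit measurement-error models over the value at the published point.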

Second, uncertainty deliberately introduced to protect confidentiality is by no means the only source of uncertainty when making connections between people and pixels. Modeled and derived data products, such as those discussed in the “Derived/modeled data products” section, have greatly facilitated work across disciplinary boundaries. However, because the final data products effectively mask the complexities of the processes used to create them, users of such data may be tempted to treat them as “truth.” Researchers must maintain awareness of the multifaceted uncertainties in any data they use and their potential effect on analyses. Developing readily adopted means of accounting for and communicating uncertainty, therefore, continues to be a fruitful direction for research.

Finally, another major challenge lies in balancing research effort between technical and substantive approaches. While technological advances will undoubtedly continue, the questions posed by social scientists will not be answered by higher resolution data or more processing power alone. We must ensure that social science research receives adequate attention and support and is not overwhelmed by the lure of new technology for its own sake. The trend toward greater interdisciplinarity to address issues that cut across social and environmental science is beneficial in this regard. Such complex questions are of great concern to policy makers and citizens, and addressing them requires both theoretical frameworks encompassing social and environmental domains and data on both people and pixels to provide the empirical underpinning. Beyond the social and environmental sciences, a key aspect of linking people and pixels is that the remote sensing sciences must be closely involved in the conversation. Continued dialog will help remote sensing scientists understand which advances would be most beneficial for understanding landscape variability relevant to socio-environmental challenges, and help social and environmental scientists better understand the limitations and advantages of the array of remotely sensed data products.