Introduction

The Demographic and Health Surveys (DHS) are the most important source of comparative information on the health of non-elderly adults and young children in low- and middle-income countries (LMICs) (Corsi et al. 2012). DHS data (https://dhsprogram.com/publications/Journal-Articles-Search.cfm) are the basis for over 6000 journal articles since 1986. Since 1984, with funding from the US Agency for International Development (USAID), the DHS has asked women of childbearing age in LMICs about their personal backgrounds, health, birth histories, and the health of their young children. The surveys also collect data on household characteristics and household members. In the 1990s, they began collecting information from men. A call for greater attention to context (Butz and Torrey 2006; Matthews 2008) prompted The DHS Program to provide GPS location information for sampling clusters, somewhat displaced to maintain confidentiality, for most of their surveys.

In 2012, a team at the University of Minnesota, with funding from NICHD, created IPUMS DHS (Boyle et al. 2019) to facilitate comparative over-time and cross-national research with the DHS. IPUMS DHS harmonizes variable names and codes across surveys, makes it unnecessary to merge files, and allows researchers to download a single integrated data file, customized to include only selected variables. In IPUMS DHS, each variable name links to extensive documentation, such as the related survey question and questionnaire skip pattern. IPUMS DHS offers free access to thousands of harmonized DHS variables across many samples.

Exploiting georeferenced information in the DHS’s public use GPS datasets, IPUMS DHS now provides contextual variables linked to individual records based on sample cluster locations. These variables cover physical environment, economic and population features, and agriculture. With this latest capability, researchers without geographic training can easily use contextual variables with DHS data. Contextual variables constructed around DHS’s cluster location points reflect subjects’ relatively close physical environment and better accommodate heterogeneity in environmental variables than some alternative sources, such as IPUMS Terra, which supplies combined land use, climate, and population data for larger geographic areas.

IPUMS DHS currently includes the full range of DHS variables for 38 countries and 156 samples, with most standard DHS samples from Africa. IPUMS DHS has also created contextual variables for all DHS files with GPS location data, shown in Table 1. The IPUMS DHS website offers the contextual variables in downloadable .csv files that can be merged with DHS public use files. Samples bolded in Table 1 are fully integrated into IPUMS DHS. For these bolded samples, researchers can create a customized data file of harmonized variables, including any relevant contextual variables, without having to merge files. We describe both processes below.

Table 1 DHS Samples with GPS Data, as of August 2019

IPUMS DHS incorporates new material every year, adding DHS surveys from more countries and the latest surveys from countries already in the database. (Our funding prioritizes harmonizing standard DHS samples from Africa, the Middle East, and South Asia.) For any newly released DHS GPS files, we create new .csv files of contextual variables. Ultimately, we expect to harmonize the full range of DHS data, including AIS (AIDS monitoring) and MIS (malaria monitoring) surveys. We also plan to add other contextual variables, and we invite suggestions for including other high-quality measures available for a broad range of LMICs.

IPUMS DHS contextual variables unlock exciting research opportunities on population and environment, but they are not the best tool for every project. The IPUMS DHS team made choices, described below, about how to deal with the displacement of the public use DHS cluster location information; some scholars might prefer a different solution. Additionally, we chose which environmental variables to include and, when relevant, how to aggregate them. Researchers who prefer other variables, or who need the variables aggregated differently, may prefer to do their own linking across the data sources. Below, we describe our choices so analysts can make an informed assessment of whether IPUMS DHS is the best fit for their research.

In the following sections, we first describe our methods for dealing with data displacement and constructing IPUMS DHS contextual variables. We next offer information about each contextual variable, with examples of research joining contextual information with DHS data and other sources. We then give guidance on using the IPUMS DHS contextual variables. After providing an example using some of the contextual variables, we describe the limitations of IPUMS DHS and our plans for adding more contextual variables to IPUMS DHS.

Methodology

Displacement

The DHS Program displaces public use GPS data points to preserve confidentiality (Burgert et al. 2013). Georeferenced DHS data aggregate cases from each sampling cluster to a single point coordinate and then mask the coordinate via displacement. Displacement is 0 to 2 km for urban areas and 0 to 5 km for rural areas, with uniformly distributed displaced distances. One percent of rural clusters are displaced up to 10 km. The DHS Program recommends averaging supplemental environmental data over a 5- to 10-km buffer around rural cluster points, though the scale of spatial processes and similarity of nearby cells may influence buffer zone size (Perez-Heydrich et al. 2016).
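
The displacement rules described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not The DHS Program's actual procedure: the function names, the uniformly random direction, and the flat-earth conversion from kilometers to degrees are all hypothetical choices made for illustration.

```python
import math
import random

KM_PER_DEG_LAT = 111.32  # approximate kilometers per degree of latitude

def max_displacement_km(urban, rng):
    """Displacement cap: 2 km urban, 5 km rural, 10 km for about 1% of rural clusters."""
    if urban:
        return 2.0
    return 10.0 if rng.random() < 0.01 else 5.0

def displace(lat, lon, urban, rng):
    """Shift a point by a uniformly distributed distance (assumed: uniform random direction)."""
    dist_km = rng.uniform(0.0, max_displacement_km(urban, rng))
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Assumption: a simple planar approximation is adequate at these short distances.
    dlat = (dist_km * math.cos(angle)) / KM_PER_DEG_LAT
    dlon = (dist_km * math.sin(angle)) / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon, dist_km
```

Running `displace` many times for a rural point shows why a 5-km buffer covers the true location for nearly all clusters, while the small share displaced up to 10 km may fall outside it.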

To deal with displacement, we constructed 5- or 10-km buffer zones around GPS sample points to create IPUMS DHS contextual variables. The creation of most IPUMS DHS contextual variables is based on a common general methodology using Esri’s ArcGIS software suite.Footnote 1 We constructed 5-km buffer zones around sample points when working with environmental data with clear map boundaries (i.e., population density, soil type, livelihood zone, and ecoregion), with the predominant value in the buffer assigned to each sampled person within the sample cluster. For soil type, livelihood zone, and ecoregion, categorical values are homogeneous within boundaries (e.g., a single livelihood type is assigned to a clearly mapped area within a country). This homogeneity reduces the likelihood of assigning an erroneous value due to missing the sample point for the few rural clusters displaced more than 5 km.
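
Assigning the predominant value within a buffer can be sketched in a few lines. This is a simplified illustration, not the ArcGIS workflow used by IPUMS DHS: it assumes the buffer has already been reduced to a list of equal-area cells, and the function name and soil labels are hypothetical.

```python
from collections import Counter

def predominant_value(cell_values):
    """Return the most common category among the cells inside a buffer."""
    # Assumption: cells are equal in area, so a simple count stands in for area share.
    counts = Counter(v for v in cell_values if v is not None)
    if not counts:
        return None  # no valid cells fell inside the buffer
    return counts.most_common(1)[0][0]

# Hypothetical soil classes for cells inside one 5-km buffer (None = no data).
cells = ["Ferralsols", "Ferralsols", "Acrisols", None, "Ferralsols"]
soil_for_cluster = predominant_value(cells)
```

Every sampled person in the cluster would then receive `soil_for_cluster` as the value of the contextual variable.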

We constructed 10-km buffer zones around sample points for the other contextual variables because the environmental source data had less clearly demarcated boundaries (e.g., pixelated raster data from satellite imagery) and had a range of measured values within those rough boundaries (e.g., for temperature, precipitation, and malarial incidence). For these contextual variables, summary values (the mean, maximum, or sum) were computed for all pixels in a 10-km buffer, and these estimates were assigned to all individuals in each cluster location. A different methodology used for the conflict variables (battles, riots, and violence against civilians) is described below.
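
The buffer summaries for raster data can be approximated as follows. This is a minimal sketch, not the ArcGIS procedure itself: it assumes each pixel is represented by its center coordinates and a value, treats a pixel as inside the buffer when its center is within the radius, and uses hypothetical function names and data.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def buffer_summary(cluster, pixels, radius_km=10.0):
    """Summarize values of pixels whose centers lie within radius_km of the cluster."""
    lat0, lon0 = cluster
    inside = [v for (lat, lon, v) in pixels
              if haversine_km(lat0, lon0, lat, lon) <= radius_km]
    if not inside:
        return None
    return {"mean": sum(inside) / len(inside), "max": max(inside), "sum": sum(inside)}

# Hypothetical pixels: (latitude, longitude, value); the third is about 55 km away.
pixels = [(0.00, 0.00, 20.0), (0.05, 0.00, 30.0), (0.50, 0.00, 99.0)]
summary = buffer_summary((0.0, 0.0), pixels)
```

Here only the first two pixels fall inside the 10-km buffer, so the distant value of 99.0 does not affect the cluster's summary.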

Using 5- or 10-km buffer zones around reported GPS sample points matches both the general guidelines for computing environmental variables put forward by The DHS Program and the practice of several publications pairing DHS data with supplemental environmental data.Footnote 2 There are, however, trade-offs in opting for smaller versus larger buffer zones. A larger buffer zone (e.g., 10-km radius) may conceal underlying heterogeneity within small areas; a smaller buffer zone (e.g., 5-km radius) may exclude the actual cluster location for the 1% of rural clusters displaced up to 10 km. Some researchers studying rare events, such as political conflicts, have used larger buffers to include enough cases for analysis.Footnote 3 Nor is the approach of constructing a buffer zone of at least 5 km around a reported cluster location the only possible strategy. Grace and co-authors recently proposed “that the user selects a settlement near the DHS’ published location and measures the environmental conditions using a buffer much smaller than 10 km ... [hypothesizing that] a small, precise buffer around an incorrect settlement is a better measure of truth than is an overly large buffer around the published point” (Grace et al. 2019). Scholars must determine if the IPUMS DHS approach is suitable for their research.

Selection of contextual variable datasets

When selecting which environmental datasets (described below) to pair with the DHS, we generally chose those that were most up-to-date and had the highest spatial and temporal resolution. Among similar variables, we also chose those that were measured directly over those that were imputed.

Variables of physical and environmental setting

IPUMS DHS currently includes contextual variables on the physical environment, economic and population features, and agriculture. Table 2 provides summary information about these contextual variables, including each variable’s meaning, name, original source, temporal resolution, spatial resolution in kilometers in the original data, and variable type (categorical, bounded numeric [from 0 to 1], or unbounded numeric). Every year, IPUMS DHS adds contextual variables for any newly released DHS public use data files with GPS data.Footnote 4

Table 2 Summary of contextual variables in IPUMS DHS

Two environmental variables, ecoregion and soil,Footnote 5 have categorical values from single points in time and homogeneous values within clearly defined boundaries. Ecoregion is the predominant terrestrial ecoregion (circa 2001) within a 5-km circular buffer around DHS cluster locations. This variable uses a 5-digit coding system consisting of the 1-digit realm (ecozone) code, the 2-digit code for biome (habitat type), and the 2-digit specific ecoregion number. There are over 800 terrestrial ecoregions, and users can conduct analyses at any of the three nested levels. Obtained from the Terrestrial Ecoregions of the World database by the World Wildlife Fund (Olson et al. 2001), ecoregion data are available for all countries. With a few exceptions (e.g., León, 2012), researchers employing DHS data have made little use of data on ecoregions, leaving open many research opportunities.
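
Because the ecoregion code nests three levels, analysts can split it to work at the realm or biome level. The sketch below assumes the code is stored as a 5-digit value with the realm digit first, as described above; the function name and the sample code 30712 are made up for illustration.

```python
def split_ecoregion(code):
    """Split a 5-digit ecoregion code into realm, biome, and ecoregion parts."""
    s = str(code)
    if len(s) != 5 or not s.isdigit():
        raise ValueError(f"expected a 5-digit code, got {code!r}")
    return {
        "realm": s[0],        # 1-digit realm (ecozone)
        "biome": s[1:3],      # 2-digit biome (habitat type)
        "ecoregion": s[3:5],  # 2-digit specific ecoregion number
    }

parts = split_ecoregion(30712)  # hypothetical code, not a documented value
```

Grouping records by `parts["realm"] + parts["biome"]` would give a biome-within-realm classification for coarser analyses.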

The variable soil specifies the predominant soil type and class for a 5-km circular buffer around DHS clusters. Data covering 118 soil types and 20 top-level categories come from the World Soil Information database (Hengl et al. 2017). Soil data are available for all countries, based on information published circa 2016. Studies have used soil type to explain variation in hookworm parasite prevalence (Mabaso 1998; Mabaso et al. 2003), estimate population-level micronutrient intake and deficiency risks (Joy et al. 2015), link agroecosystems to child stunting (Alemu et al. 2017), explain district differences in Indian women’s agricultural work and child sex ratios (Carranza 2014), and, together with DHS data, link agricultural capacity to child hunger (Balk et al. 2005).

Three sets of additional environmental variables (greenness, precipitation, and temperature) vary over time and are presented for a total of 72 months around a given DHS survey. By offering 72 months of data, IPUMS DHS has included information most likely relevant to researchers. DHS surveys generally collect information on the health and survival of children born in the 60 months before the interview date,Footnote 6 and some surveys are fielded over several months. The 72 months include measurements for each of 60 months prior to the survey start date, the month of the survey start, and each of 11 months following the survey start date.
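
The 72-month window can be enumerated directly from a survey start date. This is an illustrative sketch; the function name and the example start date are assumptions, and the way IPUMS DHS names the individual monthly variables is not shown here.

```python
def month_window(start_year, start_month, before=60, after=11):
    """List (year, month) pairs from `before` months prior through `after` months later."""
    start_index = start_year * 12 + (start_month - 1)
    months = []
    for idx in range(start_index - before, start_index + after + 1):
        months.append((idx // 12, idx % 12 + 1))
    return months

# Hypothetical survey that began in June 2014.
window = month_window(2014, 6)
```

The list holds 60 prior months, the survey start month, and 11 following months, for 72 entries in total.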

NDVI (Normalized Difference Vegetation Index) is a remotely sensed indicator of live green vegetation, with values potentially ranging from negative one to positive one. Because IPUMS DHS provides the maximum observed value across all the pixels within a 10-km circular buffer around each DHS cluster, the NDVI values in IPUMS DHS are never less than zero. Barren areas of snow, sand, or rock have the lowest NDVI values; shrub and grassland have moderate NDVI values; and temperate and tropical rainforests have the highest NDVI values. These data can identify areas under stress (such as extended drought) or undergoing change (such as deforestation) (Weir and Herring 2013). Obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Didan 2015) and covering all countries, the original data are based on satellite images collected every 16 days, from February 2000 to 2018. The contextual variables on NDVI in IPUMS DHS include 72 variables covering each of 60 months prior to the survey start, the first survey month, and each of the following 11 months.

Multiple publications have used NDVI data with DHS data. Studies have focused on interactions between water and food security for child health (Grace et al. 2017), relationships between agriculture and fertility (Grace and Nagle 2015), mosquito habitat suitability and mosquito net distribution (Acheson 2015), dietary quality and tree cover (Ickowitz et al. 2014), climatic conditions during gestation and infancy and sex-specific vulnerability to undernutrition (Mulmi et al. 2016), effects of environmental change on health (Brown et al. 2014; Johnson and Brown 2014), and the association between climate variability, birthweights, and agricultural livelihoods (Bakhtsiyarava et al. 2018).

Precip (monthly precipitation in millimeters) is reported for a 72-month time series for a 10-km circular buffer around each DHS cluster. Precipitation data were originally obtained from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Funk et al. 2015). Spanning all longitudes, CHIRPS data constitute a 30+ year quasi-global rainfall dataset that incorporates 0.05° resolution satellite imagery with in situ station data to create gridded rainfall time series. The precip variables are available for all DHS samples with publicly released cluster data (as shown in Table 1). Researchers pairing precipitation and DHS data have studied the effect of climate change on child health (Davenport et al. 2017; Grace et al. 2015; MacVicar 2016) and the effect of early-life drought exposure on later child stunting (Lillepold et al. 2018) and wealth in adulthood (Hyland and Russ 2019). IPUMS DHS currently includes raw precipitation data for individual months; we plan to add summary monthly precipitation data averaged across decades for comparison purposes in the near future.

Temp_min and temp_max include 72 months of temperature data in the 10-km radius of sample clusters. Values are available for all DHS samples with GPS cluster data, based on data for January 1980 to September 2018 from the Terrestrial Hydrology Research Group at Princeton University (Sheffield et al. 2006). Studies focusing on the interplay between climate and health often use temperature data along with precipitation data. Research has found prolonged exposure to high temperatures to be an independent health risk (Grace et al. 2015). IPUMS DHS plans to supplement the existing temperature data for individual months by adding, in the near future, summary monthly temperature data averaged across decades for comparison purposes.

Variables of economic and social contexts

Livelihood reports the predominant economic basis for subsistence within a 5-km circular buffer around DHS clusters. This variable, which has a homogeneous value assigned to a clearly defined subregion within a country, indicates sources of food and/or income and people’s vulnerability to hazards such as floods and droughts. Values consist of a 3-digit country code followed by a 3-digit numeric livelihood (e.g., river fishing and staple crops) code. The non-time-variant source data (published in 2012) were obtained from the Famine Early Warning System Network and provide information on 36 countries with DHS samples with cluster location data (Livelihoods 2020). The livelihood variable is country specific.

Data on livelihood zones combined with DHS data have yielded insights into population vulnerability and human health in Africa. Jankowska et al. (2012) examined current links between livelihood and measures of malnutrition and projected future outcomes (malnutrition and anemia cases), given climate change. Grace and her co-authors incorporated livelihood zone data into multilevel models of staple food prices and birth weights (Grace et al. 2014) and of climate variables and child stunting (Grace et al. 2012) and examined how weather effects on birth weight vary across livelihood zones (Grace et al. 2015). Policy-relevant DHS-based studies have considered how livelihood zones should be factored into efforts to reduce child stunting (Woodruff et al. 2017) and into development efforts (Malcomb et al. 2014).

Popdensity reports persons per square kilometer within a 5-km circular buffer around DHS clusters for the years 2000, 2005, 2010, 2015, and 2020. The data were obtained from the Gridded Population of the World (GPW-v4), provided by the Socioeconomic Data and Applications Center (Center for International Earth Science Information Network (CIESIN) Columbia University, 2016) and are based on counts consistent with national censuses and population registers. A proportional allocation gridding algorithm is used to assign population values to 1-km grid cells, and the population density grids are created by dividing the population count grids by the land grids.Footnote 7 The original DHS public use files include a widely used dichotomous urban/rural variable, with the definition of “urban” set by each country’s national statistical office.Footnote 8 The popdensity variable provides a finer level of resolution that allows researchers to incorporate their own definitions of urban and rural depending on research needs. We use this variable in a data analysis example below.

Malaria consists of 16 separate variables, one value for each year from 2000 to 2015. Malaria reports the mean clinical Plasmodium falciparum malaria incidence within a 10-km circular buffer around DHS clusters. The malaria incidence is expressed as fractions of cases per person; values range from zero to one. Data were obtained from the Malaria Atlas Project (MAP) (Bhatt et al. 2015) and were expressed as malaria incidence raster layers, with circa 5-km spatial resolution in the World Geodetic System-1984 Geographic Datum coordinate system (National Geospatial-Intelligence Agency 2012). The source data cover 48 countries in sub-Saharan Africa, 30 of which have DHS samples with GPS cluster data.

Published studies have used mapped data on malaria incidence in conjunction with DHS data. Amoah et al. (2018) coupled data on child undernutrition from 20 DHS surveys with MAP data on malaria incidence. Cuadros and co-authors (Cuadros et al. 2011a, 2011b) studied the relationship between malaria and HIV prevalence in East and West Africa, using DHS and MAP data. Kazembe et al. (2007) conducted a spatial analysis of the relationship between early childhood mortality in the DHS and malaria endemicity. More commonly, researchers use data from The DHS Program’s Malaria Indicator Surveys (MIS), which include blood draws for testing for malaria parasites. MAP data on malaria incidence cover additional years and some countries without MIS surveys, however.

Conflict variables (civ_violence, riots, and battles) draw data from the Armed Conflict Location and Event Data Project (ACLED) (Raleigh et al. 2010).Footnote 9 Conflict variables represent an annual count of days with armed or civilian conflicts, according to three conflict types (violence against civilians, riots, and battles), occurring within a 10-km circular buffer around a DHS cluster location. The longest time series cover 1997 to 2016, with longer spans for Africa than for Asia and the Middle East. IPUMS DHS staff converted longitude and latitude coordinates to a point layer, created a 10-km buffer around each DHS sample cluster location, and counted conflict events falling within the buffer in a year. If a buffer crossed an international boundary, only events occurring in the same country as the DHS sample were included. Conflict variables are available for 42 currently available DHS samples.
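
The conflict-counting steps can be sketched as follows. This is an assumption-laden illustration rather than the production GIS workflow: it counts distinct event days (following the variable definition above), uses a great-circle distance in place of a projected buffer, and relies on hypothetical field names and made-up events.

```python
import math
from datetime import date

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def conflict_days(cluster, events, year, event_type, radius_km=10.0):
    """Count distinct days in `year` with an `event_type` event inside the buffer."""
    days = set()
    for ev in events:
        if ev["type"] != event_type or ev["date"].year != year:
            continue
        if ev["country"] != cluster["country"]:
            continue  # exclude events across an international boundary
        if haversine_km(cluster["lat"], cluster["lon"], ev["lat"], ev["lon"]) <= radius_km:
            days.add(ev["date"])
    return len(days)

# Hypothetical cluster and events (field names are illustrative, not ACLED's).
cluster = {"country": "UG", "lat": 1.0, "lon": 32.0}
events = [
    {"type": "battles", "country": "UG", "lat": 1.02, "lon": 32.01, "date": date(2015, 3, 1)},
    {"type": "battles", "country": "UG", "lat": 1.00, "lon": 32.05, "date": date(2015, 3, 1)},
    {"type": "battles", "country": "UG", "lat": 1.01, "lon": 32.00, "date": date(2015, 7, 10)},
    {"type": "battles", "country": "SS", "lat": 1.03, "lon": 32.00, "date": date(2015, 4, 1)},
    {"type": "battles", "country": "UG", "lat": 2.00, "lon": 32.00, "date": date(2015, 5, 1)},
    {"type": "riots", "country": "UG", "lat": 1.00, "lon": 32.00, "date": date(2015, 6, 1)},
]
battles_2015 = conflict_days(cluster, events, 2015, "battles")
```

In this example the two battles on the same day count once, the cross-border event and the distant event are dropped, and the riot is tallied under its own type.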

Studies combining DHS and ACLED data have examined the effect of armed conflict on child welfare in Cote d’Ivoire (Wayoro 2018), the relationship between access to healthcare and civil conflict within African states (Rohde 2014), negative effects of exposure to violent and non-violent civil insecurity on child growth in rural Africa (Darrouzet-Nardi 2016), and the relationship between past exposure to violent conflict and risky sexual behavior in Uganda (Delavande and Menezes Cordeiro 2012). We provide a further example of this fruitful combination of data from DHS and ACLED from our own research later in this paper.

Agricultural context variables

Cropland and pastureland variables provide data as bounded numeric values (from 0 to 1) representing the proportion of a 10-km circular buffer around each DHS cluster that is covered by cropland and pastureland, respectively. The original cropland and pasture area raster layers were acquired from EarthStat (Monfreda et al. 2008; Ramankutty et al. 2008) and cover all countries, based on material from circa 2000 from agricultural inventories and satellite-derived land cover data.

Crop harvest area and crop production variables report area (hectares) dedicated to growing and yield (metric tons) for specific crops, within a 10-km circular buffer around DHS clusters. Data were obtained from EarthStat, which produces data for 175 crops (Monfreda et al. 2008). IPUMS DHS uses a subset of 17 major crops (i.e., barley, cassava, cotton, groundnuts, maize, millet, oil palm, potatoes, rapeseed, rice, rye, sorghum, soybeans, sugar beets, sugarcane, sunflowers, and wheat). EarthStat describes these data as “created by combining national, state, and county level census statistics with a recently updated global data set of croplands” on a 10 × 10 kilometer latitude/longitude grid, covering all countries circa 2000.

Publications have linked anthropometric measures of undernutrition with indicators of agricultural productivity (Slavchevska 2015; Headey and Hoddinott 2016; Liu et al. 2008; Vepa et al. 2016), examined the relationship between diversity in crops grown and diversity in children’s diets (Hirvonen and Hoddinott 2017; Jones et al. 2014; Rosenberg et al. 2018), and related children’s school attendance and work to the seasonal demands of staple crops (Mohammed 2016). The agricultural data sources in these works are generally idiosyncratic and country specific. The contextual agricultural variables in IPUMS DHS are standard across multiple countries, with fine geographic detail, and are easily combined with DHS data on undernutrition, dietary intake, child labor, and school attendance.

Accessing and using the IPUMS DHS contextual variables

IPUMS DHS data are free to all registered DHS users. Even without registering, anyone can browse the samples, variables, and extensive documentation at the IPUMS DHS website. For DHS samples that are fully integrated and included in the IPUMS DHS database (see bolded entries in Table 1), users select contextual variables for inclusion in their customized data files. Researchers unfamiliar with the IPUMS data dissemination system may want to review the IPUMS DHS “User Guide” or YouTube videos, which explain the intuitive IPUMS interface (click “HELP” at the top of the page at dhs.ipums.org).

Adding a variable to one’s data file (referred to as the “Data Cart”) adds a single variable to each record for contextual variables with values measured in 1 year and invariant across DHS sample years, such as soil. For contextual variables measured at multiple time points, such as battles or popdensity, adding the variable to one’s data file adds the entire time series of variables to each record. For example, a researcher selecting battles will receive a series of 20 variables (battles_1997, battles_1998, and so forth up to battles_2016). She must construct a variable for the values useful to her analysis (e.g., battles in the year prior to a survey).
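
Constructing an analysis variable from the annual series might look like the sketch below. It assumes each record is a mapping that includes the battles_YYYY variables and defines "year prior" as the calendar year before the survey year; both the function name and that definition are illustrative choices, not IPUMS DHS conventions.

```python
def battles_prior_year(record, survey_year):
    """Return 1 if any battle day was recorded in the calendar year before the survey."""
    value = record.get(f"battles_{survey_year - 1}")
    if value is None:
        return None  # series does not cover that year
    return 1 if value > 0 else 0

# Hypothetical record with part of the battles time series.
record = {"battles_2012": 0, "battles_2013": 3, "battles_2014": 0}
exposed = battles_prior_year(record, 2014)
```

Analysts who need exposure over a different window, such as the two years before the survey, would combine the corresponding annual variables in the same way.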

To use contextual variables for DHS samples not currently integrated into IPUMS DHS, researchers follow a different process. They can download the .csv files of the contextual variables at dhs.ipums.org (click on “CONTEXTUAL VARIABLES” in the left sidebar), for all of the 168 DHS samples with GPS location data. Researchers then link .csv file data to original DHS files downloaded from The DHS Program website using the dhsid variable. Dhsid is a concatenation of the 2-character DHS country code (e.g., BD for Bangladesh), the 4-digit year variable (e.g., 2014), and the 8-digit cluster number (e.g., 00000008) into a 14-character string (e.g., BD201400000008); the content matches the dhsid variable in the original DHS files supplying GPS cluster locations.
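
The linking step can be sketched with the standard library alone. The dhsid layout follows the description above; everything else is an assumption for illustration, including the survey column names (`country`, `year`, `cluster`), the `soil` value, and the presence of a lower-case `dhsid` column header in the .csv file.

```python
import csv
import io

def make_dhsid(country_code, year, cluster):
    """Build the 14-character key: 2-char country + 4-digit year + 8-digit cluster."""
    return f"{country_code}{int(year):04d}{int(cluster):08d}"

def merge_contextual(survey_rows, contextual_csv_text):
    """Attach contextual variables to survey rows by matching on dhsid."""
    reader = csv.DictReader(io.StringIO(contextual_csv_text))
    context = {row["dhsid"]: row for row in reader}
    merged = []
    for row in survey_rows:
        key = make_dhsid(row["country"], row["year"], row["cluster"])
        # Rows without a match keep their survey fields and gain only the key.
        merged.append({**row, "dhsid": key, **context.get(key, {})})
    return merged

# Hypothetical contextual file and survey record.
contextual_csv = "dhsid,soil\nBD201400000008,Fluvisols\n"
survey_rows = [{"country": "BD", "year": 2014, "cluster": 8, "caseid": "a1"}]
merged = merge_contextual(survey_rows, contextual_csv)
```

Because every person in a cluster shares one dhsid, this is a many-to-one merge: each contextual row is copied to all survey records from that cluster.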

Case study: using IPUMS DHS contextual variables

As a contribution to the literature and an illustrative example, we consider whether there is an association among exposure to battles, population density, and intimate partner violence with a weapon (IPVW) for ever-partnered women in 13 sub-Saharan African countries.Footnote 10 Studies have linked exposure to political conflicts to rates of intrafamilial violence and acceptance of such family violence (Kelly et al. 2018; La Mattina and Shemyakina 2018; Østby 2016). To date, however, such studies have not considered whether relatively low levels of exposure to battles, outside the context of all-out war, have similar influences.

For population density, prior research suggests varying predictions. Greater population density is associated with higher levels of violent crime, including aggravated assault and rape (Bouffard and Muftic 2006). The theoretical explanation is that in low-population-density locations, individuals are more likely to know one another and exercise informal social control to prevent violent crime. In contrast, studies of most forms of IPV show no relationship or a negative relationship with population density. A review of US research discerned no notable differences across urban, suburban, and rural areas in the prevalence of most forms of IPV (Edwards 2015), while Balogum et al. (2012) found lower levels of physical IPV among urban versus rural women in Nigeria. Meanwhile, intimate partner homicide rates, which are likely related to IPVW, are lower in urban areas (Edwards 2015; Gallup-Black 2005). Overall, the IPV literature paints a different picture than the violent crime literature, suggesting either a negative association or no association between population density and IPVW.

A final consideration is whether the two contextual variables, popdensity and battles, will interact, with the impact of battles differing depending on the level of population density. Dense urban areas are more likely characterized by high levels of social disorganization (Kunnuji 2016). In such environments, the disruptive impact of battles could be exacerbated, resulting in a positive interaction effect. Contrariwise, because of the typically stronger social ties in rural areas (Bouffard and Muftic 2006), the social disruption of battles may be more intense in low-density areas. If the latter is the case, we will see a negative interaction.

Both IPVW (Fig. 1) and exposure to battles (Fig. 2) are rare occurrences, warranting a pooled sample to conduct the analysis. Using IPUMS DHS, we downloaded a single dataset that included all respondents for each of the 13 samples and the variables relevant to our analysis, harmonized across samples.

Fig. 1 Percent experiencing intimate partner violence with a weapon in last year, by sample

Fig. 2 Percent exposed to at least 1 day of violence in year prior to survey, by sample

DHS surveys for the 13 samples asked women whether their partners had threatened them with a knife, gun, or other type of weapon in the last 12 months (our dependent variable). The key independent variables are whether a woman had been exposed to at least 1 day of battle in the year prior to the survey and population density. We used the battles variable, which includes battles within 10 km of each woman’s cluster location, to construct exposure to battles in the year preceding the survey. We interpolated values of population density to match survey years using popdensity_2010 and popdensity_2015 and logged the resulting values.
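
The interpolation and logging of population density can be sketched as below. The text does not specify the interpolation method or the log base, so linear interpolation and the natural logarithm are assumptions here, as is the function name.

```python
import math

def interpolate_log_density(dens_2010, dens_2015, survey_year):
    """Linearly interpolate density between 2010 and 2015, then take the natural log."""
    frac = (survey_year - 2010) / 5.0
    density = dens_2010 + frac * (dens_2015 - dens_2010)
    if density <= 0:
        raise ValueError("density must be positive to take a logarithm")
    return math.log(density)

# Hypothetical cluster with 100 persons per sq km in 2010 and 200 in 2015.
log_density_2012 = interpolate_log_density(100.0, 200.0, 2012)
```

For a 2012 survey the interpolated density is 140 persons per square kilometer, and the logged value enters the model as the population density measure.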

We used logistic regression (applying the domestic violence sample weight), including a number of individual- and household-level control variables and country fixed effects. Table 3 reports the results. Model 1 shows the results with the main effects for battles and population density, and Model 2 shows the results with an interaction between battles and population density. Model 1 reveals that battles are associated with an increased probability of IPVW (OR = 1.400; 95% CI = 1.025, 1.905; p < 0.05). This is consistent with prior research considering the effect of war on IPV (e.g., Kelly et al. 2018). The model also uncovers a positive association between population density and the risk of IPVW (OR = 1.072; 95% CI = 1.009, 1.140; p < 0.05), suggesting that theories of violent crime may apply to IPVW. Model 2 indicates a modest but statistically significant negative interaction between battles and population density (OR = 0.833; 95% CI = 0.702, 0.989; p < 0.05). Among women unexposed to battles, there is little variation in the probability of IPVW across low- and high-population-density areas. Fig. 3 holds the risk of IPVW among unexposed women constant and shows the marginal probabilities for women exposed to battles across levels of population density. Among women exposed to battles, IPVW likelihood is significantly greater when population density is lower. This finding suggests that battles are especially disruptive of the dense social ties that often characterize less populated regions.
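
Readers who want to check a reported odds ratio against its confidence interval can recover the implied coefficient and standard error. This sketch assumes the interval is a symmetric Wald interval on the log-odds scale with a critical value of 1.96; that assumption may not hold exactly for survey-weighted estimates, and the function name is hypothetical.

```python
import math

def coef_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log-odds coefficient, standard error, and z statistic from an OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return beta, se, beta / se

# Model 1 estimate for exposure to battles, as reported in the text.
beta, se, z = coef_from_or(1.400, 1.025, 1.905)
```

Under these assumptions the battles coefficient is roughly 0.34 on the log-odds scale with a z statistic above 1.96, in line with the reported p < 0.05.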

Table 3 Logistic regression predicting experience of intimate partner violence with a weapon in the last year, N = 210,768 (weighted)

Fig. 3 Marginal effect of population density on women exposed to battles relative to women unexposed to battles

This preliminary analysis, illuminating previously unstudied relationships between population behavior and the environment, illustrates the ease with which researchers can combine traditional DHS variables with contextual data using IPUMS DHS.

Limitations and future plans

Over time, IPUMS DHS will add more contextual variables. One limitation of the data, however, is that researchers may not find a particular contextual variable of interest from a preferred source. Likewise, while the IPUMS DHS team has utilized some of the most popular aggregation techniques for spatial and temporal data, there is an infinite range of possibilities. Researchers who prefer other techniques may prefer to link datasets through other means.

A second cautionary note is that researchers should evaluate whether dates when contextual data were collected are reasonable matches for their sample(s) of interest. For example, the non-time-variant agricultural variables (types of crops) derive from data collected in the year 2000. IPUMS DHS leaves it to researchers to investigate the appropriateness of using such data with DHS data collected before 2000 or significantly later. IPUMS DHS gives researchers the flexibility to make these decisions themselves.

Looking ahead, IPUMS DHS has specific plans to increase the suite of contextual variables. Already mentioned is adding summary variables on monthly precipitation and temperature values averaged across decades as baselines for comparison. We will also be adding linking keys that join women through DHS cluster locations to the most temporally proximate international census data from IPUMS-International. This innovation will allow researchers to aggregate census indicators across small-area geography and observe their effects on non-elderly adults’ and children’s health. Additional plans include incorporating into IPUMS DHS the geospatial covariates, such as night lights, travel times to various locations, and frost occurrences, from the contextual variables calculated by The DHS Program’s staff and offered through The DHS Program website (at http://spatialdata.dhsprogram.com/covariates). Researchers are encouraged to contact the authors with additional suggestions of important contextual variables to include in IPUMS DHS.

Conclusion

Social scientists recognize that people’s actions and well-being are shaped not only by their individual and familial characteristics but also by community-level characteristics, including transformations such as climate change and geopolitical conflict. While diverse datasets supply georeferenced information on the physical and social environment, locating such information and linking it to other data sources, such as DHS surveys, has been a challenge for geographers and a daunting barrier for researchers from other disciplines. IPUMS DHS makes it easy for researchers to study the effects of the physical environment, population features, and agriculture on health and well-being in LMICs, through a suite of contextual variables. These contextual variables use GPS location data on DHS sample clusters and georeferenced source data from outside the DHS, so researchers can easily study how context shapes health and human behavior. While we have cited research examples, many opportunities remain for understanding health and well-being in LMICs by pairing DHS and contextual data, using IPUMS DHS.