1 Introduction

Overabstraction of groundwater is an acute and growing challenge for many areas of the world (World Water Assessment Programme 2009). Groundwater plays a crucial role in global ecosystem function, and in arid regions is heavily relied on for agriculture and economic development. Overuse, however, now threatens many populous regions, especially in the Middle East (Vrba and van der Gun 2004). Depletion of groundwater in excess of the long-term average rate of replenishment can produce a number of adverse environmental consequences, including increased infiltration of polluted water, salinization, land subsidence, and a continually falling water table (Foster and Chilton 2003).

At the same time, management of groundwater basins poses particular challenges. Groundwater is a non-excludable but subtractable common-pool resource, making the effective limitation of groundwater abstraction at or below the natural rate of recharge a complex collective action problem (Feitelson 2006). A fundamental reason for this complexity is the difficulty of assessing and monitoring groundwater resources. The “fugitive nature of water,” as an ephemeral and, in the case of groundwater, invisible resource creates substantial information asymmetries both between groundwater users themselves and between users and managers (Livingston 1995; Moench 2004).

These information asymmetries tend to prevent effective management because sustainable exploitation of common-pool resources often depends on resource users possessing relatively full and accurate information about a range of factors, including the physical structure of the resource and rates of exploitation by other users (Ostrom 1992; Blomquist 1994). These types of information are rarely available for groundwater resources in developing countries, which often lack the capacity to undertake the expensive and difficult task of monitoring groundwater resources (Vilholth 2006). Consequently, the high transaction costs of eliminating these information asymmetries through monitoring of abstraction typically lead to ineffective implementation of groundwater management policies in developing regions of the world (Theesfeld 2010).

Remote sensing may be a solution to these information problems, and this paper accordingly investigates its applicability to the specific challenges of groundwater management using Yemen as a case study. Remote sensing has transformed the assessment of water resources in many areas of the world with sparse observational ground records (Schmugge et al. 2002; Hoffman 2005; Jha, et al. 2007). In particular, the launch of the Gravity Recovery And Climate Experiment (GRACE) has for the first time enabled space-based detection of changes in Total Water Storage (TWS) both above and below the land surface at large scales (Rodell et al. 2007). GRACE consists of two satellites flying in formation. As one passes over a dense land mass, the height of the first satellite over the Earth’s surface changes relative to the second due to increased gravitational attraction. This distance is continually monitored with high precision, yielding solutions for the Earth’s gravity field over a given region (Swenson et al. 2006). From these solutions, the TWS signal can be formulated, which itself comprises deep groundwater storage change (GWSC), shallow soil moisture (SM), surface water storage, ice and snow, and other gravity-altering mass components (Tapley et al. 2004). GRACE has thus been used to assess water management problems in data-poor regions of the world, including to examine the cause of falling water levels in Africa’s Lake Victoria (Awange, et al. 2008).

These individual components can be estimated for given regions by Global Land Data Assimilation System (GLDAS) Land Surface Models (Syed, et al. 2008), which can further be used in combination with GRACE to produce estimates of GWSC (Chen et al. 2005). By isolating GWSC from the total TWS signal, specific measurement of groundwater storage is possible with an accuracy of up to 2–3 mm (Tapley et al. 2004). Consequently, GRACE has been hailed as “the only hope for groundwater depletion assessments in data-poor regions of the world” (Rodell and Famiglietti 2001) such as Yemen. Given its proven ability to monitor aquifer depletion, GRACE is especially relevant to the challenge of groundwater assessment and management (Strassberg et al. 2007; Moiwo, et al. 2009).

However, GRACE also raises a number of questions for groundwater managers. First, given its relatively low spatial resolution, at what scales may GRACE data be useful to groundwater managers? Second, is GRACE data accurate in regions with complex underlying geology, such as Yemen? Finally, how can GRACE data be used in developing nations, where water management institutions may face severe constraints, and social adaptive capacity to water scarcity may be low? This article seeks to answer these questions by pursuing two specific objectives: first, to evaluate the accuracy of GRACE-based GWSC estimates in relation to in situ groundwater level measurements; and second, to assess the degree to which such estimates could be usefully incorporated into groundwater management in Yemen. In particular, this article provides a blueprint for how GRACE-based groundwater storage estimates might be combined with other hydrological and socioeconomic data (in this case measures of food security) to assess vulnerability to groundwater depletion.

This article addresses Yemen as a case study because it faces one of the world’s most acute groundwater depletion challenges. Indeed, Yemen may be thought of as a “crucial case”: if GRACE can be shown to be applicable to groundwater management in Yemen, it is likely to be useful elsewhere. Thus, while this article focuses on Yemen, it draws a more general lesson applicable to other national groundwater management contexts. While GRACE offers an unprecedented ability to assess GWSC at the regional scale, there remains a great and growing divide between the large scale on which groundwater depletion can be accurately assessed and the local one on which it is managed. Water management reforms should aim to develop the capacity needed to incorporate remotely sensed data into enhanced management strategies.

Yemen exemplifies the challenge, faced by many arid regions, of acute groundwater depletion. Annual groundwater withdrawals exceed replenishment by 36% nationally, and up to 150% in some areas (Sahooly 2003). Groundwater depletion has increased in recent decades as groundwater-fed irrigation, which accounts for 95% of withdrawals, expanded from 37,000 ha in 1970 to 368,000 ha in 1996 (Ward 2000). Consequently, the Sana’a basin water table, which supplies the capital, decreased at an average rate of 4 m/year from 1986 to 1990 (Muthanna and Amin 2005). Groundwater recharge in most areas is very low (Al-Asbahi 2005).

Yemen’s water crisis is the result of several complex factors, including conflict and widespread cultivation of the narcotic qat, which consumes up to 40% of the country’s potable water (Almas and Scholz 2006). Nonetheless, effective water management is severely and particularly hampered by pervasive information asymmetries. This is partially due to the country’s complex hydrogeology. Yemen sits on the boundary of the Arabian Plate, and features varied topography with a mix of basement rock types (Al-Mooji 2010). Aquifer types vary according to this underlying geology, ranging from ancient Paleozoic sandstone in the highlands to young sedimentary formations in the western coastal plain (Al-Sakkaf et al. 1999; Youssef 1991). Much of the remaining groundwater is contained within discontinuous “pockets” often known only to local farmers. A 1989 study, for example, reported “a previously unknown thick aquifer, which contained fresh groundwater at great depths even near the coast” (van Overmeeren 1989). Such uncertainties add to the information asymmetries between groundwater users and managers, such that not only the amount of abstraction but also the total extent of the resource is likely to be unknown to groundwater managers.

Yemen’s complex hydrogeology is matched by a fragmented and diffuse system of water management institutions, complicating the implementation of nation-wide policies. Traditional surface-water management was central to Yemeni society for centuries (Varisco 2009), and comprised largely sustainable systems of rainwater harvesting and spate irrigation (Rappold 2005). However, these traditional systems have been eclipsed by the shift to groundwater abstraction, which is regulated by formalized laws and regulations declaring that groundwater is the property of the state (President of the Republic of Yemen 2006). Despite this extensive regulation, effective management of groundwater resources remains weak, in large part because of the lack of adequate hydrological data. The cost of well-based monitoring, at some US$150/well/year, is substantial, especially given the large number of wells required to effectively monitor the country’s many discontinuous basins (Noman 2006). Moreover, the existing monitoring network, despite large infusions of external donor funding, remains inadequate for the task of regulating groundwater abstraction. As a 2003 Yemeni government report admitted, “Efficient water resources planning, development and management is seriously constrained due to non-availability of required data and information from monitoring networks” (NWRA 2003).

GRACE, in this respect, offers a uniform solution to this lack of data, but at the price of complexity and detail which may be difficult to employ where institutional capacity is weak. In Sections 2 and 3 which follow, this paper employs substantial technical detail to illustrate these findings. While necessary to pursue this paper’s first goal of validating the utility of GRACE data in countries such as Yemen, this technical detail stands somewhat at odds with this paper’s second objective of assessing how GRACE data might bear on practical issues in groundwater management. Accordingly, readers more interested in the practical application of GRACE data to groundwater management may wish to proceed directly to Section 4, where the implications of our findings for groundwater management are discussed.

2 Methods

2.1 Study Area

The study area is located in the Republic of Yemen (42°E-53°E longitude, 12°N-18°N latitude), whose climate is linked to the Red Sea and monsoonal Intertropical Convergence Zones, and where rainfall is in most areas under the commonly accepted threshold for aridity of 250 mm/year (Harrower 2009, p. 62). Interannual precipitation is variable, but is concentrated in March-May and July-September (Farquharson et al. 1996). Total renewable groundwater resources are estimated at 1.5 km³/year, and are contained primarily in shallow, low-volume alluvial deposits. The more voluminous deep aquifers are largely “fossil” deposits produced during ancient, wetter climatic conditions, and are effectively non-renewable (UN Food and Agriculture Organization 2010). Most of the western study area is underlain by unproductive or saline aquifers, with recoverable freshwater supplies mainly limited to confined and discontinuous deposits.

Yemen’s population of 24 million is concentrated in a mountainous western region of relatively high precipitation referred to as the intermontane region. The validation portion of this study calculates GRACE estimates of GWSC for this area, since available well data are also contained within this region. Observed hydrological data, including groundwater levels from wells, were gathered from locations concentrated in the study area. Well data were concentrated in five clusters within the study area (Fig. 1). The approximate area of the study area (42.5°E-45.5°E, 12.5°N-17.5°N) is 190,000 km², comparable to GRACE’s spatial resolution threshold of ~200,000 km² (Yeh et al. 2006). The underlying geology of the study area is complex, but consists mainly of unproductive combined and saline-intruded aquifers (van der Gun and Ahmed 1995, 69).

Fig. 1 Yemen and study area, showing spatial distribution of in situ data collection stations

2.2 Data Collection and Processing

Observed hydrological data, including average groundwater level in approximately 100 wells (~30,000 total records) and rainfall were obtained from the Yemen National Water Resources Authority (NWRA) (NWRA 2009a, b). Non-groundwater components of total water storage (TWS), including monthly SM data, were calculated as outputs from various GLDAS models, which feature a spatial resolution of 1° and monthly temporal resolution (NASA 2007). GLDAS assimilates large quantities of observational meteorological data to constrain modelled outputs, resulting in accurate estimates of many hydrological processes (Rodell et al. 2004a, b). GLDAS-modelled data were area-averaged to produce monthly time series for both Yemen and the study area (NASA 2010). SM data were calculated as the average of all model soil layers, the number of which differs by model but represents a total soil column depth of 0-~200 cm. One model, the CSM, was excluded from this analysis because it was found by the authors to notably understate SM in comparison to the other model outputs.

GRACE data, used to estimate GWSC in the study area, were obtained from two sources: Level_02-Release_04_500 km Gaussian-smoothed Monthly Mass Grids processed by three different institutes, the Center for Space Research (CSR), German Center for Geosciences (GFZ), and the Jet Propulsion Laboratory (JPL), all obtained through the GRACE Tellus interface (NASA 2009), and mass concentrations (MASCONs) provided by Stinger-Ghaffarian Technologies (Stinger-Ghaffarian Technologies 2009). The data sets feature different time series and spatial resolutions (see Table 1), but the most crucial distinction between them is that each data set employed different processing techniques to represent TWS in terms of Mean Water Thickness in centimeters (MWT-cm) from the raw GRACE data. MWT-cm expresses mass change in terms of a thin layer of water surrounding the Earth, measured in terms of the height of an equivalent physical quantity of water in centimeters (NASA 2009). These differences mean that use of each data set involves significant trade-offs, which are explored in this paper for the purpose of employing GRACE-based measurements for regional-scale groundwater management.

Table 1 Primary source data sets

Spatial smoothing is necessary to isolate the TWS signal in an area of interest. The monthly mass grid data were first processed with a land mask to remove atmospheric and ocean effects, and a Gaussian smoothing filter was subsequently applied to produce coordinate-referenced TWS measurements at several smoothing scales (Chambers 2006). However, the need for smoothing introduces a “competing requirement” with reducing radiometric error (Swenson et al. 2006). Larger smoothing filters tend to reduce error, but at the expense of spatial resolution (Wahr et al. 2006). The 500 km monthly mass grid data were used here because they produce the lowest error among the monthly mass grid data sets, with mean Root Mean Squared (RMS) error of 1.7 cm (Chambers 2007). However, the monthly mass grid data are known to be subject to significant signal attenuation, so the MASCON data sets were included to provide more signal-stabilized alternatives (Lemoine et al. 2007).

MASCON and NCAR_CSR data were area-averaged to produce a monthly time series for the purposes of comparison to the monthly mass grid-derived data. GRACE monthly data for the study area were obtained by filtering each data set to obtain measurements for this area only. For monthly mass grid data, this was achieved by filtering according to latitude and longitude, and for MASCON data by selecting data from the two MASCON grids which encompass the study area, referred to by their latitude and longitude coordinates, 12_42 and 18_42. Data were then area-averaged across the study area to produce a monthly time series for that region. Missing monthly data in the monthly mass grid-derived time series (June 2003, January 2004, March 2006, and June 2007) were gap-filled using linear interpolation. Resulting time series from the three GRACE data sets represented TWS anomalies, since the mean across each time series was removed during processing (NASA 2009).
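The gap-filling and mean-removal steps described above can be illustrated with a minimal sketch. It is not the processing chain used in this study: the function names and sample values are hypothetical, and only gaps lying between two valid months are filled.

```python
from statistics import fmean

def fill_gaps_linear(series):
    """Linearly interpolate interior None values between the nearest valid neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled

def to_anomaly(series):
    """Subtract the series mean so each value is a departure from that mean."""
    mean = fmean(series)
    return [v - mean for v in series]

# Hypothetical monthly TWS values (cm); None marks a missing month.
tws = [2.0, None, 4.0, 1.0, None, None, -2.0]
filled = fill_gaps_linear(tws)
anomaly = to_anomaly(filled)
```

Because the mean is removed after gap-filling, the interpolated months influence the reference level of the whole anomaly series, which is one reason long gaps would call for more caution than the isolated missing months reported here.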

Finally, food security data were obtained from the International Food Policy Research Institute (IFPRI), expressed as the proportion of people in each district whose calorie consumption is below the average weekly level for that district (Ecker et al. 2010). These data were used to provide a measure of vulnerability to water scarcity, which is likely to become a central issue for water management in Yemen as water tables fall in many areas, population increases, and interannual rainfall becomes more variable due to anthropogenic climate change (Alderwish and Al-Eryani 1999; Vorosmarty 2000). Food security was selected as an indicator because of the interrelationship, identified by multilateral aid organizations, between water scarcity in Yemen and declining food production, as well as regressive distributional effects induced by high water prices (Forch 2009). However, this is not meant to imply that food security is the dominant developmental concern in rural Yemen, or that reforms to trade policy cannot solve Yemen’s food security problem.

The inclusion of a measure of vulnerability in this research is intended to provide an example of how GRACE-based groundwater storage assessment can provide useful inputs into water management decision-making. This method builds upon the important distinction drawn by scholars between physical, “first-order” water scarcity and a “second-order” lack of “social resources” which can help societies successfully adapt to physical scarcity (Ohlsson 2000). Since this adaptive capacity reflects the ability of human and natural systems to respond to external stress, it varies widely depending on local ecological, economic, and social characteristics (Yohe and Tol 2002). Parts of Yemen appear to possess indigenous social capacity to adapt to water scarcity, suggesting the presence of multiple inter-linkages between physical water scarcity and socioeconomic factors (Lichtenthaler and Turton 2004). This complex physical and institutional landscape in turn suggests the need for a spatial analysis of vulnerability to water scarcity, to which GRACE, with its large-scale groundwater assessment capacity, seems well-positioned to contribute.

2.3 Analytical Techniques

GRACE estimates of GWSC were calculated by partitioning the TWS signal into groundwater storage (GWS), soil moisture (SM), surface water storage (SWS), and ice and snow (ISS) components using GLDAS-LSM outputs (Syed et al. 2008) according to Eq. 1 (Rodell et al. 2009):

$$ \Delta TWS = \Delta GWS + \Delta SM + \Delta SWS + \Delta ISS $$
(1)

GWSC, expressed in this study as an anomaly relative to the mean of a given time series, thus expresses relative shifts in groundwater, and largely reflects abstraction (Rodell et al. 2007). In most mid-latitude regions, SM is typically the major non-groundwater component of TWS (Rodell and Famiglietti 2001), so GWS can effectively be isolated from the GRACE TWS signal by removing SM (Strassberg et al. 2007). However, surface water storage and ice and snow were also included in this analysis to constrain the GWSC estimate as much as possible. Surface water storage is here equated to surface runoff (QS), as Yemen has no major reservoirs or standing water bodies. Ice and snow in Yemen are negligible, but were estimated using GLDAS-modelled snowmelt (SWE). Resulting time series of GRACE-estimated GWSC, surface water, and ice and snow were calculated using SM as modelled by the three GLDAS-LSMs, producing the time series solution sets in Table 2 below.
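Rearranging Eq. 1 for the groundwater term gives ΔGWS = ΔTWS − ΔSM − ΔSWS − ΔISS. The sketch below applies that rearrangement month by month; the function name and the sample anomalies are hypothetical, and all inputs are assumed to share the same units (cm of equivalent water thickness) and the same monthly time axis.

```python
def groundwater_anomaly(tws, sm, sws, iss):
    """Rearranged Eq. 1: dGWS = dTWS - dSM - dSWS - dISS, applied element-wise."""
    return [t - s - q - i for t, s, q, i in zip(tws, sm, sws, iss)]

# Hypothetical monthly anomalies (cm of equivalent water thickness).
tws = [1.5, 0.5, -1.0]    # GRACE total water storage
sm = [1.0, 0.25, -0.25]   # GLDAS soil moisture
sws = [0.25, 0.0, 0.0]    # surface water (runoff)
iss = [0.0, 0.0, 0.0]     # ice and snow (negligible in Yemen)
gws = groundwater_anomaly(tws, sm, sws, iss)
```

Since the groundwater term is obtained as a residual, any error in the modelled SM, runoff, or snow terms passes directly into the GWSC estimate, which is why the choice of GLDAS model matters for the solution sets that follow.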

Table 2 Author-generated solution sets

The accuracy of GRACE-estimated GWSC was established by comparison with observed well measurements in the study area, in effect equating the study area to a single “basin,” which is acknowledged as a simplification given the area’s complex hydrogeology. Moreover, the uneven spatial distribution of wells in the study area (Fig. 1) may not statistically reflect total GWSC throughout the study area. However, the intention here is to evaluate the applicability of GRACE to areas of complex hydrogeology and with limited hydrological information, such as Yemen, and these limitations are inherent to the general problems addressed in this analysis. Indeed, the inadequacy of data on groundwater dynamics and extraction patterns hampers management of aquifers around the world (Moench 2004), and it is in this context that an assessment of the potential for remotely-sensed, GRACE-based estimation of GWSC is particularly relevant. Given these limitations, the analysis against the study area data should not be treated as definitive, but rather as an indication of representativeness.

Here, accuracy was assessed by fitting a linear function to each GRACE solution set for the study area using the observed well data, calculating both goodness of fit (R-squared) and RMS error. This method, while common where statistical demonstration of relationships is expected, is likely to establish a high threshold for accuracy, since it constructs linear approximations of data with annual variability. RMS error was also calculated between the time series of GRACE-estimated and well-measured GWSC anomalies, establishing a possibly conservative estimate of uncertainty since it accounts only for measurement error, and assumes the basic accuracy of GRACE measurements (Strassberg et al. 2007). However, these measurements have elsewhere been validated by comparison with well data (Rodell and Famiglietti 2001).
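The two accuracy measures named above can be sketched as follows. This is an illustrative ordinary least-squares fit rather than the authors' statistical code: the function names are hypothetical, and the RMS error is taken as the plain root-mean-square of the residuals (dividing by n), which is an assumption.

```python
from math import sqrt
from statistics import fmean

def linear_fit_stats(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared, rmse).
    """
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = sqrt(ss_res / len(y))
    return slope, intercept, r_squared, rmse

def rms_difference(a, b):
    """Root-mean-square difference between two aligned anomaly series."""
    return sqrt(fmean([(ai - bi) ** 2 for ai, bi in zip(a, b)]))

# Hypothetical annual anomalies: GRACE-estimated GWSC (x) and well levels (y).
grace = [0.5, 0.1, -0.4, -1.2]
wells = [1.0, 0.0, -1.0, -2.0]
slope, intercept, r_squared, rmse = linear_fit_stats(grace, wells)
direct_rms = rms_difference(grace, wells)
```

With only a handful of annual averages per solution set, R-squared values from such a fit are sensitive to individual years, so they are best read alongside the RMS error rather than in isolation.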

Accordingly, time series of observed well-measured groundwater levels were constructed by averaging groundwater level anomalies calculated from each of the five well clusters depicted in Fig. 1. These were then regressed against equivalent time series calculated for each GRACE solution set. Anomalies were calculated in each case by removing the mean monthly GWSC from August 2002–March 2009 from each monthly measurement. Annual averages were then calculated from the monthly anomalies. However, because GRACE solution sets span different time series, it was necessary to harmonize annual averages by calculating averages only for those periods where data for each GRACE time series were available.
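The anomaly, harmonization, and annual-averaging steps can be sketched as below. The function names and sample values are hypothetical, and restricting each series to the months present in every series is only one possible reading of the harmonization described above.

```python
from collections import defaultdict
from statistics import fmean

def baseline_anomaly(monthly, start, end):
    """Subtract the mean over the baseline window [start, end] from every month.

    monthly maps (year, month) tuples to values; start and end are
    (year, month) tuples, compared in calendar order.
    """
    base = fmean(v for k, v in monthly.items() if start <= k <= end)
    return {k: v - base for k, v in monthly.items()}

def harmonize(*series):
    """Keep only the (year, month) keys present in every series."""
    common = set(series[0]).intersection(*series[1:])
    return [{k: s[k] for k in sorted(common)} for s in series]

def annual_means(monthly):
    """Average the available months of each calendar year."""
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {year: fmean(vals) for year, vals in sorted(by_year.items())}

# Hypothetical monthly anomalies keyed by (year, month).
wells = {(2003, 1): 1.0, (2003, 2): 3.0, (2004, 1): -2.0}
grace = {(2003, 2): 0.5, (2004, 1): -0.5, (2004, 2): -1.5}
wells_c, grace_c = harmonize(wells, grace)
wells_annual = annual_means(wells_c)
grace_annual = annual_means(grace_c)
```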

Fig. 2 Long-term area-averaged trends in Yemen (42.5°E–53.5°E, 12.5°N–17.5°N) monthly observed and modelled rainfall, evapotranspiration, surface runoff, soil moisture, and groundwater, April 1983–March 2009

While GLDAS-modelled hydrological parameters have been found to be generally accurate at the global scale, fewer observations are available to constrain model outputs in developing arid regions such as Yemen. Accordingly, and because SM is crucial to isolation of GWSC from the GRACE TWS signal, GLDAS-modelled SM for the study area was compared against a proxy SM measure, calculated from the average of observational measurements of humidity and temperature from five meteorological stations in western Yemen (Fig. 1). The SM proxy was simulated according to Eq. 2 (Fisher et al. 2008):

$$ SM_{proxy} = RH^{VPD} $$
(2)

where RH is relative humidity between 0 and 1.0 and VPD is vapor pressure deficit calculated according to Eq. 3:

$$ VPD = e_s - e $$
(3)

where es was calculated as a function of maximum temperature (TMX) according to Eq. 4:

$$ e_s = 0.61121 \exp\left( \frac{17.502\, TMX}{TMX + 240.97} \right) $$
(4)

and e was calculated as a function of temperature according to Eq. 5:

$$ e = RH \cdot 0.61121 \exp\left( \frac{17.502\, TMX}{TMX + 240.97} \right) $$
(5)

These formulae, which are based on the complementary hypothesis linking atmosphere and surface soil water content (Fisher et al. 2008), provide only an approximation of SM, reflecting the seasonality of SM better than the absolute mass, but given the lack of SM observations in Yemen provide the only available alternative to GLDAS-derived SM data.
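Eqs. 2–5 chain together into a single calculation, sketched below. The exponent in Eqs. 4 and 5 is read here as 17.502·TMX/(TMX + 240.97). Two further assumptions, suggested by the constants but not stated in the text, are that TMX is in degrees Celsius and that vapor pressure is in kPa. The function names are hypothetical.

```python
from math import exp

def saturation_vapor_pressure(tmx_c):
    """Eq. 4: saturation vapor pressure from maximum temperature.

    Assumes tmx_c is in degrees Celsius and the result is in kPa.
    """
    return 0.61121 * exp(17.502 * tmx_c / (tmx_c + 240.97))

def sm_proxy(rh, tmx_c):
    """Eqs. 2-5: soil moisture proxy RH ** VPD, with RH between 0 and 1."""
    es = saturation_vapor_pressure(tmx_c)  # Eq. 4
    e = rh * es                            # Eq. 5: actual vapor pressure
    vpd = es - e                           # Eq. 3: vapor pressure deficit
    return rh ** vpd                       # Eq. 2

# Hypothetical station readings: (relative humidity, maximum temperature).
humid_day = sm_proxy(0.8, 30.0)
dry_day = sm_proxy(0.3, 30.0)
```

The proxy is dimensionless and bounded between 0 and 1: it equals 1 when the air is saturated (VPD of zero) and falls quickly as the air dries, which is consistent with its use as an indicator of SM seasonality rather than absolute mass.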

To visualize the spatial distribution of observed rainfall and observed average groundwater level, Kriging, Inverse-Distance-Weighted (IDW), and spline interpolations were performed in ArcMap 9.3 (ESRI). The Kriging interpolation was preferred in this analysis because it was found by the authors to produce the most accurate interpolation of GWSC (Saraf and Choudhury 1998). Spatial interpolations of the CSR-, GFZ-, and JPL-estimated GWSC anomalies, as well as an average of GWSC anomalies from these three data sets, were then produced for the entire spatial extent of Yemen (12°N-18°N, 42°E-53°E). GWSC estimates for each coordinate-georeferenced data point were calculated by time-averaging the monthly GWSC anomalies estimated by each GRACE-derived data set using Avg_GLDAS-modelled SM. However, because each point measurement is subject to significant signal attenuation, the magnitude of interpolated GWSC at any one point is less meaningful than the general spatial distribution of GWSC values across a large area. MASCON data cannot be similarly plotted because their coarse spatial resolution (4°) obscures finer-scale spatial patterns.
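Of the three interpolators named above, IDW is the simplest to state, and a minimal sketch is given here for orientation only. It is not ArcMap's implementation and it is not the Kriging method preferred in this analysis. The power of 2, the treatment of longitude and latitude as planar coordinates, and the sample points are all assumptions.

```python
from math import hypot

def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points is a list of (px, py, value) samples; weights are 1 / distance**power.
    """
    num = 0.0
    den = 0.0
    for px, py, value in points:
        d = hypot(px - x, py - y)
        if d == 0.0:
            return value  # query lies exactly on a sample: return it unchanged
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical well clusters: (longitude, latitude, groundwater level anomaly).
clusters = [(44.0, 15.0, -2.0), (45.0, 14.0, -0.5), (43.5, 13.5, -1.0)]
estimate = idw(clusters, 44.2, 14.5)
```

An IDW estimate always lies between the smallest and largest sample values, so with only five well clusters it cannot represent depletion more severe than the worst observed cluster.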

These spatial representations of hydrological data were complemented by visualizing food insecurity data in a GIS, using shapefiles obtained from the Yemen Ministry of Health and Population (Yemen Health Ministry 2004). The weighted overlay feature in ArcMap 9.3 was then used to integrate physical and socioeconomic data. Weighted overlay integrates raster layers with assigned weights, and highlights areas of overlap according to these weights. It is commonly used to identify certain areas with a particular combination of characteristics (Carver 1991), in this case areas of low rainfall, high groundwater depletion and high food insecurity. Groundwater depletion was represented as GWSC estimated by the average of the CSR, GFZ, and JPL data sets for each coordinate-georeferenced data point. Food insecurity was chosen for the weighted overlay because, unlike the other socioeconomic data, it is disaggregated at the district level. It is thus least likely to produce an ecological fallacy, whereby relationships in aggregated, “ecological”-scale data would not be replicated in samples of individuals at smaller scales (Kramer 1983).

Three different combinations of weights were applied, reflecting different levels of emphasis attached to the various input layers. These specific weights are designed to illustrate different ways in which policymakers could model different policy considerations. For example, a greater focus on food security could be reflected in a higher weight applied to the food security data relative to groundwater depletion. Specific weights should thus reflect the outcomes of deliberative policy-making processes, but can be a useful reflection of varying policy goals (Webster 1993). The purpose of employing weighted overlay here is simply to illustrate how this technique might be employed to integrate observed hydrological, GRACE-estimated, and socioeconomic data, given a pre-defined set of policy goals.
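The logic of a weighted overlay can be sketched as a weighted sum of co-registered raster layers. The example below is illustrative only and does not reproduce ArcMap's tool: the layer names, the 1–5 scoring scale, the cell values, and both weighting scenarios are hypothetical.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized 2D score grids into one weighted-sum grid.

    layers maps a layer name to a grid (list of rows) scored on a common scale;
    weights maps the same names to weights that must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical 2 x 2 grids scored from 1 (low concern) to 5 (high concern).
layers = {
    "low_rainfall": [[1, 5], [3, 3]],
    "gw_depletion": [[2, 4], [5, 1]],
    "food_insecurity": [[1, 5], [4, 2]],
}
# Two hypothetical policy emphases: groundwater depletion versus food security.
depletion_focus = {"low_rainfall": 0.25, "gw_depletion": 0.5, "food_insecurity": 0.25}
food_focus = {"low_rainfall": 0.25, "gw_depletion": 0.25, "food_insecurity": 0.5}
overlay_depletion = weighted_overlay(layers, depletion_focus)
overlay_food = weighted_overlay(layers, food_focus)
```

Comparing the two output grids cell by cell shows how a change in policy emphasis alone, with the underlying data unchanged, can alter how strongly a given area is flagged.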

3 Results

3.1 Physical Dimensions of Water Scarcity in Yemen

General hydrological trends help contextualize physical water scarcity in Yemen, as shown in Fig. 2 for the period February 1983–March 2009 (the total length of the observed well data time series). GLDAS-modelled evapotranspiration (ET), runoff, and SM, the latter two of which are used to partition the GRACE TWS signal, are displayed as outputs from the three GLDAS models, as well as an average of the three model outputs. The top-most panel indicates that rainfall in Yemen has been relatively stable in recent decades. Global Precipitation Climatology Project (GPCP) observational data indicate a non-significant (p > 0.6) linear trend in rainfall rate over the time series, while NWRA ground station observations demonstrate a significant increase (p = 0.1) in rainfall over a slightly shorter time series, ending in December 2007. Over this period, GPCP and NWRA data evince very similar temporal patterns, though the GPCP data appear smoothed relative to the NWRA data. This is to be expected, since the NWRA data represent point observations, whereas the GPCP data are area-averaged.

The second, third, and fourth panels display GLDAS-modelled ET, runoff, and SM, respectively, the former of which is a crucial indicator of crop and ecosystem water availability (Shukla and Mintz 1982), and the latter two of which are used to partition the GRACE TWS signal. In each panel, outputs from the three GLDAS models are presented along with an average of the three model outputs, which are very similar in phase and trend but not necessarily in magnitude. Here, GLDAS-modelled monthly ET data evince a significant (p < 0.01) negative linear trend. Ice and snow, a third component of the TWS signal, is not displayed here because ice and snow in Yemen over the time series is negligible, at <1 mm/month.

Over the 1983–2009 period, GLDAS-modelled runoff data evince a non-significant positive linear trend (p > 0.6), with runoff maxima around July/August and November/December, and additional elevated runoff in March/April. A number of high-runoff events dominate the time series, roughly clustered in 1991–1995, 1997–2000, and 2002–2005. Overall, however, runoff has declined substantially in recent years.

SM data indicate a significant negative linear trend (p < 0.01) over the time series, which is dominated by several periodic, dramatic increases in SM, notably in 1997 and 2001. The data further indicate a precipitous decline in SM after 2001, followed by a notable increase in September 2006. The varying magnitude of SM estimated by the three GLDAS models is also notable. VIC-modelled monthly SM is approximately 50% greater than NOAH-modelled monthly SM, while MOS- and Avg_GLDAS-modelled monthly SM is approximately 30% greater than NOAH-modelled monthly SM. SM is approximately 30% lower over the same time series than GLDAS-modelled measurements in Syria. Phase and trend, however, are very similar across all SM solutions.

The final panel in Fig. 2 displays the full time series of observed groundwater levels as obtained from NWRA, spanning the period February 1983–March 2009. Unlike the other data sets in Fig. 2, well measurements are available only in the study area and not for the entire area of Yemen. The time series displays a significant (p < 0.01) negative linear trend over the period 1983–2009. Several outliers appear in the later 1997–2009 time period, most notably several large decreases in groundwater level in 2001–2006. These events do not appear to temporally coincide with any of the other time series presented in Fig. 2, though the stability of the precipitation regime shown in the upper panel contrasts notably with the secular, negative linear trend in observed groundwater levels. Given stable precipitation and decreasing ET, the most likely explanation for falling groundwater levels is sustained abstraction of groundwater in excess of the natural rate of recharge.

Fig. 3 Area- and annually-averaged monthly JPL_In_Situ- (August 2002–December 2006) and MASCON_NOAH- (July 2003–April 2007) estimated GWSC anomaly and average observed groundwater level anomalies for the study area (42.5°E–45.5°E, 12.5°N–17.5°N)

Nonetheless, a closer look at the groundwater levels suggests several deficiencies in the well data. The amplitude increases markedly after 2001, reflecting increased discontinuity in the number of well measurements after that date. Moreover, the length of the time series and the number of observations for each year vary. In some clusters, measurements were taken only in March, in others, January and December only. Frequency of record also varies along the length of the time series, with earlier years being generally more complete. The discontinuity of the data often results in some monthly average groundwater level measurements being dominated by data from a single cluster. The high monthly groundwater level in May 2007, for example, results from a single corresponding outlier in the Sana’a cluster. Because data from each well cluster reflect a specific underlying geology with different recharge rates, dominance by one cluster with high recharge rates can produce large changes in GWSC. Despite repeated attempts, the discontinuity of the data made it impossible to eliminate this effect. The well data for the period 2002–2009 should thus be viewed with scepticism, and their limitations suggest the value of remotely-sensed GWSC data.

Table 3 displays results for the regression of annually-averaged GRACE-estimated GWSC anomalies against observed well data for the study area over the respective GRACE-derived data set time series. The CSR- and MASCON-derived solution sets generally produced the highest R-square values against the well data, while the GFZ-derived solutions produced noticeably low R-square values (<0.1). MASCON-derived solution sets produced the lowest standard and RMS error, followed by the CSR-derived solutions. Among the SM solution sets, the MOS-derived solutions tend to produce lower R-square values and higher standard and RMS error than the NOAH- or VIC-derived solutions. For the JPL- and GFZ-derived data, In situ-derived solutions produced higher R-square values than the NOAH-, MOS-, and VIC-derived SM solutions. However, the in situ-derived solutions produced generally higher standard and RMS error than the other SM solution sets.
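The statistics reported for each solution set can be illustrated with a minimal sketch of an ordinary least-squares fit of GWSC anomalies on observed well anomalies. The function name `regression_stats`, the sample values, and the exact error definitions (standard error as the residual standard deviation with n - 2 degrees of freedom, RMS error as the root mean square of the residuals) are assumptions made for illustration; the article does not specify how these quantities were computed.

```python
import math


def regression_stats(x, y):
    """Least-squares fit of y on x, returning fit and error statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "slope": slope,
        "intercept": intercept,
        "r_square": 1.0 - sse / syy,
        "std_error": math.sqrt(sse / (n - 2)),  # assumed definition
        "rms_error": math.sqrt(sse / n),        # assumed definition
    }


# Hypothetical annually-averaged anomalies, not values from the article.
well = [4.0, 1.0, -2.0, 0.5, -3.5]       # observed groundwater level anomaly
gwsc = [0.35, 0.12, -0.18, 0.02, -0.31]  # GRACE-estimated GWSC anomaly
stats = regression_stats(well, gwsc)
print(stats)
```

With only a handful of annual values per solution set, a high R-square can coexist with a weak p-value, which is worth keeping in mind when reading the fits reported for the short GRACE-era series.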

Table 3 Regression and Error Statistics for Area- and Annually-Averaged Monthly CSR/JPL/GFZ/NCAR_CSR_GLDAS- (August 2002–March 2009), CSR/JPL/GFZ/NCAR_CSR_In_Situ- (August 2002–December 2006), MASCON_14_42/18_42_NOAH- (July 2003–April 2007), and MASCON_14_42/18_42_In_Situ- (July 2003–December 2006) Estimated GWSC Anomalies Against Average Observed Monthly Groundwater Level Anomalies for WSA (42.5°E–45.5°E, 12.5°N–17.5°N)

Three solution sets produced strong linear fits against the well data: JPL_In_Situ (R-square = 0.71, p = 0.07), MASCON_14_42_NOAH (R-square = 0.79, p = 0.07), and MASCON_18_42_NOAH (R-square = 0.56, p = 0.14). Figure 3 displays annually-averaged GWSC anomalies produced from these three solution sets along with annually-averaged observed groundwater level anomalies for the corresponding time series, August 2002–December 2006 for the JPL_In_Situ solution set, and July 2003–July 2007 for the MASCON solution sets. Well data were divided by 10 to fit visually on the same axis. The two groundwater level anomaly data series are very similar over these two time periods, displaying two periods of precipitous decrease, in 2003–2005 and 2006–2007, separated by an increase in 2005.

The JPL_In_Situ data show a corresponding decrease and slight increase in 2005, though its representation of this trend appears smoothed, lacking the 2002 plateau visible in the observed well data. The MASCON data similarly appear smoothed, displaying a continuous decrease from 2003 to 2005, followed by a substantial increase in the annually-averaged monthly GWSC anomaly in 2006, and then another decrease in 2007. The MASCON data also appear to be out of phase, as the 2005 increase in the observed well data is not recorded by the MASCON data until 2006. Finally, it is notable that MASCON_14_42_NOAH-estimated GWSC anomaly is positive over the time series, while the MASCON_18_42_NOAH-estimated GWSC anomaly is negative. The magnitude of the MASCON_14_42_NOAH-estimated GWSC is also slightly greater than that of the MASCON_18_42_NOAH-estimated GWSC.

Figure 4 displays the JPL_In_Situ and MASCON solution sets in scatter-plot form, facilitating comparison to the observed well data, which is represented as a one-to-one dotted gray line. MASCON data is plotted on the secondary axis to better compare with the JPL_In_Situ solution set, making clear the strong linear fit against the observed well data. The MASCON_14_42_NOAH solution set demonstrates a similarly strong linear fit in comparison to the well data, but along with the MASCON_18_42_NOAH data appears to be over-smoothed relative to the JPL_In_Situ and well data.

Fig. 4

Area- and annually-averaged monthly JPL_In_Situ- (August 2002–December 2006) and MASCON_14_42/18_42_NOAH- (July 2003–April 2007) estimated GWSC anomaly, plotted as a function of average observed groundwater level anomalies for the study area (42.5°E–45.5°E, 12.5°N–17.5°N)

As noted in Section 2, accurate estimates of SM are essential to isolating groundwater from the GRACE TWS signal. Figure 5 displays SM estimates from the three GLDAS LSMs, an average of these three LSM outputs, and the in situ-modelled calculation, which is derived from meteorological station observations. These time series span August 2002–December 2006, since the observed data used to calculate the in situ-modelled SM were only available during that period. As in the SM panel of Fig. 2, GLDAS-modelled SM from the various LSMs displays very similar phase and trend, though with different magnitude: MOS is highest, VIC lowest, and NOAH intermediate. This variation in magnitude among SM solutions suggests a relationship to the similar variation in magnitude observed in GWSC estimates obtained from different GRACE solutions.
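The dependence of GWSC on the chosen SM solution can be sketched as follows. The sketch assumes a simplified water balance in which the groundwater anomaly is the TWS anomaly minus the SM anomaly, with other stores (snow, surface water) treated as negligible; this is an illustrative assumption, and the article's Section 2 defines the method actually used. The function names and all values are hypothetical.

```python
def mean_sm(noah, mos, vic):
    """Average the SM anomaly series from three land surface models."""
    return [(a + b + c) / 3.0 for a, b, c in zip(noah, mos, vic)]


def gwsc_anomaly(tws, sm):
    """Assumed partitioning: groundwater anomaly = TWS anomaly - SM anomaly."""
    if len(tws) != len(sm):
        raise ValueError("series must cover the same months")
    return [t - s for t, s in zip(tws, sm)]


# Hypothetical monthly anomalies (cm of water), not values from the article.
tws = [1.0, 0.5, -0.5, -1.5]
sm_mos = [2.0, 1.0, -1.0, -2.0]   # larger-magnitude SM solution
sm_vic = [0.5, 0.25, -0.25, -0.5]  # smaller-magnitude SM solution

print(gwsc_anomaly(tws, sm_mos))
print(gwsc_anomaly(tws, sm_vic))
```

Because SM enters the subtraction directly, two SM series with the same phase but different magnitude yield GWSC series that differ in magnitude and can even differ in sign, which is consistent with the relationship suggested in the paragraph above.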

Fig. 5

Area-averaged monthly GLDAS- and in situ-modelled SM anomaly for the study area (42.5°E–45.5°E, 12.5°N–17.5°N), August 2002–December 2006

3.2 Spatial Dimensions of Water Scarcity in Yemen

Figure 6 displays the spatial distribution of rainfall in Yemen, time-averaged across the January 2002–December 2007 time series. A single contiguous region of relatively high precipitation in the southwestern portion of the country is evident, while other parts of Yemen experience low rainfall. In general, rainfall appears to increase along an east–west gradient, with lowest precipitation concentrated along the eastern border. The high rainfall region coincides with the intermontane region in the study area, and is bordered by a narrow region of intermediate rainfall.

Fig. 6

Inverse distance-weighted interpolation of time-averaged monthly NWRA ground station rainfall observations, January 2002–December 2007
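Inverse distance-weighted interpolation of station rainfall, as used for Fig. 6, can be sketched in a few lines. The power parameter of 2, the use of planar distance in degrees, and the station values are assumptions for illustration; the article does not report the interpolation settings.

```python
import math


def idw(stations, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) station tuples.

    Each station is weighted by 1 / distance**power, so nearby stations
    dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: (longitude, latitude, mean monthly rainfall in mm).
stations = [(44.0, 15.0, 30.0), (46.0, 15.0, 10.0), (44.0, 14.0, 40.0)]
print(idw(stations, 45.0, 15.0))
```

Because every estimate is a weighted mean of the station values, the interpolated surface never exceeds the observed range, and sparsely gauged areas inherit values from distant stations.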

This spatial distribution of rainfall contrasts with that of groundwater, displayed in Fig. 7, which shows spatial estimates of mean monthly GWSC from the CSR, GFZ, and JPL data sets, as well as an average of GWSC estimates from all three data sets (Avg_GRACE), time-averaged across the August 2002–January 2010 period. The magnitude of estimated GWSC varies according to the data set from which it was derived. The GFZ-derived data produce generally higher estimates of GWSC, while the CSR-derived GWSC estimates are generally lower. There are also a few notable differences between data sets: the area of high negative GWSC along the eastern border produced by the GFZ- and JPL-derived interpolations is less pronounced in the CSR-derived interpolation, while the CSR-derived interpolation shows a stronger high negative GWSC signal in the southwest than do the GFZ- and JPL-derived interpolations. This result indicates again that various GRACE solutions produce similar directional and spatial estimates of GWSC, but are smoothed to varying degrees.

Fig. 7

Time-averaged kriging interpolations of Monthly GRACE-Estimated GWSC in Yemen (42.5°E–53.5°E, 12.5°N–17.5°N), August 2002–January 2010

Nonetheless, spatial interpolations of mean GWSC derived from all three data sets produce very similar results, with areas of negative GWSC occurring in a belt along the southern coast and GWSC at its lowest in the southwest. An area of high negative GWSC is also evident extending inland from the south towards and above Sanaa.

While Figs. 6 and 7 illustrate the physical dimensions of rainfall and groundwater, Fig. 8 provides an indication of the socioeconomic dimensions of water scarcity in Yemen. The percentage of food-insecure households is notably high in a belt east and southeast of the intermontane region. In many of the districts in these regions, >49% of households were classified as food-insecure. Notable concentrations of districts with high percentages of food-insecure households also occur in the southwest and north of Sanaa.

Fig. 8

Percentage of food-insecure households by district, 2009

Figure 9 attempts to integrate the physical dimensions of water scarcity displayed in Figs. 6 and 7 with the socioeconomic dimension outlined in Fig. 8 by applying a weighted overlay analysis to three factors: areas of low rainfall, as indicated by ground station observations; high levels of groundwater depletion, as estimated by GRACE; and high levels of food insecurity, as measured by the IFPRI data displayed in Fig. 8. In the upper panel of Fig. 9, each of the three input factors is given approximately equal weight, implying equal emphasis on physical and socioeconomic vulnerability factors. This weighting produces concentrations of combined low observed rainfall, high GRACE-estimated groundwater depletion, and high food insecurity in many parts of Yemen, but especially in the southern and eastern coastal regions.

Fig. 9

Spatial coincidence of low observed rainfall, high GRACE-Estimated groundwater depletion, and low food security (Weighted overlay, various weights)

The middle panel applies equal weight to only two input factors, GRACE-estimated high groundwater depletion and high food insecurity, implying concern only for groundwater depletion as a physical indicator of water scarcity. This produces a spatial coincidence of both to the north, east, and southwest of Sanaa, as well as in the southern and eastern coastal regions. The lower panel includes low observed rainfall as an input parameter, but assigns it a low weight, again implying less concern for rainfall than for groundwater depletion as an indicator of vulnerability. The result is a spatial distribution similar to that of the middle panel, but with greater emphasis on the southwest and the southern coast. As noted in the Methods, the choice of weights here is purely illustrative, and would depend on the particular concerns of policymakers. Nonetheless, this type of analysis suggests the varied, inter-linked dimensions of water scarcity in Yemen, and the importance of effective water management.
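The three weighting schemes described for Fig. 9 can be sketched as a weighted average of co-registered score grids. The sketch assumes each input has already been reclassified onto a common 1 to 5 concern scale; the grid values, layer names, and weights (including the 0.2 rainfall weight for the lower panel) are hypothetical and are not the values used in the article.

```python
def weighted_overlay(layers, weights):
    """Combine equally shaped score grids into one weighted-average grid."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = list(weights)
    first = layers[names[0]]
    rows, cols = len(first), len(first[0])
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2 x 2 grids scored 1 (low concern) to 5 (high concern).
layers = {
    "low_rainfall": [[1, 5], [3, 5]],
    "groundwater_depletion": [[3, 5], [3, 1]],
    "food_insecurity": [[5, 5], [3, 3]],
}

# Hypothetical weights loosely mirroring the upper, middle, and lower panels.
schemes = {
    "upper": {"low_rainfall": 1, "groundwater_depletion": 1, "food_insecurity": 1},
    "middle": {"low_rainfall": 0, "groundwater_depletion": 1, "food_insecurity": 1},
    "lower": {"low_rainfall": 0.2, "groundwater_depletion": 1, "food_insecurity": 1},
}

overlays = {name: weighted_overlay(layers, w) for name, w in schemes.items()}
for name, grid in overlays.items():
    print(name, grid)
```

Normalising by the sum of the weights keeps every output on the same 1 to 5 scale, so panels produced under different schemes remain directly comparable.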

4 Discussion

4.1 Physical Dimensions of Water Scarcity in Yemen

Most hydrological indicators, including ET, runoff, SM, and groundwater level, suggest a decline in water availability in Yemen in recent decades. Declining trends in ET and SM are of special concern for agriculture and ecosystem function, because research shows that in much of Yemen actual ET rates are far below potential levels, and that a soil moisture deficit occurs during much of the year (Noman 2006). In contrast, long-term trends in rainfall suggest stable or increasing total precipitation in recent decades, in turn suggesting that the marked, long-term decline in observed groundwater levels is due to the absence of water management regimes to constrain groundwater abstraction, rather than any decreasing trend in recharge from precipitation (Negenman 1997).

Most of the 24 GRACE-derived GWSC solution sets produced relatively low standard and RMS error when regressed against equivalent time series of observed groundwater levels. RMS error was between 0.04 MWT-cm (MASCON_12_42_NOAH-derived solution) and 2.10 MWT-cm (GFZ_In Situ-derived solution), which compares favorably with the RMS error estimates obtained by comparing GRACE-derived TWS estimates with hydrological models for several American river basins, which ranged from 1.29 to 4.91 MWT-cm (Swenson and Wahr 2003). This comparison suggests the basic accuracy of GRACE-derived GWSC estimates even over basins featuring complex hydrogeology. However, given the discontinuity and limited spatial extent of the available well data, additional validation in areas with more extensive observational measurements should be conducted to assess the generalizability of this claim.

Indeed, only three solution sets, JPL_In_Situ, MASCON_14_42_NOAH, and MASCON_18_42_NOAH, produced R-square values of >0.5 in combination with low standard and RMS error when compared against the well data. In the case of the MASCON-derived solution sets, standard and RMS error was very low (<0.1), suggesting that MASCON-derived solution sets correspond particularly well with observed groundwater levels. However, the R-square value produced by the MASCON_14_42_NOAH solutions is noticeably higher, and the P-value of the regression line noticeably lower, than that produced by the MASCON_18_42_NOAH solution set, suggesting some variation in accordance with well data even within GWSC estimates derived from the same GRACE data set.

Much of this variation is reflected in the different magnitude of GWSC estimates produced from different GRACE solutions, as Fig. 4 indicates. This variation is to be expected given the varying temporal resolutions of the different data sets, which represent different degrees of smoothing. In particular, the MASCON-derived GWSC estimate appears highly smoothed in comparison with the JPL_In_Situ data, which is to be expected given the coarser spatial resolution of the MASCON-derived data. The accuracy of the MASCON-derived data relative to well measurements is also expected, because the MASCON data are stabilized and so are subject to less signal attenuation than the monthly mass grid-derived data.

The strong linear fit (R-square > 0.7) of the JPL_In_Situ-derived GWSC solutions against the well data, in contrast, appears to be due to the increased GWSC anomaly in 2005 (Fig. 2), which is visible in the well data time series but not replicated in the other MMG-derived data sets. This increase in turn appears to be due to the 2005 spike in In situ-modelled SM, which is not replicated in the other SM solutions (Fig. 5). Notably, this anomaly temporally coincides with a high-precipitation event visible in the observed rainfall data in Fig. 3, but not in the GPCP data. This suggests that the strong fit of the JPL_In_Situ-derived GWSC estimates against the well measurements results from the ability of the In situ SM solution to replicate a single high-precipitation event which was not visible except in local observational data.

It is also notable that the magnitude of GWSC as estimated by the JPL_In_Situ- and MASCON-derived solutions for the study area varies considerably, although the two solutions correspond well with equivalent time series of average observed groundwater level anomalies in the same area. This suggests that GWSC estimates derived from certain GRACE data sets can replicate trends in observed groundwater level fluctuations, but not necessarily its magnitude. This interpretation of results is further suggested by the fact that the magnitude variation again appears to be related to the coarser scale of the MASCON data. Indeed, the continuing importance of the smoothing effect evinces the well-described relationship whereby the accuracy of GRACE-derived GWSC estimates increases with the scale of observation (Wahr et al. 2006; Swenson and Wahr 2003).

Confidence in the ability of GRACE-estimated GWSC to reflect observed trends in groundwater storage at large scales is also suggested through comparison of GLDAS- and In situ-modelled soil moisture. Given the limitations of the in situ modeling approach, which included a limited number of observation locations, it is particularly notable that the In situ-modelled SM replicates major anomalies in the GLDAS-modelled time series. A possible explanation for this unexpectedly close agreement is the influence of humidity on SM. Increased levels of GLDAS-modelled humidity, when graphed in time series, appear temporally to coincide with similar elevated SM anomalies in both the In situ- and GLDAS-modelled SM data. The close accord between In situ- and GLDAS-modelled SM data suggests that GLDAS-modelled SM, a crucial element of partitioning the GRACE TWS signal, is generally accurate in Yemen, by extension suggesting the basic accuracy of GRACE-estimated GWSC.

In sum, the similarity of the JPL_In_Situ- and MASCON-derived time series suggests that while the magnitude of GRACE-estimated GWSC may be more or less accurate depending on the spatial resolution of the parent data set, different GRACE solutions can accurately indicate the general pattern and trend in GWSC over large scales and at a monthly temporal resolution, even given a complex hydrogeological substrate. However, this assessment assumes the basic accuracy of the well data, which, as has been stressed, is problematic. Accordingly, further validation of GRACE in areas of complex hydrogeology should be attempted. Nonetheless, on the basis of these results GRACE appears to be particularly useful in assessing trends in GWSC at large scales.

4.2 Implications for Water Management in Yemen

The promise of GRACE-based groundwater storage assessment raises the question of what role it might usefully play in improving groundwater management in Yemen, which itself entails a more detailed discussion of the institutions which currently attempt to regulate water use. These consist of Yemeni government agencies, especially NWRA and the Ministry of Agriculture and Irrigation, multilateral institutions including the World Bank and UN Development Program, and an array of local structures including Local Basin Councils (LBCs), Water User Associations (WUAs), and Water User Groups (WUGs) (Al-Asbahi 2005; UN Development Program 2003). To this must be added the complex tribal relationships that exist in most of the country (Lichtenthaler 2003). A series of institutional reforms, including the 2002 Water Law, which asserted state ownership of Yemen’s water resources (Richards 2002), have nonetheless failed to establish effective groundwater regulation. A World Bank review of water management in the Sanaa basin, for instance, found little evidence of enforcement of Yemen’s restrictions against well-drilling (World Bank 2010).

This ineffectiveness is closely linked to a lack of adequate regulatory capacity, and in particular the ability to assimilate and analyze data on groundwater abstraction. Despite establishing a water resource monitoring network with funding from multilateral donors and the training of some civil servants in RS and GIS techniques (Al-Asbahi 2005), Yemen’s government lacks the information and human capital necessary to enact effective regulation. A 2010 World Bank review found that many NWRA monitoring facilities in the Sanaa basin were out of order, that farmers and WUAs were uncooperative in providing data, and that the data gathered from these stations were inconsistent and poor in quality (World Bank 2010). Under such circumstances, GRACE data are likely to provide an effective complement to existing water resources information. Because they are remotely-sensed, they do not depend on individuals or local associations to provide data, and, given the expense and difficulty of travel across Yemen under present security conditions, remotely-sensed GRACE data may indeed be more cost-effective than individual well monitoring.

However, it is less certain that, given deficiencies in institutional capacity, GRACE data could be effectively employed. Independent reviews have found acute deficiencies in trained staff within Yemeni government agencies (World Bank 2010). Another report found that “there was no mechanism for scientific processing and sharing” of water resources data within NWRA (UN Development Program 2003). Given the computing requirements and methodological complexities of using GRACE data, a substantial investment in personnel training and retention would thus be necessary to make effective use of GRACE data in contexts such as Yemen.

In light of these deficiencies at the national level, recent water reforms, funded mainly by external donors, have emphasized Integrated Water Resources Management (IWRM) at the local level. IWRM, widely recognized as a model for best practices in water management, emphasizes the coherent management of all water resources, including surface, ground, and soil water, for multiple agricultural, industrial, ecosystem and other uses. To accomplish these objectives, IWRM encourages the adoption of a mix of policy instruments, a focus on the equitable allocation of water resources, and the consultation of diverse stakeholders (Chene 2009). In Yemen, IWRM-based reforms have focused on working with basin management committees and water user associations, on the assumption that, as a UN report concluded, “[local Yemeni] communities have the advantage that they can better monitor compliance with the law and bring social pressures to bear on [water law] violators” (UN Development Program 2003).

However, it remains unclear whether these reforms alone can effect sustainable groundwater use, which previous scholarship suggests requires a high degree of trust among resource users founded on information-sharing (Blomquist 1994). As one study found, “The perception of many farmers is that as groundwater levels drop further, there will be less and less co-operation among stakeholders. Even now, everyone tries to minimize his personal loss at the expense of the common resource” (Kohler 2000). Another quoted a farmer as explaining that “We don’t know each other and we don’t trust each other; there is hardly any cooperation between us, and it will be difficult to achieve a consensus to reduce groundwater abstraction” (Lichtenthaler 2003). This apparent lack of trust suggests the need to facilitate better information-sharing as a complement to institutional, IWRM-based water management reform, which the integration of GRACE and socioeconomic data promises to accomplish.

4.3 Spatial Dimensions of Water Scarcity and Vulnerability in Yemen

The results of Fig. 9 indicate that spatial analysis of GRACE-based groundwater assessment can help to fill this gap. Spatial interpolations of CSR-, GFZ-, and JPL-derived GWSC estimates produce encouragingly similar results, with negative GWSC concentrated in the intermontane region, the southwest, the southern coast, and along the eastern border. The magnitude of GWSC as estimated from the various data sets varies, but the spatial pattern remains consistent. This marked congruence among estimates derived from very different solution sets suggests that different GRACE solutions are capable of producing an accurate, uniform spatial picture of GWSC.

This contention is strengthened by comparison with anecdotal accounts of spatial variation in groundwater abstraction. A 2009 NWRA report indicates that observed groundwater depletion is highest in the Sanaa basin, the governorates north of Sanaa, the intermontane region surrounding Sanaa, Taiz in the southwest, and along the southern coast (NWRA 2009a, b), all of which correspond with spatial estimates of GRACE data. Finally, the spatial distribution of observed and GRACE-estimated GWSC is largely similar, with regions of negative GWSC concentrated in the intermontane region, approximately around the Sanaa basin.

Despite this congruence, two features of the spatial interpolations require further comment. The first is the large area of negative GWSC along the eastern border with Oman. This result is surprising because there is relatively little human activity in the area. The two governorates which border Oman, Hadramaut and Al-Maharah, are the least densely-populated areas in Yemen (Central Statistical Organization 2008). The second is the area of large-magnitude GWSC which GRACE estimates produce in the southwest. Signal leakage is a possible explanation, but in general ocean signals tend to be weaker than those on land (NASA 2009).

Comparison with the spatially plotted socioeconomic data offers tentative alternative explanations for these discrepancies. A high percentage of cultivated area in Al-Maharah is devoted to qat production, and production in the larger Hadramaut is focused on cash crops, which tend to be more groundwater-intensive (Forch 2009). Moreover, groundwater irrigates >30% of cultivated land in the southwestern region, where GRACE-estimated GWSC suggests high rates of groundwater depletion, suggesting a linkage between the two.

Spatial variation in socioeconomic indicators indicates how differently parts of Yemen are likely to be affected by water scarcity, both in terms of low rainfall and declining groundwater resources. Water scarcity generally refers to a situation in which water supplies are insufficient for all human and natural users (UN Food and Agriculture Organization 2007). Current scholarship emphasizes the need to consider physical, social, and economic aspects of water scarcity, including not only the quantity of available water but also factors such as the ability of households to access it (Sullivan et al. 2003; Rijsberman 2006). Moreover, vulnerability to environmental change generally, and water scarcity specifically, varies depending on the social resources available to a community (Turton and Ohlsson 2000; Smit and Wandel 2006). A complete assessment of vulnerability must therefore be more detailed than is possible with the coarse-scale socioeconomic data presented here, and must take account of complex community-scale political, social, and economic conditions (Lichtenthaler and Turton 2004).

Nonetheless, this spatial analysis of GRACE-estimated GWSC alongside socioeconomic data points to the potential of integrating the two in order to reduce the high information costs identified in the institutional analysis. The weighted overlays presented here further illustrate how this information can be integrated with socioeconomic data to indicate areas of particular concern related to water scarcity. In this case, the choice of high food insecurity, low observed rainfall, and high GRACE-estimated groundwater depletion identifies the desert fringe east of Sanaa and the southern coastal region as areas of particular concern, though these areas differ depending on the weighting scheme applied. The choice of input factors, and the weights applied to them in this kind of analysis, will depend on the particular policy goal. But this illustration points the way forward to the integration of physical hydrological data, enabled by GRACE-estimated GWSC, with socioeconomic data for effective water management. The remaining challenge, as indicated by the institutional analysis, is how to realize this potential in the context of a fragmentary and information-constrained system.

5 Conclusion

The ability to detect groundwater storage changes using satellite remote sensing represents a significant advance given the expense and difficulty of observational measurement, and promises to address the lack of information which hampers effective management of groundwater resources in areas such as Yemen. However, GRACE-based estimation of GWSC is most useful at large scales and for the analysis of general trends and spatial patterns in groundwater storage change. This suggests that GRACE-estimated GWSC can help address water management challenges at national and regional levels such as the assessment of spatial vulnerability to water scarcity, but is less useful in the context of localized, IWRM-driven water management strategies. There is a wider need for water managers, scientists, and policymakers to adopt new approaches to translate the large-scale water resources assessment enabled by satellite remote sensing of water resources into more effective IWRM management strategies at local scales.

Doing so requires advancements in two areas. First, effective integration of GRACE-derived data into management strategies requires downscaling regional-scale data to the local scales at which most management takes place. In Yemen, this means accounting for the hydrogeology of small, discontinuous basins. A possible solution is the development of localized groundwater models which incorporate GRACE-estimated GWSC (see Footnote 1). These models might rely on GRACE data to provide estimates of GWSC without the expense of well measurement, but also take account of localized groundwater characteristics. Recent research suggests the validity of this concept by indicating that GRACE-derived data compare well with outputs from groundwater models forced by other data (Niu et al. 2007). Such models could play an important role in IWRM-driven management strategies by providing more accurate information about the structure of groundwater basins as well as patterns of groundwater use. Such information can in turn be used to build trust among groundwater abstractors in a given basin (Ostrom 1986).

However, the promise of GRACE-based groundwater assessment at large scales points to the need to combine local institution-building with greater support for technical training and equipment at national levels, including within NWRA and in Yemeni academic institutions. Previous water sector reform efforts included a focus on enhancing groundwater monitoring capabilities in NWRA (UN Development Program 2003), but these have been eclipsed in more recent initiatives. Moreover, the subtleties involved in using GRACE effectively indicate that remote sensing is not a complete replacement for observational monitoring, and that a balance should be struck between enhancing remote sensing-based and observational groundwater monitoring. Finally, it must be made clear that even full and effective integration of GRACE as a component of better-informed groundwater management in Yemen is likely to have only a marginal impact on the country’s most pressing developmental concerns, such as food security. GRACE has a role to play in addressing such concerns, but that role is modest in comparison to fundamental, developmental reforms in areas like trade and economic pricing of water.

Beyond these conclusions, this article suggests three important areas of future research concerning remotely-sensed groundwater resource assessment and groundwater management more generally. First, the ability of GRACE to model GWSC in small, discontinuous basins deserves further research in areas with better-quality observational data. Second, the linkages between information costs and ineffective groundwater management should be more clearly articulated, particularly in areas like Yemen, where a legacy of effective surface water management exists. Finally, more scholarly attention should be devoted to groundwater management at different scales, as has indeed recently been suggested for surface water (Fischhendler and Feitelson 2003).

At the broadest level, it is clear that in many cases advances in remote sensing serve to illustrate that the technical capacity to detect changes in water resources far outstrips the institutional capacity to manage them. The transformational ability of GRACE to detect changes in large-scale groundwater storage is thus emblematic of the mismatch between the empirically evident scale of many environmental challenges and the institutional solutions available to meet them (Gibson et al. 2000). Going forward, the challenge for water management in Yemen and elsewhere will be to think broadly and innovatively about how to incorporate data from new remote sensing platforms such as GRACE into effective strategies for the sustainable use of shifting and often shrinking water resources.