“Measurement drives diagnosis and response. As global attention returns to food security, new opportunities emerge to improve its measurement.” Barrett (2010)

Introduction

Dissatisfaction with existing food security indicators is hardly new. Estimates of the prevalence of hunger (undernourishment) from the Food and Agriculture Organization (FAO) have been widely criticized for some time for lacking accuracy in both cross-sectional comparisons and trends (Gabbert and Weikard 2001; Nubé 2001; Smith 1998; Svedberg 1999, 2002). The World Bank’s poverty estimates also have significant weaknesses for drawing cross-country comparisons and inferring global trends (Deaton 2010). The 2008 global food crisis—and the academic debate surrounding its impacts on poverty (Headey 2013; Swinnen 2010)—revealed an additional shortcoming: the inability of international agencies and national governments to monitor food security in a sufficiently accurate and timely manner. This shortcoming is also likely to become more costly in the near future. Food prices are predicted to remain high and volatile for the coming decade at least (OECD-FAO 2009; USDA 2009), and climate change could leave many countries more frequently exposed to severe weather events (IPCC 2012). Now, more than ever, there is an increased demand for the improved measurement of both food and nutrition security in the developing world.Footnote 1

It is less clear, however, how food security measurement should be improved. In addition to the obvious but under-discussed issue of the costs of alternative measurement systems, the bewildering proliferation of food security indicators in recent years has provided greater variety, but little consensus, and insufficient coordination among different agencies. Moreover, although the justified mainstreaming of nutrition in the development dialogue has elevated nutrition security—particularly for infants in the first thousand days of life, and hence, for their mothers as well—to a critically important development goal (Nabarro 2010), there has been insufficient discussion of how food security measurement can be made more “nutrition-sensitive”.

In this paper we therefore reassess the direction that food security measurement should take by gauging the extent to which different indicators satisfy several key criteria. Decision-makers demand a range of different information from food security indicators, though these different types of information are rarely made explicit. We broadly categorize these different types of information into three dimensions. First, decision-makers need to make a wide range of cross-sectional “snapshot” comparisons: between different social groups, different regions, and different countries. Second, decision-makers need different sorts of inter-temporal comparisons: on long-term trends, on the seasonality of food insecurity, and on the impacts of shocks, such as droughts, floods, or changes in incomes and prices. And third, decision-makers increasingly demand nutritional relevance in their programming, suggesting the need for food security indicators to impart information on the demographic dimensions of food insecurity (such as the relative vulnerability of infants, children, and adults, of males and females, and of pregnant and breastfeeding women) as well as the epidemiology that links food intake to nutritional outcomes. This last dimension obviously refers to the relative importance of macro- and micronutrient deficiencies, but also to interactions between diets, health burdens, childcare practices, and nutrition outcomes.

The criteria above, our ‘first principles’, are certainly demanding, but they are also consistent with existing definitions of food security, such as the widely cited FAO definition of “all people, at all times” having access to “nutritious food” (FAO 1996). Despite this, few existing reviews of food security indicators address all three criteria, or systematically compare which indicators perform well on which criteria. This paper aims to address that knowledge gap. We do so for four classes of indicators, which we believe to be the most common in the current literature:Footnote 2 calorie deprivation indicators, monetary poverty indicators, dietary diversity indicators, and subjective indicators, each discussed in its own section below.Footnote 3 Methodologically, we apply a mix of literature review, conceptual discussion, and fresh empirical analysis to judge whether each indicator can yield valid and reliable data on the true differences between different states (i.e., individuals, groups, countries, time periods, and different nutritional outcomes). We define validity as the extent to which a concept, conclusion, or measurement is well-founded and corresponds accurately to the real world, and reliability as the ability of an indicator to perform consistently (for example, test-retest reliability).

We acknowledge up front that the information available for informing these judgments is imperfect, and that further research is still needed in many areas.Footnote 4 Indeed, in our concluding section we argue that such research is essential for further improving food security measurement, particularly at the global level. We conclude that the largest information gaps—and therefore the greatest gains from bridging them—pertain to three interconnected issues: the quality of diets (the need to go beyond calorie consumption); demographic dimensions (the need to go inside the household for greater nutritional relevance); and high-frequency data (the need to systematically gauge shocks and seasonality). We suggest that moving to ‘best practice’ in food security measurement requires bridging these knowledge gaps, scaling up resources in some areas, cutting them back in others, and promoting much greater inter-agency coordination.

Calorie deprivation indicators

Calorie availability/deprivation is one of the oldest indicators of food insecurity. It is measured by the FAO at the country level based on national food balance sheets,Footnote 5 but also at the household level from expenditure/consumption data available in standard economic surveys such as the Living Standards Measurement Study (LSMS) surveys of the World Bank. To distinguish between them we will hereafter refer to the two types as the FAO undernourishment indicator and household calorie consumption indicators.

Cross-sectional validity

Unlike all of the other food security indicators discussed here, the FAO undernourishment indicator is solely measured at the national level. This largely stems from necessity since this indicator is derived from national food availability estimates (that is, production plus net imports less storage and wastage reported in the food balance sheets), which are then given an artificial distribution based on food consumption data from occasional household surveys, and demographically adjusted estimates of minimum calorie requirements (FAO 2003).
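
The logic of this construction can be illustrated with a minimal sketch. It is not the FAO's actual procedure: the log-normal shape of the calorie distribution, the coefficient of variation parameter, and all function names are assumptions introduced here purely for illustration.

```python
import math


def food_availability(production, net_imports, storage, wastage):
    """Food supply: production plus net imports less storage and wastage.

    All arguments must be in the same unit (e.g. kcal per person per day).
    """
    return production + net_imports - storage - wastage


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prevalence_of_undernourishment(mean_kcal, cv, min_requirement_kcal):
    """Share of the population below the minimum calorie requirement.

    ASSUMPTION: per capita calorie consumption is log-normally distributed
    with the given mean and coefficient of variation (cv > 0). The cv would
    in practice be inferred from occasional household surveys.
    """
    sigma2 = math.log(1.0 + cv ** 2)           # variance of log consumption
    mu = math.log(mean_kcal) - sigma2 / 2.0    # mean of log consumption
    z = (math.log(min_requirement_kcal) - mu) / math.sqrt(sigma2)
    return normal_cdf(z)
```

Under these assumptions the estimated prevalence depends on only three numbers (mean availability, the assumed spread, and the requirement threshold), which is why errors in any one of them feed directly into the headline estimate.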

The numerous assumptions built into this approach have long formed the basis for most of the criticism directed at the FAO measure (Gabbert and Weikard 2001; Nubé 2001; Smith 1998; Svedberg 1999, 2002). However, in principle, household survey data of food consumption could be used to measure the proportion of a population with inadequate calorie consumption (as in Smith et al. 2006). Even so, it is an open question as to whether household surveys or national food balance sheets provide better estimates at the aggregate level. Both face sizeable measurement errors, albeit from different sources. The FAO must often rely on plainly unreliable national data sources, with data on wastage and storage being particularly suspect. Household survey data are instead flawed by recall errors, biases, and choice of survey instrument (Beegle et al. 2012), as well as more specific problems related to food consumed outside the home, wastage and storage, and food given to animals, employed laborers or guests (Bouis et al. 1992; Bouis 1994; Sibrian 2008; Smith and Subandoro 2007). The defining difference with the FAO undernourishment indicator, however, is that the distribution of calories over the population is simulated by household data rather than observed directly. Furthermore, the inability of the FAO approach to yield estimates for subnational groups clearly limits the policy relevance of the indicator.

Nutritional relevance

Insofar as calorie indicators are not commonly or easily measured at the individual level, their nutritional relevance is clearly limited. Even so, if household calorie consumption were a strong predictor of individual nutrition outcomes we might be much less concerned about the inability of these indicators to measure individual level outcomes. Yet in several prominent countries there appears to be either a very weak correlation, or no correlation, between calorie deprivation and anthropometric indicators of malnutrition. Deaton and Drèze (2009) find that Indian regions with high calorie consumption often have higher malnutrition, and Pelletier et al. (1995) make similar observations for Ethiopia. In Table 1 below (a correlation matrix between a range of food security and nutrition indicators) we also found no significant correlation between household calorie consumption and child height-for-age and weight-for-age z-scores in Malawi.Footnote 6 While one cannot rule out the possibility that anthropometric indicators are also flawed, other food security indicators at least achieved statistically significant correlations with these anthropometric indicators (e.g. dietary diversity)Footnote 7. Thus, in some important contexts, household calorie consumption seems to be a poor predictor of individual nutrition outcomes. Moreover, calorie indicators capture only access to sufficient food, ignoring issues of dietary diversity and micronutrient requirements, which are particularly critical for physical and cognitive development of children.

Table 1 Correlation matrix of food and nutrition security indicators for Malawi

Inter-temporal validity

To what extent is calorie deprivation a valid and reliable indicator of food security trends, of the impacts of shocks, and of seasonal deprivation?

The FAO undernourishment indicator has long been used to gauge trends in global hunger, and some developing countries also focus considerable attention on trends in household calorie consumption levels, notably India. While the extent of calorie deprivation in a population was for many years accepted as a fairly reliable indicator of material progress, a number of recent works have called that into question, particularly in the Indian and Chinese contexts. In India, survey-based indicators have suggested that mean calorie consumption has declined, despite rapid economic growth and monetary poverty reduction. This apparent paradox has raised serious concerns about the usefulness of this indicator. One problem may be sheer measurement error due to the increasing share of food consumed outside the home, for example. Indeed, the FAO undernourishment indicator does not suggest a decline in calorie availability (Headey et al. 2012). Another explanation, however, could be the declining energy requirements of individuals in dynamic economies (Deaton and Drèze 2009; Headey et al. 2012). This can occur because of reduced physiological energy expenditure related to improved infrastructure and mechanization (from increased use of cars, motorbikes, tractors, and piped water, for example) and reduced energy losses through improved health care.

In addition, Jensen and Miller (2010) argue that calorie availability is a particularly poor indicator of trends in food security because of the very low income and own-price elasticities of staple foods (see e.g., Behrman and Deolalikar 1987; Bouis 1994; Bouis and Haddad 1992). For example, aggregate calorie consumption may not rise substantively with income gains because people quickly focus on diversifying their food bundle rather than maximizing total calorie intake, as Bennett’s law implies (Bennett 1941). It is particularly disconcerting that these arguments have been applied to India and China—the two most populous countries in the world—which would appear to warrant low confidence in global hunger estimates.

Finally the responsiveness of calorie availability indicators to shocks is very much open to question. In the 2008 crisis the FAO did not have sufficiently timely data to even simulate the impacts of higher food prices on calorie deprivation, demonstrating that its methods and data are ill-suited to quickly gauging the impacts of shocks. It therefore relied on estimates produced by a production and trade model developed by the United States Department of Agriculture (USDA) for low-income countries only. However, the estimated increases in hunger—with the global estimate eventually exceeding one billion hungry people worldwide in 2009 (FAO 2009)—were later contradicted by the FAO and USDA’s own survey-based estimates of national food availability (Headey 2013).

More generally there are strong theoretical and empirical reasons to believe that calorie availability is a very poor gauge of the impacts of idiosyncratic or covariate shocks. The theoretical arguments are threefold. First, as per Jensen and Miller (2010), when poor people suffer a loss of income, they switch from high-value calorie sources (e.g. meat) to low-value calorie sources (e.g. rice). Thus while total food expenditure may decline significantly, calorie consumption might not. Second, many poor people produce their own food staples, and thus may choose to rely more on own consumption of staples when market prices increase prohibitively. And third, people may sacrifice non-food expenditure to maintain calorie consumption levels. The first two hypotheses suggest that in the face of shocks, calorie consumption might be maintained even as dietary diversity decreases, while the third suggests that the impacts of shocks may partly turn up in nonfood expenditure (implying one should simply measure total income or expenditure).

Reviewing the literature of the 1998 Indonesian financial crisis, we find substantive empirical evidence for these three arguments. This crisis led to a nearly 200 % increase in rice prices, yet all of the existing evidence suggests that rice consumption was maintained, or perhaps even increased slightly (Skoufias 2003). In contrast, the consumption of high-value foods declined precipitously according to most surveys (Block et al. 2004; Hartini et al. 2003b), as did non-food expenditures (Frankenberg et al. 1999)Footnote 8. Interestingly, the FAO data also show no decline in food availability at the aggregate level in Indonesia over the course of the crisis (indeed, this was a criticism of the indicator at the time; see FAO 2003). For Bangladesh there is similar evidence based on high-frequency (monthly) data from the Nutrition Surveillance System (NSS) over 1991–2000 (Torlesse et al. 2003). Specifically, as rice prices fluctuated quite markedly over that period, rice expenditures persisted, whereas non-rice food expenditures varied negatively with rice prices. Whilst these examples are derived from dynamic (panel or pseudo-panel) data on sizeable economic shocks, Jensen and Miller (2010) reached the same conclusion from a randomized experiment on Chinese data. The conclusion from all of this work is that calorie availability is a very poor indicator of the impacts of shocks, except perhaps in situations of the most severe food shortages (i.e. famines).

Monetary poverty indicators

Monetary poverty is obviously a somewhat more indirect indicator of people’s economic access to food. On the other hand, potential substitution between the demand for food and non-food items is an important rationale for viewing poverty indicators as theoretically superior to food or calorie-based indicators. As per the discussion above, higher food prices might not reduce calorie consumption, but could significantly reduce non-food expenditures, thereby raising poverty. For this reason many economists still advocate monetary poverty as an attractive indicator of food insecurity. Absolute poverty lines are also usually linked to minimum calorie consumption requirements, providing a potentially important empirical link to food insecurity. In practice, however, poverty lines often become delinked over time (e.g. Deaton and Drèze 2009 on India). Moreover, poverty indicators have received substantial criticism in recent years on several other counts related to their cross-sectional and inter-temporal validity.

Cross-sectional validity

Most criticism of the cross-country comparability of monetary poverty indicators has focused on issues of converting household expenditures into a common international currency via purchasing power parity (PPP) conversion (Deaton 2010; Deaton and Dupriez 2011). While an improvement over exchange rates, PPPs are not typically derived from the consumption patterns of poor populations (with the exception of Deaton and Dupriez 2011). A second under-emphasized problem (perhaps related to the fact that little can be done about it) is measurement error. Experimental research on survey design has demonstrated that the choice of survey instrument matters substantively to expenditure-based results (Beegle et al. 2012). But more generally, there are indications of sizeable measurement error in household survey data from some developing countries that is largely related to the limited capacity of the statistical institutions. An indication of this is the extent to which mean per capita consumption from household surveys deviates from mean consumption from national accounts. Clearly, national accounts data are also measured with substantial error, but Table 2 shows a disturbing variation in the ratio of the two indicators. In Indonesia, for example, survey-based consumption is just 40 % of national accounts based consumption, but the equivalent ratio for the Democratic Republic of Congo is 169 %. It is far from clear, then, that poverty indicators have sufficient validity in cross-country comparisons.

Table 2 Comparison of consumption estimates from household surveys and national accounts statistics

What about issues of comparability within countries? Here, too, there are substantive issues related to price comparisons across space. For example, there are widespread concerns about the comparability of rural and urban poverty in India (Deaton and Drèze 2002). In some countries there are also much larger gaps in malnutrition than there are in poverty. For example, in Ethiopia the government’s main household economic survey suggested a 5-percentage point rural–urban gap in $1.25-a-day poverty prevalence in 2005, but the gap in child stunting was much larger, at 17 percentage points.Footnote 9 Our conclusion is therefore that, in principle, monetary poverty indicators are promising indicators of food security, but in practice they fall far short of the ideal.

Inter-temporal validity

As we noted above, nationally representative household consumption surveys (such as LSMS-type surveys) are expensive and therefore infrequent. This leaves them little or no potential to examine seasonality.Footnote 10 Gauging the extent of major shocks is difficult with infrequent data and has forced a reliance on simulation approaches to predict the poverty impacts of economic crises using pre-crisis survey data (e.g., Ivanic and Martin 2008; de Hoyos and Medvedev 2009; Ivanic et al. 2011), following the net benefit method developed by Deaton (1989). Essentially this approach relies on the idea that the effects of price changes on disposable income depend on whether a household is a net food producer or net food consumer. While this approach is insightful in some regards, it involves assumptions of questionable validity. For example, it is not obvious that income and price elasticities observed in normal times (typically in cross-sections) apply to the coping behaviors adopted during economic crises. Moreover, the simulation approaches used in the 2008 food crisis predicted rising global poverty (Ivanic and Martin 2008; de Hoyos and Medvedev 2009), whereas historical data suggested sizeable reductions in poverty (World Bank 2012). Thus it is far from obvious that simulation models are good predictors of actual welfare changes.
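
The first-order logic of the net benefit approach can be sketched as follows. This is a simplified illustration rather than the specification used in any of the studies cited: it ignores substitution, supply responses, and wage effects, and the function and argument names are hypothetical.

```python
def net_benefit_ratio(food_sales_value, food_purchases_value, total_expenditure):
    """Net food sales as a share of total household expenditure.

    Positive for net food producers, negative for net food consumers.
    """
    return (food_sales_value - food_purchases_value) / total_expenditure


def first_order_welfare_change(nbr, proportional_price_change):
    """Approximate proportional change in real income from a food price change.

    ASSUMPTION: quantities produced and consumed do not adjust, so the
    welfare effect is simply the net benefit ratio times the price change.
    """
    return nbr * proportional_price_change
```

For a household that sells food worth 10, buys food worth 60, and spends 100 in total, the ratio is -0.5, so a 20 % food price rise implies a first-order real income loss of about 10 %. The fixed-quantity assumption is exactly where such simulations may diverge from the coping behaviors observed in actual crises.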

The expensive solution to this problem would be to conduct household surveys at higher frequency. Certainly, evidence from the Indonesian financial crisis suggests that high-frequency household surveys are a good means of gauging the expenditure impacts of shocks and even of some of the specific coping mechanisms involved. But apart from the sheer financial costs, timing is another issue. For example, some surveys conducted late in the Indonesian financial crisis found few or no harmful impacts (Ngwenya and Ray 2007), while surveys conducted at the peak of the crisis found more adverse impacts (Frankenberg et al. 1999).

Dietary diversity indicators

Theoretical and empirical evidence suggests that dietary diversity indicators might be surprisingly effective food and nutrition security indicators, for two basic reasons. First, standard definitions of both food and nutrition security emphasize the importance of both macro- and micronutrients (FAO 1996). In principle, dietary diversity should capture consumption of both types of nutrients, or a more balanced diet more generally (Ruel 2003). Second, economic theories of demand—as well as psychological theories such as Maslow’s hierarchy of needs (Maslow 1943)—suggest that individuals will only diversify into higher value micronutrient-rich foods (such as meats, fish, eggs, dairy products, and to a lesser extent fruits and vegetables) when they have satisfied their basic calorie needs. In other words, as poor people become richer, they gravitate away from relatively tasteless staple foods towards micronutrient-rich foods that impart greater taste, and therefore utility (Jensen and Miller 2010).

For these reasons, and because of their relative cost-effectiveness, dietary diversity indicators have become increasingly popular in recent years, particularly in health and nutrition surveys such as the Demographic and Health Surveys (DHS), but also in the World Food Programme’s (WFP) Emergency Food Security Assessments. As a general class, these indicators essentially consist of answers to recall questions about the consumption of a particular food item or groups of items over a recent period ranging typically from 1 day up to 2 weeks. The most common indicators are the food variety score (FVS), the dietary diversity score (DDS), and the food frequency score (FFS). The FVS provides a count of the number of different food items consumed, and the DDS the number of different food groups—usually anywhere between 7 and 15. The FFS is based on recalls that state how often a food group was consumed over the given time period. In some sense, the simple count indicators (FVS and DDS) are special cases of the FFS, so for simplicity we refer to all of them as ‘dietary diversity’ indicators.
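
To make the distinction between the three scores concrete, the sketch below computes each of them from recall data. The item-to-group mapping is a small hypothetical example, not the classification used by any particular survey, and the function names are introduced here only for illustration.

```python
from collections import Counter

# HYPOTHETICAL mapping from food items to food groups (illustration only).
FOOD_GROUPS = {
    "rice": "cereals",
    "maize": "cereals",
    "cassava": "roots and tubers",
    "beans": "pulses",
    "spinach": "vegetables",
    "mango": "fruits",
    "egg": "eggs",
    "milk": "dairy",
    "fish": "fish",
}


def food_variety_score(items):
    """FVS: number of distinct food items consumed in the recall period."""
    return len(set(items))


def dietary_diversity_score(items, groups=FOOD_GROUPS):
    """DDS: number of distinct food groups consumed in the recall period."""
    return len({groups[item] for item in items if item in groups})


def food_frequency(daily_recalls, groups=FOOD_GROUPS):
    """FFS input: number of days on which each food group was consumed.

    daily_recalls is a list with one list of food items per recall day.
    """
    days = Counter()
    for day_items in daily_recalls:
        for group in {groups[item] for item in day_items if item in groups}:
            days[group] += 1
    return dict(days)
```

A household reporting rice, maize, beans, and egg would have an FVS of 4 but a DDS of 3, since rice and maize fall in the same group. The count scores can be recovered from the frequency data by counting the groups with at least one day of consumption, which is the sense in which they are special cases.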

One of the most widely used DDS measures at the household level is the 12-scale Household Dietary Diversity Score (HDDS) developed by the Food and Nutrition Technical Assistance (FANTA) Project of the United States Agency for International Development (USAID) (Swindale and Bilinsky 2006a, b). Recently the FAO has promoted a modified, 9-scale version (especially for assessing women’s food and nutrition security) that differs from FANTA’s DDS by dropping the non-staple, micronutrient-poor food groups (such as fats and sugars) and re-grouping vegetables, fruits and animal products according to their bioavailable vitamin A and iron contents (Kennedy et al. 2011).

The WFP’s food consumption score (FCS) is a frequency-weighted dietary diversity score that is calculated from a seven-day household food consumption recall available from WFP’s Comprehensive Food Security and Vulnerability Analysis (CFSVA) surveys. The FCS attaches greater importance to foods deemed most important for nutritional purposes (WFP 2008). The highest weights are attached to meat, fish and milk (4), followed by pulses (3), cereals (2), vegetables and fruits (1) and sugar and oil (0.5). The FCS also omits condiments which are consumed in very small quantities and have no significant beneficial impact on the overall diet (such as tea, coffee, salt, fish powder or very small amounts of milk added to tea or coffee). Since the weights are applied after data collection the final FCS could be altered to vary the emphasis on macro- and micronutrients.
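
The calculation can be sketched as follows, using the weights listed above. The group keys, the treatment of meat and fish as a single group, and the capping of frequencies at seven days are assumptions made here for illustration; WFP (2008) should be consulted for the operational definition.

```python
# Weights as listed in the text. The key names, and the pooling of meat and
# fish into one group, are ASSUMPTIONS for this illustration.
FCS_WEIGHTS = {
    "meat_fish": 4,
    "milk": 4,
    "pulses": 3,
    "cereals": 2,
    "vegetables": 1,
    "fruits": 1,
    "sugar": 0.5,
    "oil": 0.5,
}


def food_consumption_score(days_consumed, weights=FCS_WEIGHTS):
    """Frequency-weighted dietary diversity score from a seven-day recall.

    days_consumed maps each food group to the number of days (0 to 7) on
    which it was eaten. Groups without a weight (e.g. condiments) are ignored.
    """
    score = 0.0
    for group, days in days_consumed.items():
        if group not in weights:
            continue  # condiments and unlisted groups carry no weight
        capped_days = min(max(days, 0), 7)  # keep frequency within 0..7
        score += weights[group] * capped_days
    return score
```

A household eating cereals daily, pulses on two days, vegetables on three days, and oil daily would score 14 + 6 + 3 + 3.5 = 26.5. Because the weights are passed as a parameter, the same recall data can be rescored under alternative weighting schemes after collection.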

Cross-sectional validity

A number of relatively recent studies have explored the validity of these indicators in a cross-sectional sense. Ruel (2003) provided an extensive review of validation studies of dietary diversity indicators from 1996 to 2002. She generally found positive and fairly strong associations between DDS or FVS and macro- and micronutrient adequacy in developing countries. For example, a study from urban areas in Mali shows correlation coefficients between FVS and nutrient adequacy of 0.33 and between DDS and nutrient adequacy of 0.39 (Hatloy et al. 1998). A similar study for rural areas in Mali shows correlation coefficients of 0.34 and 0.30, respectively (Torheim et al. 2004). Other studies from South Africa and the Philippines report even higher correlation coefficients of up to 0.72 (Kennedy et al. 2007; Steyn et al. 2006).

Other studies have examined statistical relationships between dietary diversity indicators and calorie and food expenditure. A 10-country study by Hoddinott and Yohannes (2002) found that a 1 % increase in household dietary diversity was associated with a 1 % increase in household consumption, a 0.7 % increase in total household calorie consumption, a 0.5 % increase in household calorie consumption from staples, and a 1.4 % increase in household calorie consumption from non-staples. A study by Wiesmann et al. (2009) tested the correlation between the WFP’s food consumption score (FCS) and household calorie consumption in Burundi, Haiti, and tsunami-affected areas of Sri Lanka using seven-day household food consumption data. Unlike most previous studies, this study varied the number of food groups included, the weights attached to food groups, and the degree of truncation of very small consumption quantities. They found that while the original FCS was moderately correlated with calorie consumption (with coefficients of 0.27 in Burundi and 0.44 in Haiti), there was little or no advantage in applying weights or frequencies to particular food groups, though there were some advantages to using more disaggregated food groups, and substantial advantages to excluding small quantities (indeed, correlations with household calorie consumption rose to 0.70 in one instance). Thus, consistent with other studies (e.g. Arimond and Ruel 2006), dietary diversity indicators appear to be generally more nutritionally meaningful—in terms of both macro- and micronutrient adequacy—when omitting extremely small food quantities.

These findings suggest that dietary diversity indicators are relatively valid across households within countries, thus providing some validation for their policy use by WFP, USAID and other institutions. However, one persistent area of concern pertains to cut-off levels that distinguish food-secure from food-insecure populations (the analog to poverty lines or the z-score cut-off lines used in anthropometric indicators). Operational agencies have a strong demand for indicators that reliably impart this information, yet a recent review of dietary diversity indicators concludes that while dietary diversity indicators are indeed strong predictors of anthropometric outcomes and specific nutrient deficiencies, the relationships between dietary diversity scores and nutrient deficiencies vary across countries and contexts (Ruel et al. 2012; Coates et al. 2007a, b). Whilst this does not negate the potential for dietary diversity indicators to serve internationally as indicators of food security or dietary quality in a broad sense, it suggests that one cannot make any strong inferences from dietary diversity indicators about specific micronutrient adequacies.

This distinction is important and too often overlooked. For while dietary diversity indicators may be inconsistent predictors of specific nutrient deficiencies, they may still serve as relatively good indicators of food security. Table 3 provides some evidence in support of this conjecture. Specifically, we used the FAO food balance sheets to construct a very simple measure of dietary diversity: the share of calories from non-staple foods, where staple foods consist of cereals and root crops. We then compare how this indicator correlates with other food and nutrition security indicators across countries. Since there is no gold standard for food and nutrition security measurement, we have to interpret the results cautiously, but a stark result from Table 3 is that this exceedingly simple dietary diversity indicator (the share of calories not derived from cereals, roots and tubers) correlates more strongly with anthropometric indicators of malnutrition than the FAO calorie deprivation indicator. For stunting, the correlation with this dietary diversity indicator is a high −0.63, for wasting it is −0.58, and for low BMI it is −0.47. For calorie deprivation the correlation with stunting is a reasonably high 0.51, but the correlations with wasting (0.28) and low BMI (0.08) are much lower. Hence, a very simple dietary diversity indicator is a much stronger cross-country predictor of malnutrition outcomes than calorie deprivation.Footnote 11 Some good news on this front is that the FAO’s 2012 State of Food Insecurity now places much more emphasis on a suite of food security indicators, with this simple indicator of dietary diversity included, along with the share of protein derived from animal sourced proteins.Footnote 12
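
The construction of this simple indicator, and the kind of cross-country correlation reported in Table 3, can be sketched as follows. The group labels are hypothetical, and the use of a plain Pearson coefficient is an assumption, since the text does not state which correlation measure underlies the table.

```python
import math

# Staples as defined in the text: cereals and root crops.
# The group labels themselves are HYPOTHETICAL.
STAPLE_GROUPS = {"cereals", "roots and tubers"}


def non_staple_calorie_share(kcal_by_group, staples=STAPLE_GROUPS):
    """Share of total calories that does not come from staple food groups."""
    total = sum(kcal_by_group.values())
    non_staple = sum(
        kcal for group, kcal in kcal_by_group.items() if group not in staples
    )
    return non_staple / total


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    ASSUMPTION: both lists have at least two values and non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

A country whose food balance sheet shows 1,800 of 2,500 kcal coming from cereals and roots would have a non-staple share of 0.28. Computing that share for each country and correlating it with, say, stunting prevalence would then be expected to yield a negative coefficient, in line with the signs reported above.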

Table 3 Correlations between different indicators of food and nutrition security across countries

Inter-temporal validity

It is well known that dietary diversity improves as economies develop, suggesting that dietary diversity indicators are well suited to tracking slow moving trends in food and nutrition security (Jensen and Miller 2010). However, we know of only one study which provides a specific analysis of the responsiveness of dietary diversity indicators to shocks and seasonality. Brinkman et al. (2010) analyzed several WFP datasets, some of which tracked changes in the FCS over a period of a few months during the 2008 global food price crisis. They found reduced dietary diversity (as measured by the FCS) in most cases. The authors also estimated elasticities of the FCS with respect to local staple food price changes for Haiti, Nepal and Niger and found significant elasticities varying between 0.05 and 0.21. Hence their results seem to show that the FCS displays substantial sensitivity to shocks, though in some cases not as much as one would like.Footnote 13

Additional evidence on dietary diversity comes from the aforementioned analyses of monthly data from NSS surveys in Indonesia and Bangladesh. Over the course of Indonesia’s 1998 financial crisis Bloem et al. (2005) and Block et al. (2004) reported substantially declining dietary diversity in Indonesia, particularly reduced consumption of egg products.Footnote 14 Strikingly, Block et al. (2004) concluded that reduced consumption of micronutrient-rich foods accounted for an 18 point increase in child anemia prevalence. And over a much longer period in Bangladesh (1990–1999), Torlesse et al. (2003) found that dietary diversity fluctuated with rice prices (negatively), which in turn correlated with child underweight prevalence (negatively).

Finally, there are some indications that dietary diversity indicators may also be able to pick up seasonal variations in food consumption. Specifically, Savy et al. (2006) found for women living in the Sahel in Burkina Faso that the DDS was sensitive to seasonal variations in food consumption, while the relationship between women’s body mass index (BMI) and dietary diversity was also seasonal and likely influenced by changing relevance of socio-economic factors and varying workloads.

Nutritional relevance

Dietary diversity questions can easily be asked about households as well as about specific household members. Indeed, the Demographic and Health Surveys (DHS) have thus far only collected dietary diversity data for mothers and children (as reported by mothers), though there are a few surveys that pose the question at both the household and individual level.Footnote 15 Ruel’s (2003) review revealed a consistent positive association between dietary diversity indicators and child growth and nutritional status in a number of countries. A subsequent study by Arimond and Ruel (2006) tested whether the diversity of children’s diet could explain their height-for-age z-scores (identifying child stunting) using DHS data from 11 developing countries, and found it to be a significant explanatory variable in all but one country.Footnote 16 In Table 1 we also saw that dietary diversity indicators have significant associations with all three child anthropometric indicators. Thus dietary diversity indicators seem to be quite nutrition-relevant indicators of food security.

Subjective indicators

A final class of indicators is based on subjective responses to food security questions. An important definitional issue is that while most survey-based food security indicators involve self-reporting (e.g. food expenditure, dietary diversity), subjective indicators ask for a more reflective thought process. They are also typically defined by raising potentially emotive subjects, such as hunger, anxiety or general wellbeing. In raising these subjects, subjective questions automatically generate an important tradeoff: such feelings are obviously important from a welfare point of view, but emotive subjects can induce response biases (and in unpredictable directions).

At one extreme are very simple dichotomous indicators, such as the Gallup World Poll indicator used by Headey (2013), which asked whether respondents had experienced problems affording food over the previous 12 months. Other surveys including the Afrobarometer survey, WFP’s CFSVA survey, the World Bank’s Core Welfare Indicator Questionnaire survey, and some household consumption and expenditure surveys contain questions about the frequency of food affordability problems or experiences of hunger in the last 12 months. At the other extreme of sophistication is the Household Food Insecurity and Access Scale (HFIAS) developed by USAID’s FANTA project. The HFIAS is an adaptation of the Household Food Security Survey Module scale, used by USDA and other agencies to measure food access in the United States. Respondents are asked to assess the frequency of different types of food insecurity over a four-week recall period, including experiences related to anxiety about household food access, satisfaction of food preferences, food availability and diversity, and signs of food shortages in daily life. The answers to the nine questions yield a rank on the HFIAS which captures the full breadth of insecurity from the purely psychological to more physical feelings of hunger (Coates et al. 2007a, b).

Cross-sectional validity

As we noted above, subjective indicators possess some unique advantages and disadvantages. Advantageously, subjective indicators can capture psychological dimensions of food insecurity. While we would argue that the nutritional implications of food insecurity should be paramount in underdeveloped settings, psychological dimensions are often of inherent interest, since perceptions matter in their own right. Subjective data can also be useful for gauging expectations, such as expectations about inflation, food stores or upcoming harvests (based on forward looking questions about food security, for exampleFootnote 17). A second advantage is the relatively low cost of subjective data, especially compared to time consuming expenditure data required to compute poverty and calorie consumption estimates. A third advantage is that subjective recall questions can be used to capture seasonality, such as through the “hunger gap” question, which asks about the number of months or weeks of hunger experienced in the last year.

These advantages have prompted substantial enthusiasm in the nutrition community (particularly for the HFIAS), but there are thus far surprisingly few critiques of subjective indicators in this literature, especially by economists, who are traditionally wary of subjective indicators. While there is a sizeable economic literature on the weaknesses of subjective indicators, very little of it applies to food security questions specifically. The literature that does exist, however, raises some important concerns. Deaton (2011) and Headey (2013) found that the ordering of questions significantly affected responses in two different Gallup surveys. Deaton (2011) found that the bias induced by question ordering in a high frequency Gallup poll of US citizens had a larger influence on self-reported well-being than the recent financial crisis. Headey (2013) found that question ordering appeared to have a large effect on self-reported food insecurity in China. His paper also raised concerns about lack of comparability of self-assessed food insecurity across wealth and education groups because of different individual dietary standards or reference points (the lack of a common reference frame is, in fact, a longstanding concern that economists have with subjective indicators). In particular, he found that self-reported food insecurity was surprisingly high in some middle-income countries with exceptionally high rates of educational attainment (including Sri Lanka and a number of Central Asian countries).

Another issue related to cross-sectional validity is cross-cultural inconsistency. Deitchler et al. (2010) tested the cross-cultural comparability of the HFIAS scale in six countries and found that only three of the nine questions in the HFIAS demonstrated adequate cross-country comparability. Specifically, these were the last three questions pertaining to experiences of hunger and their physical consequences: “No food to eat of any kind”; “go to sleep hungry at night”, and “go a whole day and night without eating”.Footnote 18 In retrospect, perhaps this result is not so surprising, since the meanings of hunger and the physical consequences of hunger are less open to interpretation than terms included in the first six questions of the HFIAS, such as “worry”, “enough food”, “preferred food”, “variety of foods”, and so on.Footnote 19 It seems likely that these terms do not elicit a sufficiently clear reference frame. For example, “variety” for a poor person may involve eating animal-sourced products once a month, but for a wealthy person it may involve eating these products once a day.

While the Deitchler et al. (2010) study is certainly informative, it makes little mention of other possible sources of response bias. Possible sources of underestimation of food insecurity include feelings of shame associated with admitting hunger, or fear (particularly in authoritarian regimes where even implicit criticisms of government policies are not tolerated). And possible sources of overestimation of food insecurity include increasing scope of public transfers (food aid, social safety nets, or other welfare programs), which foster material incentives for individuals to classify themselves as food insecure.

These rather negative conclusions from the food security literature on subjective indicators prompted us to look at how subjective indicators correlate with other food security and nutrition indicators, using household survey data from Malawi, Cambodia, and Ethiopia. The results for Malawi were previously reported in Table 1. For Cambodia and Ethiopia we report results in the text below. For Malawi we correlated a binary variable of subjective household food adequacy with household expenditure and calorie consumption per capita (in logarithmic terms) as well as dietary diversity indicators and child anthropometrics. For Cambodia we used the Cambodia Socio-Economic Survey 2009 and calculated similar pairings for a binary household food adequacy variable, as well as a “hunger gap” indicator (measured on a weekly basis). For Ethiopia we used the “hunger gap” indicator (measured on a monthly basis) from the Welfare Monitoring Survey 2004/05 and correlated it with household food expenditure and calorie consumption per capita. It is important to note that while the subjective questions in Malawi and Cambodia refer to experiences over the past month, the question in the Ethiopia survey refers to the past year.

For Malawi and Cambodia, the strongest correlation of subjective household food adequacy is with household expenditure (with coefficients of 0.25 and 0.22, respectively), followed by the DDS (with coefficients of 0.21 and 0.16, respectively). However, the correlation of subjective food security indicators with calorie consumption is low in Malawi (with a coefficient of 0.10) and even statistically insignificant for Ethiopia.Footnote 20 For Malawi and Cambodia the correlations of the subjective household food adequacy indicator with anthropometric indicators are also low (less than 0.11), but statistically significant (with the exception of weight-for-height z-scores for Cambodia), and no lower than the correlations that other food security indicators share with anthropometric indicators. For Cambodia the “hunger gap” indicator yields slightly lower correlation coefficients with all quantitative food and nutrition security indicators than the binary household food adequacy indicator. In Ethiopia the correlation between the hunger gap indicator and household food expenditure is much lower (with a coefficient of −0.04). Overall, then, the results are somewhat mixed: there are some reasonably strong associations between subjective indicators and other food and nutrition security indicators, but many correlation pairings are quite weak. The results also seem to suggest that hunger gap indicators perform substantially worse than the food adequacy questions.

Our overall conclusion is that subjective indicators have some potential to measure meaningful information on food security, particularly on extreme forms of hunger and on expectations of food insecurity. There may also be substantial means of improving their measurement by eliciting more precisely defined reference frames.Footnote 21 But further validation and consistency checks should certainly be conducted before these indicators can be classified as achieving adequate cross-sectional validity.

Inter-temporal validity

We know of very few analyses that test the performance of subjective indicators in gauging the impacts of shocks or seasonal shortfall. Using the Gallup World Poll indicator of “problems affording food”, Headey (2013) conducted some basic tests to see whether within-country changes in this indicator were significantly explained by real per capita GDP growth. He found a highly significant and negative effect of economic growth on changes in subjective food insecurity, but he also noted the low explanatory power of the regression, suggesting that measurement error was sizeable. Indeed, many countries saw implausibly large changes in subjective food insecurity, implying that either the specific question or the survey itself was inducing measurement error or response biases.

More generally, there is a significant problem with interpreting changes in subjective indicators. For example, Helen Keller International (HKI) uses the HFIAS in combination with anthropometric indicators for women and children in the Food Security and Nutrition Surveillance Project in Bangladesh. The HFIAS suggested that household food insecurity increased by a remarkable 31 percentage points (or 69 %) between the first quarter of 2010 and the first quarter of 2011 (from 45.1 % to 76.1 %), possibly as a result of food price surges (HKI 2011). But is it plausible that the latent variable, food insecurity, really increased by such a large amount? Objective indicators suggest otherwise. The proportion of non-pregnant mothers of reproductive age with a BMI below 18.5 increased by almost five percentage points (or by 21 %, from 22.7 % to 27.5 %). The prevalence of acute malnutrition (measured using weight-for-height z-scores) among preschool children rose by almost three percentage points (from 7.6 % to 10.3 %). In contrast, the prevalence of chronic child malnutrition (measured using height-for-age z-scores) declined by more than three percentage points (or by 7 %, from 44.7 % to 41.4 %). Clearly, anthropometric outcomes are lagging indicators, since households and individuals do their best to protect food consumption. Even so, the fact that the subjective indicator increased by a factor of 6 relative to maternal BMI (in percentage point terms) clearly raises a problem: how does one interpret quantitative changes in subjective indicators? The uncertainty surrounding the meaning of such changes also makes it very difficult for operational agencies to respond informatively to this information. If subjective food insecurity increases by 69 %, should the WFP and other humanitarian agencies also raise food aid by 69 %?
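The distinction between percentage point changes and relative (percent) changes is easy to blur, so the arithmetic behind the figures above can be retraced with a short Python sketch. The prevalence values are those quoted in the text from HKI (2011); the variable and function names are our own and purely illustrative.

```python
# Prevalence (in percent) in Q1 2010 and Q1 2011, as quoted in the text (HKI 2011).
indicators = {
    "HFIAS food insecurity": (45.1, 76.1),
    "Maternal BMI < 18.5": (22.7, 27.5),
    "Child wasting (WHZ)": (7.6, 10.3),
    "Child stunting (HAZ)": (44.7, 41.4),
}


def changes(before, after):
    """Return (percentage point change, relative change in percent)."""
    point_change = after - before
    relative_change = 100.0 * point_change / before
    return point_change, relative_change


for name, (before, after) in indicators.items():
    point_change, relative_change = changes(before, after)
    print(f"{name}: {point_change:+.1f} points, {relative_change:+.1f} %")

# Ratio of the percentage point increases: subjective indicator vs maternal BMI.
hfias_points, _ = changes(*indicators["HFIAS food insecurity"])
bmi_points, _ = changes(*indicators["Maternal BMI < 18.5"])
ratio = hfias_points / bmi_points
print(f"HFIAS increase relative to maternal BMI increase: {ratio:.1f}x")
```

Note that the "factor of 6" comparison holds in percentage point terms; in relative terms the gap is smaller (roughly 69 % against 21 %), which underlines how much the interpretation depends on the metric chosen.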

Nutritional relevance

In principle, an attractive feature of subjective indicators is that they can be asked of individuals as well as households, although in practice questions asked about the household are more common (as in the HFIAS and Gallup questions). One underexplored issue is whether there may be response biases pertaining to individual versus household information. For example, questions about the food security of young children need to be asked of parents, who may be unwilling to admit that their children are inadequately fed. Similarly, previous research has shown that men and women within the same household can give very different answers to common questions about financial security, suggesting that gender biases could constitute an important issue for individual level subjective questions (Breunig et al. 2007).

Are subjective indicators good predictors of malnutrition? There seems to be mixed evidence on this front. A recent paper on Brazil by Kac et al. (2012) found that severe food insecurity, as measured by the HFIAS, was in fact a strong predictor of overweight prevalence in female adolescents aged 15–19 years. No less disturbing, a study of one particular district of Nepal found no association between HFIAS-based food security and child malnutrition indicators (Osei et al. 2010). In contrast, a recent study in a rural area of Tanzania did find significant associations between the HFIAS and nutrition outcomes (Cordeiro et al. 2012). Another very recent study of the nutritional performance of HFIAS indicators in rural areas of Bangladesh, Ethiopia and Vietnam by Ali et al. (2013) found much stronger correlations (using a common questionnaire). In all three countries higher degrees of food insecurity predicted significantly higher rates of stunting and underweight prevalence, though only in Bangladesh did the authors find that wasting was explained by severe food insecurity. These multi-country results tend to suggest that the HFIAS does impart nutrition-relevant information in cross-sectional comparisons in relatively poor countries, but the association with overweight in Brazil again raises concerns about cross-country comparability of subjective indicators.

Conclusions and implications for improving food security measurement systems

If measurement really does drive diagnosis and response, then mismeasurement of food insecurity presumably drives misdiagnosis and inappropriate responses (or no response at all). Indeed, there are good grounds to believe that the costs of mismeasuring food insecurity are non-trivial. Food security indicators clearly influence the allocation of humanitarian assistance and national welfare programs, yet different indicators give very different messages about which countries are most food insecure. The FAO undernourishment indicator tells us that the Democratic Republic of Congo is the most food insecure country in the world (despite abundant rainfall and no history of drought). Some poverty indicators suggest Ethiopia is one of the poorest countries in the world, but other poverty indicators tell us that poverty in Ethiopia is far below that of many of its neighbors (Alkire and Santos 2010). South Asia’s enigmatically high rates of malnutrition are often attributed to food insecurity (Deaton and Drèze 2009), but could well be explained by health factors (Headey et al. 2012; Spears 2013). And, as we noted above, there are persistent and significant controversies regarding the welfare impacts of the global food crisis.

Summary and indicator scoring

Despite widespread dissatisfaction with common food security indicators and measurement systems, very few studies have attempted to rigorously justify improvements. In this paper our first objective was to show that an ideal food security indicator—or suite of indicators—must satisfy a range of key criteria. Much previous research has focused on the issue of cross-sectional validity, through correlation analysis. To this we added increased emphasis on inter-temporal dimensions beyond the ability to track slow-moving trends. This includes the capacity to gauge the impacts of major shocks, as well as seasonal effects. In addition, a large body of literature now persuasively demonstrates the instrumental importance of early childhood and maternal nutrition, which should elevate nutritional relevance to being a very important criterion for effective food security measurement.

Table 4 summarizes our findings on the usefulness of the four types of food security indicators, by each of these criteria. We also assign scores for whether we deem the indicator useful (2 points), potentially useful (1 point), or of limited use (0 points) for each criterion, and we take the sum of these scores as rough indications of how close the indicator type is to the ideal. Admittedly, one could attach more or less weight to different dimensions (our equal weighting is arbitrary, but at least transparent).
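The scoring rule just described is simple enough to write down explicitly, which also makes it easy to test how sensitive the ranking is to the choice of weights. The Python sketch below is only illustrative: the three criteria names are taken from the section headings used in this paper, while the indicator labels and the scores are placeholders and are not the values reported in Table 4.

```python
# Assumption: criteria mirror the section headings; Table 4 may use a finer breakdown.
CRITERIA = (
    "cross_sectional_validity",
    "inter_temporal_validity",
    "nutritional_relevance",
)

# Scoring scale from the text: useful = 2, potentially useful = 1, limited use = 0.
VALID_SCORES = (0, 1, 2)


def total_score(scores, weights=None):
    """Weighted sum of criterion scores; equal weights unless specified otherwise."""
    if weights is None:
        weights = {criterion: 1.0 for criterion in CRITERIA}
    for criterion in CRITERIA:
        if scores[criterion] not in VALID_SCORES:
            raise ValueError(f"score for {criterion} must be 0, 1 or 2")
    return sum(weights[criterion] * scores[criterion] for criterion in CRITERIA)


# Placeholder scores for two hypothetical indicator types (NOT the Table 4 values).
placeholder_scores = {
    "indicator_type_A": {
        "cross_sectional_validity": 2,
        "inter_temporal_validity": 1,
        "nutritional_relevance": 2,
    },
    "indicator_type_B": {
        "cross_sectional_validity": 1,
        "inter_temporal_validity": 0,
        "nutritional_relevance": 0,
    },
}

equal_weight_totals = {
    name: total_score(scores) for name, scores in placeholder_scores.items()
}

# Example of an alternative weighting that doubles the weight on nutritional relevance.
nutrition_heavy = {
    "cross_sectional_validity": 1.0,
    "inter_temporal_validity": 1.0,
    "nutritional_relevance": 2.0,
}
nutrition_heavy_totals = {
    name: total_score(scores, nutrition_heavy)
    for name, scores in placeholder_scores.items()
}
```

A reader who prefers a different weighting can pass it as the `weights` argument and check whether the ordering of indicator types changes.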

Table 4 Usefulness of food and nutrition indicators in gauging the impacts of shocks—a score sheet

Table 4 illustrates why we come down quite heavily in favor of dietary diversity as a class of indicator with considerable potential. In fact, dietary diversity indicators are the only class of indicators that have at least some usefulness according to each criterion. They are nutrition relevant in that they capture both macro and micronutrient adequacy at least in a general way, in that they are measurable at the individual level, and in that they correlate well with nutritional outcomes. They appear to have considerable potential for gauging the impacts of shocks and seasonality, not only because dietary diversity is sensitive to shocks, but also because they are cheap enough to be collected at high frequency. And, within countries dietary diversity increases with income in a more linear manner than calorie consumption alone. It is perhaps less clear that dietary diversity can be easily measured across countries, especially once thresholds are used to define inadequate diets. On that front, however, we still suspect more work is needed, particularly since we found that even a very simple indicator of dietary diversity calculated from the FAO food balance sheets performed better than other cross-country indicators in terms of cross-correlations with other food security indicators and anthropometric indicators of child malnutrition. So dietary diversity indicators have substantial scope to add more value to food security measurement, especially if they can be refined and improved, rendered more comparable across populations, and measured more frequently over time.

We are much more skeptical about subjective/experiential indicators. Increasingly popular indicators—such as the HFIAS scale—admittedly share with dietary diversity indicators some desirable properties. They include the potential to focus on individuals, to pose questions on both total food availability and dietary diversity, and (because of cost-effectiveness) to be conducted at sufficient frequency in order to be useful in picking up shocks and exploring seasonality issues. However, the basic statistical validity and consistency of subjective indicators has yet to be convincingly established. Some existing evidence suggests that they lack validity (including cross-country comparability) and that they are highly sensitive to framing effects, question ordering and other response biases (Deaton 2011; Deitchler et al. 2010; Headey 2013). Future research should therefore look at test-retest reliability as well and explore the highly problematic issue of response biases. With the rapid expansion of social safety nets in the developing world, response biases could be even more problematic in future as poor populations face ever-stronger incentives to exaggerate their food insecurity.

Finally, we also rank calorie deprivation and poverty indicators quite poorly. Within countries, calorie deprivation and poverty indicators are measured from the same infrequent datasets and hence suffer many of the same problems, including lack of individual-level data and a limited capacity to assess shocks and seasonality. Even for cross-country comparisons and trends there are important limitations, especially with the much criticized FAO undernourishment indicator. We rank poverty indicators slightly higher than calorie deprivation indicators, but we acknowledge that this is partly a matter of judgment given controversies over the international measurement of poverty, as well as the questionable nutritional relevance of poverty indicators.

Implications for improving food security measurement

A further implication of Table 4 and the conclusions drawn in this paper is that the greatest deficiencies in existing approaches to food security measurement are their incapacity to gauge shocks and their limited nutritional relevance. It therefore behooves us to at least briefly discuss how these gaps could be filled.

Improving the nutritional relevance of food security measurement surely means using indicators which capture both macro- and micronutrient consumption, which can be measured at the individual level, and which give some sense of acute food insecurity (such as seasonal shortfalls or consumption shocks). Dietary diversity indicators seem to be useful to some extent in all three of these dimensions, at least if they can be measured with sufficient frequency. The nutritional relevance of food security indicators can also be maximized by co-measuring food security and nutrition indicators. This is fast becoming the practice with the World Bank’s Living Standards Measurement Surveys, including the new Integrated Surveys on Agriculture (ISA) project. The USAID-funded Demographic and Health Surveys, the UN’s Multiple Indicator Cluster Surveys, and various WFP surveys also collect standard nutrition indicators and some kind of dietary diversity indicators.

For cross-country purposes, we would argue that these organizations should consider measuring dietary diversity in a common way and in a manner that maximizes cross-country comparability. One barrier to doing so has been the argument that common dietary diversity thresholds (cut-offs) do not impart consistent information on specific nutrient deficiencies (Ruel et al. 2012). In our view this does not necessarily invalidate dietary diversity indicators as food security indicators, or even as broad proxies for dietary quality: lack of dietary diversity is still indicative of high rates of dependence on staple foods and poor access to more nutrient-dense foods. How this lack of access translates into specific nutrient densities is evidently very context specific, but can reasonably be regarded as a somewhat separate issue. Moreover, any problem with cross-country comparability of a specific indicator must be regarded in relative terms. Equally (if not more) severe problems plague the estimates of purchasing power parities (i.e. estimating cross-country price differences across very different consumption bundles), which are absolutely essential to measuring poverty across countries (Deaton and Dupriez 2011).Footnote 22 Yet these obstacles have not stopped efforts in poverty measurement, nor should they prevent attempts to improve the measurement of dietary quality across countries. Hence a practical suggestion is for these key agencies to develop common and internationally comparable dietary quality indicators from household survey data (ideally, separate indicators for children, women and men) and to further coordinate food and nutrition security surveys so as to maximize country coverage.

The other major knowledge gap, the impacts of shocks and seasonal shortfalls on food security, will be more costly to fill, as it ultimately requires additional surveys carried out on a high-frequency basis along the lines of the NSS surveys conducted in Indonesia and Bangladesh.Footnote 23 While these kinds of surveys yield very important insights into the dynamics of food insecurity, and are certainly useful for food security monitoring (Bloem et al. 2003), they are also costly to implement. Hence we would argue that they should be prioritized in countries that are highly exposed to shocks. The extent of exposure could be measured by dependence on humanitarian assistance, by exposure to natural disasters and seasonal shortfalls in general, and by baseline levels of chronic food and nutrition insecurity. These criteria would help target resources to the countries or regions where high-frequency food and nutrition security measurement is most needed.Footnote 24 The WFP and other humanitarian and development agencies would be natural proponents for such surveys; indeed, the WFP already uses sentinel site surveys in a limited number of countries.

While the expanded use of high-frequency surveys would involve substantial costs, we have several reasons for arguing that the benefits of this measurement system would ultimately exceed the costs by a healthy margin. First, information and communication technologies (ICTs) will surely have a substantive effect in reducing the costs of data collection and in improving the timeliness of data dissemination. Second, climate change research suggests that many already vulnerable regions could be much more exposed to these shocks in the future, contributing to rising costs of inaction (IPCC 2012). For example, recent climate research in the Horn of Africa suggests that droughts have already become more common on the back of a much warmer Indian Ocean, and will continue to do so in the future (Funk et al. 2008). Yet the data that feed into the monitoring of recurrent and increasingly severe droughts in that region (including the exceptionally severe drought of 2011) are infrequently collected and more conjectural than they need to be.Footnote 25 Highly vulnerable regions like the Horn of Africa, the Sahel, and South Asia, to which many millions of dollars of humanitarian assistance are directed on an annual basis, surely merit better monitoring of food and nutrition security.