Contingent valuation is a stated preference method and a survey-based approach to nonmarket valuation. A contingent-valuation question carefully describes a stylized market to elicit information on the maximum a person would pay (or accept) for a good or service when market data are not available. While controversial, as will be discussed below, contingent valuation and choice experiments (Chap. 5)—a close cousin in the stated preference family of valuation methods—are arguably the most commonly used nonmarket valuation methods.

An early contingent-valuation study was conducted by Davis (1963) to estimate the value of big game hunting in Maine. About a decade later, Cicchetti and Smith (1973) and Hammack and Brown (1974) applied contingent valuation to wilderness recreation and waterfowl hunting. Simultaneously, an application to valuing visibility in the Four Corners region of the Southwestern United States represented a turning point after which contingent valuation gained recognition as a methodology for estimating values for public goods (Randall et al. 1974). The types of environmental applications have continually expanded, and studies have spread to health, cultural, and other types of applications where market data are not available to support decision-making.

Results from early applications of contingent valuation met with skepticism and criticism. One of the more notorious comments was expressed by Scott (1965), who referred to contingent valuation as a “short cut” and concluded, “ask a hypothetical question and you get a hypothetical answer” (p. 37). The contingent-valuation question is hypothetical in the sense that people do not actually make a monetary payment. Some of this criticism was deflected by Bishop and Heberlein’s (1979) landmark validity study in which they compared welfare estimates for goose hunting from actual cash transactions, contingent valuation, and travel cost (Chap. 6). This study showed that the contingent-valuation estimate of willingness to pay (WTP) was of similar magnitude to estimates of WTP provided by a travel cost model and the cash transactions. Comparison of the contingent values to the travel cost values is a test of convergent validity, and comparison with cash transactions is a test of criterion validity (Carmines and Zeller 1979).

A workshop sponsored by the U.S. Environmental Protection Agency was the first attempt to synthesize what was known about contingent valuation (Cummings et al. 1986). The notable outcome of this “state-of-the-art assessment” was a set of reference operating conditions for conducting a credible contingent-valuation study. The reference operating conditions motivated much research to evaluate the validity of contingent valuation and to probe the limits of the types of applications where contingent valuation could provide credible welfare estimates.

Perhaps the most substantive contribution to the contingent-valuation literature was Mitchell and Carson’s (1989) book, which presented a detailed discussion of how to design a contingent-valuation study. The book provided the broad overview that novices require when conducting their first contingent-valuation study, as well as prescriptive recommendations that set off a whole new wave of validity research. Mitchell and Carson fundamentally shifted the research focus to one that considered the details of study design. Validity, rather than being a global, all-or-nothing criterion, was now viewed as a function of specific aspects of study design; specific design elements were found to create or reduce bias in estimates of central tendency and to increase or decrease the dispersion around those estimates.

Through the first 25 or more years of the use of contingent valuation, critiques of this methodology seemed to ebb and flow without a specific focal point of attack. This all changed when contingent-valuation estimates began to be used in legal cases as the basis of damage payments by parties responsible for large-scale pollution under the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) of 1980 (Ward and Duffield 1992). The controversy became particularly heated after the settlement of the Natural Resources Damage claim for the Exxon Valdez oil spill. Exxon supported the publication of a book that critiqued the fundamental premises of contingent valuation (Hausman 1993), and the National Oceanic and Atmospheric Administration (NOAA) responded with a blue-ribbon panel to evaluate the credibility of using contingent valuation to estimate passive-use values (Arrow et al. 1993). Passive-use values are those values held by individuals for the item being valued itself and not for any personal use of the item (see Chap. 2, Eq. (2.26)) and are also commonly referred to as nonuse values.Footnote 1

The NOAA panel provided specific recommendations on how a contingent-valuation study should be designed and conducted to develop “reliable” estimates of passive-use values. The panel’s recommendations set off another wave of research designed to investigate the credibility of contingent valuation, particularly in the context of applying the NOAA Panel’s recommendations to the estimation of use values, passive-use values, and total values.Footnote 2

From 1993 when the NOAA Panel report was published through 2003 when the first edition of this chapter was published, the contingent-valuation literature was populated with methodological studies designed to better understand the relative strengths and weaknesses of this valuation method in the context of the Panel’s recommendations. Moving forward from 2003 to the present, a review of the contingent-valuation literature still reveals methodological studies, but there are many more applications to a variety of issues around the world. Johnston et al. (2017) present contemporary guidance for stated-preference studies, including contingent valuation, that addresses all common applications, not just the estimation of passive-use values for litigation.

Recently, the Journal of Economic Perspectives published a special section on contingent valuation as an update on what has been learned in the past 20 years. Kling et al. (2012) concluded that a “carefully constructed number(s) based on stated preference analysis is now likely to be more useful than no number,” and “the remarkable progress that stated preference researchers have made … serves as a model for the evaluation of other empirical techniques” (p. 23). In contrast, Hausman (2012) continued to focus on the issues of hypothetical bias, disparity between willingness to pay and willingness to accept, and embedding/scope problems.Footnote 3 An element that is not clearly addressed in either of these papers is the standard that should be used to judge the credibility of contingent-valuation estimates. Kling et al. laid out four criteria where the strongest—criterion validity—asks if “the estimate generated by stated preference methods [is] the same as a willingness-to-pay value that would be generated if real payment was made” (p. 14). The question that is not well addressed is whether real payments are the conceptually correct counterfactual to judge the credibility of contingent-valuation estimates and whether market contexts can affect real payments as survey design features might affect contingent-valuation estimates. This chapter does not delve into these issues of validity, but some of them are covered in Chap. 12.

This chapter provides an overview of the steps in designing and implementing a contingent-valuation study and covers the major considerations that arise at each step. Careful attention to the design features discussed in this chapter can enhance the overall credibility of any contingent-valuation study.

Valuation of groundwater quality is used as an example to help explain the steps involved in designing and conducting a contingent-valuation study. Although contingent valuation is applied to a wide array of issues, the design features presented in this chapter are broadly appropriate for all applications. Methodological citations are drawn from all types of applications published in the peer-reviewed literature, not just groundwater valuation applications. Design features unique to certain applications are not discussed here.

4.1 Steps in Conducting a Contingent-Valuation Study

Most of the action in designing a contingent-valuation study occurs in the development of the survey instrument in terms of scenario wording, choice of a valuation question, and bid selection where appropriate (Table 4.1). The valuation scenario in the survey is the foundation for value estimation, and the auxiliary questions in the survey provide respondent-specific data for statistical analyses. As choices in the survey design and data analysis stages of a study can affect welfare estimates, careful design of a contingent-valuation survey and careful analysis of the resultant data are crucial to the estimation of credible welfare estimates.

Table 4.1 Steps in conducting a contingent-valuation study

4.1.1 Identifying the Change in Quantity or Quality to Be Valued

The first step in conducting a contingent-valuation study involves developing a theoretical model of the value(s) to be estimated, which is based on the difference between the baseline utility with the current environmental condition and the utility with the new environmental condition (Step 1 in Table 4.1).Footnote 4 Following the notation protocol established in Chap. 2 (see Eq. 4.2), the value for a program to protect groundwater from contamination can be defined as

$$v\left( {P^{0} ,Q^{1} ,y} \right) = v\left( {P^{0} ,Q^{0} ,y - {\text{WTP}}} \right),$$
(4.1)

where v(·) is the indirect utility function, P is the price of drinking water obtained from groundwater, y is income, WTP is a Hicksian measure of value, and Q is groundwater quality (\(Q^{0}\) denotes current water quality and \(Q^{1}\) denotes degraded quality if the protection program is not implemented).Footnote 5,Footnote 6,Footnote 7 The value of interest is that of protecting groundwater from contamination (\(Q^{0} > Q^{1}\)). Note that P is assumed to remain constant. Assuming the change does not affect drinking water (e.g., water utility purification protects drinking water supplies), the value that would be estimated here is a passive-use value associated with a change in groundwater quality (∆Q). The purpose here is not to delve into details of theoretical definitions of changes in welfare, which is the domain of Chap. 2, but to discuss aspects of the value definition that must be considered in the design of any contingent-valuation study.
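
To make Eq. (4.1) concrete, here is a minimal numerical sketch in Python. It assumes a specific, illustrative indirect utility function, v(P, Q, y) = ln(y) + θQ, and made-up values for income, the quality index, and the preference parameter θ; none of these come from the chapter.

```python
import math

# Illustrative (assumed) indirect utility: v(P, Q, y) = ln(y) + theta * Q,
# with the price P held constant, as in Eq. (4.1).
theta = 0.02         # assumed marginal utility of groundwater quality
y = 50_000.0         # assumed annual household income ($)
Q0, Q1 = 80.0, 75.0  # current quality index and degraded quality index

# Eq. (4.1): ln(y) + theta*Q1 = ln(y - WTP) + theta*Q0.
# Solving for WTP gives WTP = y * (1 - exp(-theta * (Q0 - Q1))).
wtp = y * (1.0 - math.exp(-theta * (Q0 - Q1)))
print(f"Illustrative WTP to protect groundwater quality: ${wtp:,.0f} per year")
```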

The theoretical definition of value is fundamental to three key components of any contingent-valuation study. First, the definition of value plays a central role in developing the description of the conditions with and without the item to be valued (Step 5.1 in Table 4.1).Footnote 8 Second, the definition frames the statistical analysis of the contingent-valuation responses (Step 9). Third, having a theoretical definition allows for clear interpretations of value estimates in specific decision-making contexts (Step 10). Decision-making is used in a very general sense to refer to any condition that can change utility and for which welfare estimates are required. This broad definition includes any public or private decision-making process where nonmarket values would be used.

The difficult part of Step 1 is identifying the physical change and how it affects utility. Economists depend on physical and biological information to establish status quo conditions (\(Q^{0}\)) and predictions of future conditions to be avoided (\(Q^{1}\)) or, in other cases, attained. This information may not be known with certainty, and applied valuation further requires information about this uncertainty. For example, a definition of value for a policy that reduces the probability that groundwater will become contaminated is as follows:

$$\pi_{1} v\left( {P^{0} ,Q^{1} ,y - op} \right) + \left( {1 - \pi_{1} } \right)v\left( {P^{0} ,Q^{0} ,y - op} \right) = \pi_{0} v\left( {P^{0} ,Q^{1} ,y} \right) + \left( {1 - \pi_{0} } \right)v\left( {P^{0} ,Q^{0} ,y} \right),$$
(4.2)

where \(\pi\) is the probability of contamination, \(\pi_{0} > \pi_{1}\), and op is willingness to pay (option price as defined in Eq. 2.32, Chap. 2) to reduce the probability of contamination (Bishop 1982). Here, \(\pi_{0}\) is the probability that the current water quality, \(Q^{0}\), will be degraded to \(Q^{1}\) in the absence of a policy, and \(\pi_{1}\) is the reduced probability of degradation if the policy is enacted. Thus, op is maximum WTP to increase the probability that the current water quality will be maintained. Applying contingent valuation to conditions that change the probability of an outcome is a common practice.
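
Equation (4.2) defines op only implicitly, so in applied work it is typically solved numerically. The sketch below reuses the illustrative utility form from the previous example and recovers op with a root finder; the probabilities and parameter values are again assumptions chosen for illustration.

```python
import math
from scipy.optimize import brentq

theta, y = 0.02, 50_000.0  # assumed preference parameter and income
Q0, Q1 = 80.0, 75.0        # current and degraded quality indexes
pi0, pi1 = 0.30, 0.10      # contamination probability without/with the policy

def v(Q, income):
    """Assumed indirect utility with price held constant (illustrative only)."""
    return math.log(income) + theta * Q

def eq42_gap(op):
    """Left-hand side minus right-hand side of Eq. (4.2)."""
    lhs = pi1 * v(Q1, y - op) + (1 - pi1) * v(Q0, y - op)
    rhs = pi0 * v(Q1, y) + (1 - pi0) * v(Q0, y)
    return lhs - rhs

# op is the payment that equates the two expected utilities.
op = brentq(eq42_gap, 0.0, y - 1.0)
print(f"Illustrative option price: ${op:,.0f} per year")  # about $990 here
```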

The fundamental information transmitted in the survey is the description of the change to be valued. For example, if a policy will protect groundwater from contamination, the survey will explain current conditions in the aquifer (\(Q^{0}\)) and what would happen to quality if the protection policy were not implemented (\(Q^{1}\)).

Describing changes in quality in contingent-valuation surveys is an area where there has been considerable progress in recent years along two dimensions. First, there have been advances in the information on environmental quality available to study designers. These advances have come from interdisciplinary research projects where economists interact with physical and biological scientists at very early stages in the research process (Johnston et al. 2012), the expanding availability of GIS (geographic information systems) data (Bateman et al. 2002), and the use of simulated images to portray new conditions (Madureira et al. 2011). Second, Internet surveys allow the presentation of information to be carefully controlled, such as allowing respondents to reference information as they complete the survey and limiting respondents’ ability to read ahead or change answers as they complete the survey (Lindhjem and Navrud 2011). These advances have improved the realism of the information in contingent-valuation scenarios and help to more carefully link value estimates to decision-making.

Sometimes, in the absence of clear information on the change to be valued, survey designers will focus on the action that brings about the change in the services people enjoy. With only a description of the action and not of the change itself, survey respondents are left to make two assumptions: what change they are to value and how this change will affect the services they receive. This issue has not been directly and extensively addressed in the contingent-valuation literature and deserves more consideration. The key design consideration is to avoid descriptions that are vague or confusing. In the absence of a clearly understood description of the change to be valued, different respondents could make different assumptions about what to value, and those assumptions might not be consistent with the change the study is designed to value.

It could be the case that a contingent-valuation study is being conducted in advance of an anticipated decision, and the details of the action and the consequent effects are not known. Contingent-valuation studies are often designed to estimate an array of values that represent plausible conditions that might occur when the physical information becomes known or the actual decision is finalized. Thus, the design of a contingent-valuation study proceeds using a “best knowledge” of the actual change that will be implemented and of the ultimate effects with and without the action. Here is where a careful theoretical definition of the value estimated is crucial; the definition allows an ex post interpretation of whether the value estimate matches the realized change it is intended to value.

In addressing this step, it is important to recognize that there are two types of contingent-valuation studies . Many studies in the peer-reviewed literature focus on methodological contributions and are not designed to address a specific decision-making issue (e.g., Murphy et al. 2010), while other studies have been designed to address a specific decision-making issue (Whittington and Pagiola 2012). This difference is crucial because methodological studies can use whatever definition of value is convenient. In a specific decision-making context, however, the value definition must be linked to the specific change in utility that will occur if the proposed change is implemented. Methodological studies still require theoretical definitions of estimated values to guide data analyses. Theoretical definitions of value can also allow estimates from methodological studies to be useful for benefit transfers (see Chap. 11).

The discussion in this section has focused on Hicksian measures of value, which are the theoretical concepts typically measured in contingent-valuation studies. Step 1 involves developing the theoretical definition of the value to be estimated, which underlies the explanation presented within the survey instrument of the change that respondents are asked to value.

4.1.2 Identify Whose Values Are to Be Estimated

Once the change to be valued has been specified, the affected population can be identified (Step 2). This step involves identifying that population and selecting a sampling procedure (see Chap. 3, Fig. 3.1). This information is important in selecting a sample frame so that the contingent-valuation survey will be administered to a representative sample of affected individuals. It is also important to determine whether the contingent-valuation study will estimate values on a per-capita or per-household basis, and it is necessary to know the size of the affected population to expand the per-unit values to a population value. The affected population and unit of measurement could also indicate which mode of data collection is desirable, or which modes are acceptable, for the particular application.

In the example where groundwater quality is threatened, the affected population might consist of those individuals who use the aquifer as a source of drinking water and those who hold passive-use values for groundwater in the aquifer. If everyone in a community obtains household water drawn from the aquifer, it is fairly clear that the affected population includes everyone in the community. However, the affected population might also include people who live outside of the city and hold passive-use values. In this case it may be more difficult to identify the population holding passive-use values, and a uniform list from which to draw a sample might not be available.

Most studies use geopolitical boundaries such as a city, county, or state in the United States to define the relevant study population. In the above example, the study population might be extended from the specific city to the county where the city is located if the aquifer of interest extends beyond the city boundary. The literature provides little guidance for selecting study populations (those who are affected by the change), but two points of consideration are useful. First, geopolitical boundaries are useful for identifying locations affected by the change being valued and those who will pay for the change to be implemented. Second, a spatially referenced sample will allow for an investigation of how value estimates change with distance from the affected area. If values decrease or increase with distance, this can affect the magnitude of aggregate welfare calculations (Bateman et al. 2006; Hanley et al. 2003; Mansfield et al. 2012).
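
To illustrate the second point, the following hypothetical sketch shows how a distance-decay relationship interacts with the choice of study-population boundary when aggregating welfare estimates; the decay function and household counts are fabricated for illustration.

```python
import math

# Hypothetical distance-decay function for mean household WTP ($/year) at
# distance d (miles) from the aquifer; the parameters are fabricated.
def mean_wtp(d_miles):
    return 60.0 * math.exp(-0.05 * d_miles)

# Assumed household counts by distance band: (band midpoint, households).
bands = [(2.5, 10_000), (7.5, 8_000), (15.0, 12_000), (30.0, 20_000)]

aggregate = sum(households * mean_wtp(d) for d, households in bands)
print(f"Aggregate annual benefits, all bands: ${aggregate:,.0f}")

# Truncating the study population at the city boundary (first band only)
# gives a much smaller total, showing why the boundary choice matters.
city_only = bands[0][1] * mean_wtp(bands[0][0])
print(f"Aggregate annual benefits, city only: ${city_only:,.0f}")
```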

Chapter 3 distinguishes between the affected population and the sample frame (those who are eligible for selection into the sample). The sample frame is important because it determines whether the selected sample can be representative of the affected population.

Another issue is the unit of measurement for values. Some contingent-valuation surveys elicit values for individuals (e.g., Bateman et al. 1995), and others elicit household values (e.g., Poe and Bishop 2001). It is important that the framing of contingent-valuation questions makes clear whether a household valuation or an individual valuation is being requested. Mitchell and Carson (1989) asserted that “payments for most pure public goods are made at the household level. … When this is the case, the appropriate sampling procedure is to allow any adult who claims to be a household head to be a spokesperson for the household—the current practice of the U.S. Census Bureau ” (pp. 265-266). (Mitchell and Carson cited Becker’s (1981) Treatise on the Family in support of this assertion.)

However, the issue is more complicated. Quiggin (1998) argued that only when intra-household altruism does not exist or is paternalistic will the sum of individual values equal household values, yielding the same aggregate measure of welfare change. In the absence of these conditions, he argued that sampling and aggregating over individuals will yield larger measures of aggregate values than sampling and aggregating over households. In contrast, Munro (2005) argued that individual and household values will be the same only if household members pool their incomes. Supporting these potential limitations, Bateman and Munro (2009) and Lindhjem and Navrud (2012) found significant differences between individual and household values. These results would appear to violate the unitary model where households act like a person and there is only one utility function. However, Chiappori and Ekeland’s (2009) collective model assumed income sharing among household members where it is possible to isolate individual welfare measures based on an allocation of resources to household members. In this theoretical framework, the potential contradictions mentioned above may not, in fact, be contradictions.

The lesson here is that restrictions on household member utility functions and interactions are necessary to secure equality between the sum of individual household member values and household values, and these assumptions might not always be testable. Further, it is not appropriate to assume that household values are a conservative approach. Thus, when choosing individual or household sampling units, it is important to ask several questions:

  • Do households make individual or group decisions?

  • If they make group decisions, do households pool their income?

  • If they make group decisions, do decision-makers adequately account for the preferences of all household members?

  • Do some households make group decisions and others make individual decisions?

Answers to these questions have important implications for individual and aggregate welfare estimation. A decision on individual versus household units of measurement should be based on the specific decision-making context being valued, and this selection influences what payment vehicle is selected (see Sect. 4.1.5.3), pretesting the survey instrument, the availability of a sampling frame, and other dimensions of the study design. A recent study suggests that people in a household act similarly, so this might not be an issue if the finding can be generalized to other applications (Adamowicz et al. 2014).

The bottom line for both sample frame and sampling units is that the sample frame should be known, and each unit in the sample should have a known probability of selection if value estimates are to be used to support decision-making.Footnote 9 Sample frame and unit of observation decisions should be clearly documented and any limitations or uncertainties noted in reporting study results.

4.1.3 Select a Data Collection Mode

A contingent-valuation study requires the collection of primary data (Step 3). The various modes of data collection are discussed in detail in Chap. 3 of this volume, so this section focuses on insights for contingent-valuation applications.

The most common way to implement contingent-valuation surveys is by mail (Schneemann 1997). Recently, Internet survey implementation has increased. Each method has its relative strengths and weaknesses (Dillman 2007). For a discussion of the alternative survey modes, see Chap. 3.

The primary reason that mail surveys are commonly used is that they have been the least expensive way to collect data. A recent call to a premier survey research firm suggests that the cost of a completed survey ranges from less than $25 for Internet implementation to $50–75 for a mail survey, $75–150 for a telephone survey, and $1,000–2,000 for personal interviews. These numbers are for national probability samples where an existing sample frame is assumed for the Internet implementation, and a national area probability sample is used for the personal interviews. Thus, with a limited budget, it is easy to see why Internet surveys are gaining popularity.

Another factor that has traditionally affected the choice of a survey mode is the expected survey response rate and the potential for item nonresponse to key questions in the survey (Messonnier et al. 2000). Schneemann (1997) showed that, even with careful design, response rates to contingent-valuation surveys are affected by factors that are not under the study designer’s control. For example, applications that involve specific user groups (e.g., anglers or hunters) are likely to result in higher response rates than general population surveys. Schneemann also found that survey response rates are related to the study application, the affected population, and the components of the contingent-valuation question itself, but this is not unique to contingent-valuation surveys. Rather, it is an issue to consider in the design of any survey (Groves and Peytcheva 2008). Thus, even with a good design process, it is important to recognize that features of the study application, as well as the design of the contingent-valuation component of the survey, affect survey response rates across all survey modes.

Response rates have been an issue of concern for Internet surveys. A credible Internet survey requires recruitment of individuals to represent the demographics of a known population, (e.g., the population of the country where the sample is drawn [see Chap. 3]). After this selection, the survey response rate can be less than 10% (Boyle et al. 2016; Manfreda et al. 2008). However, focusing on survey response rates as an indicator of survey quality and nonresponse bias has been shown to be misleading (Groves and Peytcheva 2008; Keeter et al. 2000). As a consequence, other measures of survey representativeness have been proposed (Groves et al. 2008; Schouten et al. 2009). Thus, while the expected survey response rate should be a design consideration, it is not necessarily a good metric to use to select the appropriate survey mode for implementing a contingent-valuation survey.

Conveying information on the item being valued is the fundamental component of a contingent-valuation survey. Personal interviews may be the most effective survey mode for conveying information because visual information can be provided and an interviewer is present to explain the information and answer questions.Footnote 10 An Internet survey allows for control of the information respondents read and can provide live links if a respondent wants to review information already presented, but an interviewer is not available to assist, answer questions, and monitor administration of the survey. A mail survey is more limited because no interviewer is present to explain the visual information and no control is provided on how respondents proceed through the survey instrument. The ability to provide information in a telephone survey is much more limited because no visual information is available. A mixed-mode survey where a telephone interview is conducted after respondents have received written and visual information in the mail, similar to the content of a mail survey, is one way to overcome the informational deficiencies of telephone interviews (Hanemann et al. 1991).

Other methods of implementing contingent-valuation surveys include on-site intercepts (Boyle et al. 1994) and convenience samples of students or other groups who are brought to a central location to participate in the study (Cummings and Taylor 1998). Davis and Holt (1993) noted that “behavior of (real) decision makers has typically not differed from that exhibited by more standard … student subject pools”Footnote 11 (p. 17; see also Smith et al. 1988). Maguire et al. (2003) found that college students behave similarly to a sample of all adults. These alternative survey modes with convenience samples are best used for implementing methodological experiments and not for studies designed to estimate values generalizable to a specific population to support decision-making . When experiments are conducted, it is still important that study participants are exogenously selected, and people are not allowed to voluntarily opt into the sample.

It is important to consider the advantages of each mode when implementing a contingent-valuation survey. A mail survey does not always dominate just because of cost advantages, and personal interviews do not always dominate just because of informational advantages. In reviewing studies that compared contingent-valuation studies implemented via the Internet versus mail, phone, and in-person implementation, Boyle et al. (2016) found that the ratio of willingness-to-pay estimates based on Internet surveys to estimates based on data from other modes averaged 0.92.Footnote 12

4.1.4 Choose a Sample Size

Concerns about sample sizes (Step 4) are important for two reasons. First, the precision of estimated values affects their usefulness in decision-making. An estimate with large error bounds can leave questions about whether benefits really exceed costs or what a specific damage payment should be if a study were used to assess damages. Second, statistical precision affects the ability to detect differences among value estimates in methodological studies designed to investigate the validity and reliability of contingent-valuation estimates.

As noted in Chap. 3, selection of a sample size (Step 4 in Table 4.1) is a matter of choosing an acceptable level of precision within a given budget. That is, the standard error of mean WTP is

$${\text{se}}_{{\overline{\text{WTP}} }} = \frac{\sigma }{\sqrt n } ,$$
(4.3)

where σ is the standard deviation and n is the number of completed surveys used in the analysis. Thus, for a given variance, the standard error can be reduced by increasing the sample size. The larger σ is, the larger the sample size required to attain a desired level of precision. Additionally, for a fixed σ, a given sample size will represent a larger percentage of a small population than of a large population (Salant and Dillman 1994).

For applications where there have been a lot of contingent-valuation studies conducted (e.g., recreational fishing), there may be sufficient information to develop a reasonable estimate of σ for a new application. Where existing studies are not available, an estimate of σ can be obtained through a field test of the survey instrument. In practice, most studies choose the largest sample size possible given the available budget.

Selecting a sample size also involves consideration of the expected response rate and the expected item nonresponse to the contingent-valuation question and other variables that will be used to analyze contingent-valuation responses.Footnote 13 Other considerations include whether the sample will be stratified into subsamples for analysis and data reporting.
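
As a rough planning aid for Step 4, the sketch below inverts Eq. (4.3) to obtain the number of completed surveys implied by a target standard error and then inflates that count for an expected response rate and item nonresponse; the σ and rate values are placeholders that would be replaced with pilot estimates.

```python
import math

sigma = 40.0            # assumed std. dev. of WTP ($), e.g., from a pilot survey
target_se = 2.0         # desired standard error of mean WTP ($)
response_rate = 0.35    # expected unit response rate (assumed)
item_completion = 0.90  # expected completion of the valuation question (assumed)

# Invert Eq. (4.3): se = sigma / sqrt(n)  =>  n = (sigma / target_se)^2.
n_completes = math.ceil((sigma / target_se) ** 2)

# Inflate to the number of surveys that must be fielded.
n_fielded = math.ceil(n_completes / (response_rate * item_completion))

print(f"Completed surveys needed: {n_completes}")  # 400 in this example
print(f"Surveys to field: {n_fielded}")            # about 1,270 in this example
```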

4.1.5 Design the Information Component of the Survey Instrument

Step 5 in Table 4.1 focuses on the information provided to respondents in the survey instrument. This includes telling respondents what it is they are being asked to value, how it will be provided, and how they will pay for it. Economic theory does not provide guidance for selecting these design features, so careful consideration must be given to them in pretesting the survey instrument. The valuation scenario should be designed so that respondents believe the change to be valued can feasibly be accomplished and so that the features selected do not unduly influence welfare estimates.

There are no hard-and-fast rules for this portion of the design phase. Thus, the use of cognitive interviews, focus groups, and pilot surveys is crucially important for understanding how respondents comprehend the information presented and how they react to it.

4.1.5.1 Describe the Item to Be Valued

The description of the item to be valued (Step 5.1 in Table 4.1) tells respondents what is being valued and presents the change in the quantity, quality, or probability to be valued (Step 1 ). The information set should include a description of the item being valued, baseline conditions, and the new conditions. This does not require an extensive, detailed valuation scenario, but a scenario that is clear so that respondents understand the change they are being asked to value.

The information is presented in written or verbal form and is accompanied by graphs, pictures, and other visual stimuli to facilitate respondent understanding. The information scenario is not a marketing or sales pitch but a neutral and fair description of the change to be valued, and it can include statements about why some people would desire the change and why others would not (Alberini et al. 2005). This information should also include an explanation of why the change is being valued.

While the description of the item to be valued is the fundamental component in the design of any contingent-valuation study, it seems that the information is rarely complete in terms of the baseline condition(s) and the new condition(s) that will result as a consequence of the change (e.g., the levels of Q defined in Eqs. (4.1) and (4.2)). This problem arises most frequently in the estimation of values for marginal changes and to a lesser extent when total values are estimated. If respondents must infer the change being valued, it is likely that different respondents will use different subjective perceptions.

The appropriate quantity and quality of information required for respondents to provide valid value responses is a matter of much interest. While some have concluded that a credible contingent-valuation study requires that survey respondents be provided extensive, detailed information, the literature does not support such a conclusion. The literature does indicate that specific types of information should be provided to respondents. While only a small number of studies have investigated the effects of information on contingent-valuation estimates, these studies collectively tell an important story.

Samples et al. (1986) confirmed the obvious: You must tell people what it is they are being asked to value. Boyle (1989), in a study of trout fishing, found that providing respondents with additional information beyond the basic description of the change to be valued decreased estimates of central tendency, but the reductions were not statistically significant. The standard errors of the welfare estimates, however, decreased significantly with additional information. This result suggests information also affects the efficiency of value estimates. Bergstrom et al. (1990) investigated the effect of providing information about the services provided by wetlands and found that value estimates were affected by different types of service information. This outcome suggests that specific information on services is important for applications where respondents are not fully aware of how they currently benefit or could benefit in the future from the item being valued. Poe and Bishop (1999) demonstrated that specific information on well-water contamination was required in a study of groundwater protection. These findings are consistent with a more recent study by MacMillan et al. (2006) that found more information is required when people are less familiar with the item being valued. Valuation of a trout fishery by licensed anglers is an example where the study participants would be more familiar with the item being valued, but wetland and groundwater valuation are examples where study participants would likely have less knowledge or experience. These studies clearly indicate that specific information about the item being valued is required in order to elicit credible responses to contingent-valuation questions, and this is done through the qualitative research process to design the survey instrument.

Li et al. (2014) discussed prior knowledge and acquired knowledge people use in answering a survey. Prior knowledge includes knowledge people possess prior to engaging in the survey and any information they seek external to the survey while completing the survey (e.g., doing an Internet search for additional information). Acquired knowledge is the information provided in the survey instrument. Berrens et al. (2004) suggested that when given the opportunity to seek additional information, respondent use of this information is modest. This suggests that respondent attempts to seek additional prior knowledge could be limited. However, any additional prior information respondents seek introduces potential noise into welfare estimation. Thus, pretesting the information in the survey is needed to establish that survey respondents have the information they need to respond to the value question(s).

While some studies have used pictures and other types of graphics (e.g., maps, graphs, and tables) in valuation scenarios, there do not appear to be any studies that have evaluated whether the use of graphics to display information in the valuation scenario affects value estimates. The use of graphics requires careful pretesting so that the images do not inadvertently introduce unwanted effects into valuation responses. Thus, while pictures and other graphics can be helpful in conveying information in a contingent-valuation survey, they can also generate unwanted effects. The use of multiple modes of portraying information in a contingent-valuation scenario, such as written text, numerical presentation, graphs, or pictures, can facilitate respondent understanding because different people may use different information to understand the change to be valued.

Collectively, the lesson is that respondents to a contingent-valuation survey need to be presented with information that clearly explains the change to be valued and that such information must account for heterogeneity in how respondents use and process information. There is a careful balance between providing too little information such that respondents could misinterpret the valuation question and providing too much information so that respondents do not attend to critical portions of the information provided. Refinement of this information occurs in focus groups, one-on-one interviews , and, if necessary, in small-scale field pretests.

4.1.5.2 Select a Provision Mechanism

In any contingent-valuation study, it is necessary to tell respondents how the change to be valued will be implemented (Step 5.2). Without such information, respondents might not view the change as credible or might apply a personal assumption that is not appropriate and could inadvertently affect value estimates. In a general sense, the provision mechanism is the “production process” that will accomplish the change respondents are asked to value. Suppose the policy is protection of well water from contamination. One provision mechanism that has been used to provide such protection is to establish protection zones around wellheads, which preclude any activities that might contaminate the groundwater. Some applications have a clear mechanism that is part of the actual action being valued, while in other applications the selection of the provision mechanism is part of the study design.

Choosing the provision mechanism is complicated because the chosen mechanism could affect responses to contingent-valuation questions. For example, consider public concern over chemical residues in fruits and vegetables, genetically modified foods, sweatshop production of clothing, dolphin-free tuna, etc. These production attributes affect purchase decisions for market goods (Foster and Just 1989; Teisl et al. 2002), and there is no reason why there should not be similar provision-mechanism effects in responses to contingent-valuation questions. For example, while sweatshop production might not affect the quality of the clothes in terms of their use by the purchaser, this production process could represent an undesirable externality.

The effect of the selected provision mechanism on welfare estimates has not been formally investigated to my knowledge. At a minimum, careful pretesting in focus groups can identify whether respondents will understand the provision mechanism and if they feel it is credible in producing the change to be valued.

4.1.5.3 Select a Payment Vehicle

A payment vehicle (Step 5.3) is the mechanism by which respondents are told how payments would be made. For example, this might be a tax increase for a public policy or a higher price for a health-enhancing procedure.

This is a design area where the trade-off between credibility and unintended effects has been clearly noted in the literature. Mitchell and Carson (1989) argued that the choice of a payment vehicle requires balancing realism against payment vehicle rejection. That is, as realism increases, the likelihood that the payment vehicle will engender responses that protest the vehicle might also increase. For example, water-use fees are very realistic payment vehicles, but someone who values protecting potable groundwater might still give a valuation response of $0 to protest an increase in water rates. Income tax vehicles can run into problems due to resistance to higher taxes. On the other hand, a donation payment vehicle could yield an underestimate of value because this vehicle might not be incentive compatible for estimating a respondent’s full willingness to pay (Wiser 2007).Footnote 14

Failure to provide a realistic payment vehicle can also lead to protest responses. A sales tax would not be credible in an area that does not have sales taxes or when the group providing the item being valued does not have taxing authority. Thus, respondents could reject the valuation scenario even if they value the change because the payment mechanism is not believable. The realism of some vehicles can lead people to give what they think is a reasonable response, not their maximum WTP. For example, where there are only nominal entry fees (e.g., entrance fees at national parks), an increase in an entrance fee could engender responses of what respondents think is a reasonable increase in price rather than statements of maximum WTP (Campos et al. 2007).

Some studies demonstrate that payment vehicles do influence welfare estimates (Rowe et al. 1980; Greenley et al. 1981; Campos et al. 2007), but the line of inquiry has not been prominent in recent years.Footnote 15 Testing of payment vehicle effects is typically undertaken in survey pretesting to select a payment vehicle that minimizes undesirable effects on value estimates.

A variety of payment vehicles have been used in studies, and a sampling of those in the recent literature is presented in Table 4.2. These examples are presented as a general set of payment vehicle examples, and each could have relative strengths or weaknesses, as discussed above. While research can generate general insights about payment vehicles (e.g., donations are not incentive compatible and taxes are likely to lead to protest responses), selection of a specific payment vehicle is likely to be study-specific and will always require careful pretesting.

Table 4.2 Payment vehicles used in recent studies

A concern with using prices or taxes as a payment vehicle is that people can adjust the quantity purchased (e.g., take fewer recreational trips or consume less water). This makes the cost to the individual endogenous; the respondent controls cost by adjusting quantity. Further, the choice of a payment vehicle must align with the value to be estimated. For example, using an increase in water rates to estimate passive-use values for groundwater (Eq. (4.1)) might not be logical to respondents and could result in some respondents including use values in their valuation responses. The concerns discussed here highlight how important it is to carefully select a payment vehicle that minimizes unintended effects on value estimates.

4.1.5.4 Select a Decision Rule

The decision rule is the process by which the results of the contingent-valuation study, individual valuation responses or summary statistics on valuation responses, are used to inform the decision as to whether the item valued will be provided (Step 5.4). Such a decision rule might be that the item will be provided if at least 50% of respondents answer “yes” to a dichotomous-choice question.

The choice of a decision rule is closely linked to the payment vehicle. A referendum is clearly applicable when the issue relates to the provision of a public good, such as groundwater protection, and the payment vehicle is an increase in taxes. However, a referendum would not be applicable when dealing with use values, such as recreational fishing, and the payment vehicle is an increase in individual trip costs. In this second example, the decision rule might be whether aggregate benefits exceed project costs.

The more complex and perhaps more important cases are the public good cases like the groundwater valuation example. A fundamental goal in the choice of a decision rule is to make a selection that is plausible to respondents and will elicit truthful valuation responses. Following Carson and Groves (2007), it is also important that the decision rule is consequential, which means that payment is mandatory if the program is implemented and there is a non-zero probability that responses to the survey will influence provision of the item being valued.

4.1.5.5 Select a Time Frame of Payment

This step describes the number and frequency of payments respondents will make (Step 5.5). For example, a valuation scenario might be posed where a new filtration system would be installed to remove contaminants from a public water supply. Values could be elicited as a one-time payment now or as annual payments over the lifetime of the system, say 20 years.

While this is another area where there is scant research, Stevens et al. (1997) showed that repeated payments yield statistically different estimates of WTP when compared with a lump-sum payment. More recently, Soliño et al. (2009) found no difference between bimonthly and annual payments when a dichotomous-choice valuation question was employed. These limited results suggest that the choice of a payment time frame must proceed with caution.

There is often a disconnect between the time frame of payment in a contingent-valuation question and the time frame over which survey respondents will enjoy the benefits of the change. Typical changes result in benefits that accrue over a number of years, and the time frame of the payment(s) is often much shorter (e.g., a one-time payment). Thus, survey respondents are asked to undertake personal discounting to answer valuation questions. The research by Stevens et al. (1997) suggested that survey respondents might not do this well.
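
A simple present-value calculation shows the size of the discounting task respondents face. The sketch below, using the standard annuity formula and an assumed personal discount rate, computes the one-time payment equivalent to a stream of annual payments; the payment amount, horizon, and rate are illustrative assumptions.

```python
# Present value of T annual payments of A at discount rate r (the standard
# annuity formula); the payment, horizon, and rate are assumed for illustration.
def annuity_pv(A, r, T):
    return A * (1 - (1 + r) ** -T) / r

A, r, T = 25.0, 0.05, 20   # $25/year for 20 years at a 5% personal discount rate
lump_sum = annuity_pv(A, r, T)
print(f"${A:.0f}/year for {T} years is equivalent to a one-time payment "
      f"of ${lump_sum:,.2f} at r = {r:.0%}")
# A respondent who ignores discounting would equate the stream to $500,
# overstating the equivalent lump sum by roughly 60% in this example.
```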

The time frame of payment varies substantially across studies, from one-time payments to annual payments into perpetuity. The time frame is crucially important because this influences how value estimates are aggregated to compute benefits or costs. This is another design feature that must be carefully addressed in survey pretesting.

4.1.5.6 Substitutes and Budget Constraint Reminders

Substitutes and a budget constraint are fundamental components of economic choices (Step 5.6). Both the availability of substitutes (Freeman 1993, Chap. 3; Hoehn and Loomis 1993; Flores, Chap. 2 of this book) and income (Flores and Carson 1997) affect the magnitude of welfare estimates.

Though encouraging respondents to consider substitutes and think about their budget constraints when answering contingent-valuation questions is intuitively straightforward, it is difficult to test the effectiveness of these reminders. What might be considered a substitute by one respondent might not be considered a substitute by another. Split-sample studies, where one sample is reminded of substitutes and their budget constraints and another sample is not, reveal that information on substitutes, complements, and budget constraints affects estimates of central tendency and dispersion (Kotchen and Reiling 1999; Loomis et al. 1994; Whitehead and Bloomquist 1995). In a meta-analysis, Schläpfer (2006) found significant income effects for contingent-valuation estimates. Smith (2005) suggested that sensitivity to the budget constraint becomes more relevant the higher the cost to the respondent is as a proportion of income. Given the roles that substitutes, complements, and income play in theoretical definitions of economic values, the theoretical component of content validity suggests that respondents should be prompted to consider likely substitutes and complements, and they should be reminded that they could spend their money otherwise.

4.1.5.7 Summary

There is no cookie-cutter or one-size-fits-all set of rules for framing contingent-valuation scenarios, but testing draft information scenarios in focus groups is critically important. Focus group testing provides the opportunity to learn whether respondents use the information, understand and believe it, and base their valuation responses on the actual change being valued.

Even seemingly innocuous statements and terms in contingent-valuation scenarios have the potential to affect valuation responses. For example, in a study of preserving agricultural lands we quickly found that open space has entirely different meanings to people involved in land-use policy as compared to the general public. Focus group participants told us that open space conveyed a sense of “outer space,” “large foyers,” etc.—not undeveloped land.

Pretesting in focus groups and/or one-on-one interviews is the best way to avoid pitfalls that can bias welfare estimates because of incorrect interpretation of information by respondents, the provision of unintended clues to respondents, and information rejection by respondents. This pretesting must be carefully conducted and is not a substitute for more research to understand the effects of each element of the information in a contingent-valuation scenario. This means that careful design must also be accompanied by conceptual and methodological research to refine what and how information should be presented in the survey to guide the design of future empirical studies.

In practice, it is important to recognize that the information set must vary from study to study to fit the issue-specific application and institutions. Further, no information set will be understood and accepted by everyone in a sample; the design goal is to minimize misinterpretations and scenario rejection to the greatest extent possible. This means that some information in the scenario will be necessary to satisfy some respondents and other information will satisfy other respondents; the common goal is to elicit consistent value information across all respondents. Little is documented about study pretesting, but reporting this information in journal articles and other publications will help the valuation community learn about design challenges and successes from prior study designs.

4.1.6 Design the Contingent-Valuation Question

After the information describing the change to be valued and how it will be provided, respondents are asked to reveal the value they place on the change described in the valuation scenario. This section provides guidelines and considerations for selecting and designing a contingent-valuation question (Step 6).

4.1.6.1 Select a Response Format

The response format refers to how the contingent-valuation question will be answered (Step 6.1). The three main formats ask respondents to directly provide their maximum willingness to pay (open-ended), choose an amount from a list of possible willingness-to-pay amounts (payment-card), or respond “yes” or “no” to a specified dollar amount (dichotomous-choice). The response format has implications for how the response data are analyzed and interpreted; it is the key characteristic that differentiates the various types of contingent-valuation questions. The information scenario components described above are generally portable from one question format to another with the exception of the decision rule (e.g., a majority vote would work with a dichotomous-choice question but not with an open-ended question).

Early contingent-valuation studies used either an open-ended question (Hammack and Brown 1974) or an iterative-bidding question (Randall et al. 1974). An open-ended question asks respondents how much they would pay for the specified change. An iterative-bidding question starts by asking respondents, “would you pay $SB” for a specified change (SB = starting bid). If respondents answer “yes,” the bid is increased in specified increments (I) until they say “no,” and if the initial response is “no,” the bid is decreased until they say “yes” ($SB ± $I). The magnitudes of starting bids, magnitudes of bid iterations, and number of iterations varied from study to study. While the open-ended format has persisted, the iterative-bidding format is no longer used because of an anchoring effect: the final bid at the end of the iterations was found to be significantly correlated with the starting bid; that is, the higher the starting bid, the higher the final bid to which people would answer “yes” (Boyle et al. 1985; Thayer 1981).

Open-ended questions are still used in some studies. The following is an example of an open-ended question used by Welsh and Poe (1998):

If passage of the proposal would cost you some amount of money every year for the foreseeable future, what is the highest amount that you would pay annually and still vote for the program? (WRITE IN THE HIGHEST DOLLAR AMOUNT AT WHICH YOU WOULD STILL VOTE FOR THE PROGRAM) (p. 183)

Respondents are provided with a blank line where they can write in the maximum they would pay.Footnote 16

In the early 1980s, Mitchell and Carson (1981) introduced the payment card (see also Mitchell and Carson 1993). This was a card with k bid amounts, and it showed respondents how much they pay for selected public services (anchors), which in essence is very general information on substitutes. Respondents were asked to “circle the dollar amount that is the most they would pay” for the change. Current applications of payment cards have proceeded without anchors (Fig. 4.1).

Fig. 4.1 Example of an unanchored payment card (Welsh and Poe 1998, p. 183)

Dichotomous-choice questions, introduced by Bishop and Heberlein (1979), ask respondents, “would you pay $B” for the specified change, which is simply the first round in an iterative-bidding question (Fig. 4.2). The bid amount ($B) is varied over different respondents. The starting-point problems with iterative-bidding questions subsequently led to the adoption of dichotomous-choice questions, and the single-shot question is easier to administer than the iterative framework. Some have also posited the heuristic argument that dichotomous-choice questions mimic the take-it-or-leave-it nature of many market purchases. Such a heuristic argument cannot be made for open-ended and payment-card questions.

Fig. 4.2 Example of a dichotomous-choice question (Welsh and Poe 1998, p. 183)

This example is a dichotomous-choice response format framed as a referendum. The bid amounts ($B) are entered before the survey is administered.Footnote 17 The referendum vote here is the decision rule. A dichotomous-choice question can be framed as a referendum or not. For example, a dichotomous-choice question can be framed as agreeing to pay or not pay an entrance fee to a national park.

A number of researchers have experimented with variations of the dichotomous-choice format. For example, studies have used one-and-one-half-bounded questions (Cooper et al. 2002), double-bounded questions (Hanemann et al. 1991), and multiple-bounded questions (Bateman et al. 2001). Each of these formats presents follow-up bids to survey respondents. In the double-bounded format, for example, respondents receive a randomly assigned initial bid; those who answer “yes” receive a higher follow-up bid, and those who answer “no” receive a lower follow-up bid. The multiple-bounded question is a repeated dichotomous choice where a response is required for every bid amount, which is essentially a payment card where respondents indicate their willingness to pay each bid amount, not just the maximum they would pay. These alternative specifications of dichotomous-choice questions were proposed to increase estimation efficiency (Hanemann et al. 1991). Responses to a dichotomous-choice question reveal only whether each respondent’s value is less than (“no” response) or greater than (“yes” response) the bid amount received. Adding follow-up bid amounts narrows the range within which each unobserved value resides, as the sketch below illustrates.
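
A minimal sketch of this logic for the double-bounded format: each pair of responses maps to an interval that bounds the respondent's unobserved WTP (these intervals would then enter an interval-censored likelihood). The bid amounts are hypothetical, and nonnegative WTP is assumed.

```python
import math

def wtp_interval(first_bid, first_answer, follow_up_bid, second_answer):
    """Map double-bounded responses to (lower, upper) bounds on WTP.
    Assumes nonnegative WTP; bid amounts here are hypothetical."""
    if first_answer == "yes":   # follow-up bid is higher than the first bid
        return (follow_up_bid, math.inf) if second_answer == "yes" \
            else (first_bid, follow_up_bid)
    else:                       # follow-up bid is lower than the first bid
        return (follow_up_bid, first_bid) if second_answer == "yes" \
            else (0.0, follow_up_bid)

print(wtp_interval(50, "yes", 100, "no"))  # (50, 100): WTP between $50 and $100
print(wtp_interval(50, "no", 25, "no"))    # (0.0, 25): WTP below $25
```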

One of the NOAA Panel recommendations (Arrow et al. 1993) was to allow respondents a “no answer” option in addition to “yes” or “no” when the valuation question was framed as a referendum. This recommendation appeared to logically follow from consideration of undecided voters in predicting election outcomes. While there have been many different interpretations of how “don’t know” responses should be elicited and treated in data analyses, at the basic level recommended by the NOAA Panel, it appears that most of these respondents would vote “no” in the absence of a “don’t know” option (Carson et al. 1998; Groothuis and Whitehead 2002).

Payment-card and dichotomous-choice questions require an additional design feature: selecting the bid amounts used as the monetary stimuli in the questions. Development of bid values usually follows a three-step process. The first step is to review similar studies in the literature to develop a prior on the distribution of values for the current application. Second, this prior information is used to develop the initial bid amounts used in pretesting the survey instrument, and these bid amounts are adjusted based on what is learned in the survey design process. This should include a field pretest, or pilot, of the full survey instrument and should not be limited to focus group data if possible. Finally, an optimal bid-design approach can be used to select the bid amounts used in the final survey instrument (Alberini 1995a; Dalmau-Matarrodona 2001; Kanninen 1993a, b; Scarpa and Bateman 2000). Alberini (1995a, b) and Kanninen (1993a, b, 1995) have shown that an optimal design has a small number of bids (five to eight) and that the bid amounts should span the median WTP without being placed too close to the median or in the tails of the distribution. Very low or very high bid amounts may not be credible to respondents. Mis-specification of the bid distribution, such that most bid amounts fall above or below the median, seriously compromises the ability to estimate mean WTP. McFadden (1994) proposed a continuous bid design, which might avoid mis-specification when only a small number of bids are employed. These bid-specification issues were empirically investigated by Boyle et al. (1998), with the empirical results supporting the bid-design recommendations of Kanninen (1993a, b) and Alberini (1995a).
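As a rough illustration of how a prior distribution can translate into candidate bids, the following Python sketch takes quantiles of an assumed lognormal prior. The prior’s parameters and the particular quantiles are hypothetical choices for this example, not values taken from the studies cited above.

```python
import numpy as np
from scipy.stats import lognorm

# hypothetical prior on WTP assembled from the literature and pretests:
# median of $15 with a log-scale standard deviation of 0.8
prior = lognorm(s=0.8, scale=15.0)

# five bids that straddle the median while staying out of the extreme
# tails, in the spirit of Kanninen (1993a, b) and Alberini (1995a)
quantiles = [0.15, 0.30, 0.45, 0.60, 0.85]
bids = np.round(prior.ppf(quantiles))
print(bids)   # -> [ 7. 10. 14. 18. 34.]
```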

The framing of the actual contingent-valuation questions and their respective response formats is quite simple relative to the framing of the valuation scenario that precedes the valuation question. The one exception is that careful selection of bid amounts is crucial in the design of payment-card and dichotomous-choice questions. The next section discusses the relative strengths and weaknesses of these question formats.

4.1.6.2 Relative Strengths and Weaknesses of Response Formats

While dichotomous-choice questions are most commonly used, each of the three main response formats has strengths and weaknesses (Table 4.3). Conceptual arguments by Carson and Groves (2007), Carson et al. (2014), and Hoehn and Randall (1987) suggest that the “take-it-or-leave-it” nature of dichotomous-choice questions, when framed as a referendum vote for a public good, has desirable properties for incentive-compatible revelation of preferences. There is a single bid amount to which respondents respond, and there is no incentive for respondents to pick very high (more than they would pay) or very low (less than they would pay) dollar amounts to purposely misstate their values. This is not the case for open-ended and payment-card questions, where respondents can influence the outcome of a study by the value they state or the dollar amount they pick. For example, if respondents want to see a change occur, they can state an open-ended value or pick a payment-card amount that exceeds their WTP. Alternatively, if they want to send a signal that they want the cost to be low, they might select a value below what they would actually pay. The opportunities for such misstatements of value are not consistent with incentive compatibility.

Table 4.3 Comparison of contingent-valuation response formats

Cummings and Taylor (1998) argued that dichotomous-choice questions must be accompanied by the realism that the referendum vote will be binding (i.e., respondents must believe the change will be implemented if more than 50% of respondents vote “yes”). This concept has been more formally developed by Carson and Groves (2007) and Vossler et al. (2012). In contrast to Cummings and Taylor (1998), Carson et al. (2014) argued that it is not necessary that the referendum be binding, only that the results of the survey referendum have a nonzero probability of being used in the decision-making process to provide the item being valued. These arguments support dichotomous-choice questions framed as a referendum as the preferred framing of a contingent-valuation question.

Responses to open-ended questions provide a continuous distribution of responses on the interval [0, +∞). Payment-card responses reveal only that a respondent’s value resides in one of k + 1 intervals, where k is the number of bid amounts ($B) on the payment card; the value lies in [$B_L, $B_U), where $B_L is the bid chosen by the respondent and $B_U is the next bid higher than the chosen bid. Responses to dichotomous-choice questions indicate only whether each respondent’s value lies below the bid threshold, in [$0, $B), or above it, in [$B, +∞). Assuming truthful revelation of responses, a person with a value of $15 would respond in the following manner to each of the three basic contingent-valuation response formats:

  • The response to an open-ended question would be “$15.”

  • The response to a payment-card question with bids of $1, $10, $20, and $30 would be “$10.”

  • The response to a dichotomous-choice question with a bid amount of $10 would be “yes.”

Thus, the dichotomous-choice response format reveals only that this respondent’s value lies in the interval [$10, +∞); the payment-card response format places the value in a narrower interval, [$10, $20); and the open-ended response format observes the value of $15 directly. Therefore, in terms of estimating central tendency, the open-ended format provides the most efficient estimates and the dichotomous-choice format the least efficient (se_oe < se_pc < se_dc, where se is the standard error of the estimated mean). Footnote 18 This relationship assumes that all three response formats incentivize respondents to truthfully reveal their preferences.
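The mapping from an unobserved value to the three response formats can be written in a few lines of Python; the bids and the $15 value simply reproduce the example above, and the function name is an arbitrary choice for illustration.

```python
def responses(wtp, pc_bids=(1, 10, 20, 30), dc_bid=10):
    """Map an unobserved value to each of the three response formats."""
    open_ended = wtp                                       # value stated directly
    pc = max((b for b in pc_bids if b <= wtp), default=0)  # highest bid at or below WTP
    dc = "yes" if wtp >= dc_bid else "no"                  # single take-it-or-leave-it bid
    return open_ended, pc, dc

print(responses(15.0))   # -> (15.0, 10, 'yes')
```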

However, each of the response formats can have unique impacts on respondents’ answers to a contingent-valuation question. For example, open-ended questions are believed to yield an unusually high percentage of $0 responses, in that some people might hold a value but answer $0. It is also argued that people have difficulty coming up with a maximum willingness-to-pay amount for policies with which they are not familiar. A manifestation of this issue is that the empirical distributions of responses to open-ended questions are not smooth and tend to have spikes at $5 increments. This rounding to the nearest $5 further attests to the difficulty respondents might have giving a precise dollar value. For examples of issues with open-ended questions, see Bohara et al. (1998), Donaldson et al. (1997), and Welsh and Poe (1998). All in all, very few applications use open-ended questions today.

Payment cards appear to avoid the issues of a spike of zero values and of respondents having to provide a specific dollar value. However, Rowe et al. (1996) found that the bid amounts on the payment card can influence value responses. With careful framing, a payment-card question can be posed in the context of a referendum and presented as consequential. Only a few studies still use payment cards (Covey et al. 2007; Ryan and Watson 2009), but despite this low usage, payment-card questions might be the best alternative to dichotomous-choice questions.

While dichotomous-choice questions gained popularity to avoid the anchoring in iterative-bidding questions, dichotomous-choice questions are not free from anchoring problems (Boyle et al. 1997, 1998; Green et al. 1998). That is, respondents have a propensity to say they would pay high bid amounts that likely exceed their true values and to say they would not pay low bid amounts that likely fall below their true values. The issue seems to be most problematic with high bids, which would serve to inflate value estimates (Boyle et al. 1998). Prices and quality are often perceived as being correlated in market goods, and this market intuition could lead respondents to interpret single bids as implicit signals of quality, which leads to anchoring (Gabor and Granger 1966; Shapiro 1968).

Further, concerns about the effects of bid amounts on responses extend to one-and-one-half-bounded, double-bounded, and multiple-bounded questions (Bateman et al. 2001, 2009; Herriges and Shogren 1996; Roach et al. 2002; Watson and Ryan 2007). Thus, while dichotomous-choice questions are theoretically incentive compatible, research suggests that value estimates might not be robust to manipulations of the bid design. An interesting question is whether a highly consequential survey would be less susceptible to bid effects.

Dichotomous-choice questions, posed as a referendum vote, are the safest approach to framing contingent-valuation questions, and the extensive use of this approach in the peer-reviewed literature supports this endorsement. However, the referendum framing of a dichotomous-choice question is not practical in some contexts, such as recreation use values. Some additional considerations in the framing of contingent-valuation questions follow.

4.1.6.3 Allowing for Values of $0

Some people include the issue of zero bidders under the general heading of protest responses, but there are two issues here (Step 6.2). The first relates to people who give a response of $0 because they reject some component of the contingent-valuation scenario; these are protest responses that will be dealt with in the next section. This section considers those who truly hold values of $0. It is quite possible that a change might not be utility-increasing for some segment of the sampled population, and respondents need a way to indicate such a lack of value.

With an open-ended question, a respondent can simply enter a response of $0, and a payment card can include a value of $0 for respondents to circle. The more problematic case is a dichotomous-choice question, where respondents can answer “no” to the bid but do not have the opportunity to express a value of $0. In these cases, we know only that respondents’ values lie within the interval (−∞, $B). We do not know if there is a spike in the probability distribution at $0, and a separate question is necessary to identify respondents whose values are $0. Footnote 19 This $0-value screen has been implemented by posing it before the contingent-valuation question and then administering the valuation question only to those who answer “yes” to the screen. Alternatively, respondents who answer “no” to the bid can be probed after the contingent-valuation question by asking whether they would “pay anything” for the change.

For example, Ahearn et al. (2003) used the following question that preceded the contingent-valuation question: “Would you vote for the proposal if passage of the proposal would increase your household’s 1998 income tax?” Respondents who answered “no” were not asked the contingent-valuation question.

A related issue is that policies might actually give some people disutility, which would imply that their values are strictly negative. While most studies appear to treat people with negative values as holding values of $0, or to treat such outcomes as artifacts of the statistical distributions assumed in econometric estimation (Haab and McConnell 1997), others have attempted to investigate the plausibility of negative values (Berrens et al. 1998; Bohara et al. 2001).

4.1.6.4 Protests and Other Types of Misleading Responses

There are at least three potential response categories under the heading of protests, all based on a presumption that these respondents do not report their true values (Step 6.3). It is important to note that these can be overtly misleading responses or misleading responses that occur inadvertently. Inadvertent misstatements can occur because someone does not fully understand the valuation scenario or because of experimentally induced errors.

The first category includes people who protest some component of the contingent-valuation scenario. These respondents might answer “$0” even though they hold a value for the item, which biases the estimate of central tendency downward, or they might choose not to complete the survey, leaving the effect on central tendency dependent on how these respondents are treated in the analysis of the contingent-valuation data.

The second category includes people who do not understand what they are being asked to value and who answer the valuation question anyway. The effect of this misunderstanding might not introduce a bias into estimates of central tendency, but it most likely will increase noise in the data that will increase the standard error of the mean.

The third category is people who behave strategically in an attempt to influence survey results and ultimately the decision. If everyone who is behaving strategically acts in a similar manner, the effect will be to introduce a bias into the estimate of central tendency. However, some people could have incentives to understate values, and others could have incentives to overstate the values, leaving the overall effect on estimates of central tendency indeterminate.

Within the contingent-valuation literature, two types of misleading responses have received specific attention: warm glow and social desirability. “Warm glow” arises from the utility that people receive from stating a willingness to pay, not from the change actually being valued (Andreoni 1989). Some have suggested that warm glow confounds estimation of WTP, but perhaps the effects can be removed from estimates (Nunes and Schokkaert 2003). However, the warm-glow literature has largely been developed for philanthropy and donations (Harbaugh 1998), and the extension to contingent-valuation estimation of WTP is not fully explored. Donations are not a desirable payment vehicle for contingent-valuation questions because the goal is to estimate willingness to pay at the point of indifference, which is shown by the equality in Eqs. (4.1) and (4.2). This is what some people refer to as “maximum willingness to pay.” It is not a donation toward provision of the item being valued, but the measurement of a specific economic concept.

Social desirability bias arises when respondents answer questions in a manner intended to please another person, such as the interviewer in a personal interview. While some have detected social desirability in contingent-valuation estimates (Leggett et al. 2003), it is likely that this effect is limited to interview formats where there is a clear incentive to please, which is not the general case for contingent-valuation studies. For example, the Leggett et al. study was conducted in person on a boat as people were returning from visiting a military historical site in an area people visit because of the military history. Even in cases where social desirability might arise, it can be addressed through survey implementation procedures (Kreuter et al. 2008).

In terms of identifying misleading responses, empirical applications have used a variety of approaches to identify anomalies in responses to contingent-valuation questions. Some have included questions in the survey to probe respondents’ understanding and motivations when answering the contingent-valuation question (Ajzen et al. 1996; Berrens, Bohara et al. 1998; Blamey et al. 1999; Stevens et al. 1994). Others have trimmed the upper values if they exceed a certain percentage (e.g., 10%) of a respondent’s income (Mitchell and Carson 1989, pp. 226-227). Still others have used statistical routines, as described in Belsley et al. (1980), to identify responses that have undue influence on estimation results (Desvousges et al. 1987).

While most acknowledge that there are potentially some misleading responses in contingent-valuation data, there is no established procedure with a sound conceptual basis for excluding responses. The reasons for this are varied. What if a person gives one response that suggests a protest but provides another response indicating a valid answer? Which response is correct or more meaningful? What constitutes a sufficient lack of understanding such that a respondent’s valuation response should be excluded from statistical analyses? For example, it would not be appropriate to exclude respondents with low levels of education because they still have preferences and hold values. They might not be able to understand the valuation scenario as well as other respondents, but they make market decisions with a similar ability on a daily basis. Questioning people after they have answered the valuation question is also problematic; people who are behaving strategically would be unlikely to admit that they are doing so, and people who do not understand the valuation question might not understand the follow-up question either. In addition, responses to follow-up questions cannot be assumed to be exogenous to responses to the valuation question.

Another approach, trimming the tails of the distribution by deleting outliers, must be done with care (e.g., people with high values might be those who have the most to lose). For example, some people give up income to live in areas that are near desirable resources; for these people, welfare losses could be quite large relative to income.

Another question deals with how much of an effect misleading responses actually have on estimates of central tendency. The famous Marwell and Ames (1981) study found that “economists free ride,” but others do not. This suggests that strategic behavior might be relegated to a small segment of any sample. Thus, those who behave strategically, or who more generally provide protest responses, might be a small segment of the sampled population and might not behave in a way that influences sample statistics. The first contingent-valuation study that I conducted was undertaken with personal interviews of people while they were recreating on-site. An environmental group came through one of the survey locations; they talked among themselves and encouraged each other to behave strategically by giving high-value responses to influence a desirable environmental outcome. We marked all of these surveys to identify the individuals in data analyses. Despite behaving strategically, none of their responses were statistical outliers, and most responses were quite close to the sample mean; their strategic behavior was not effective at manipulating sample statistics. Thus, while strategic behavior could occur, it is possible that it is not sufficiently pervasive, or of sufficient magnitude, to have an effect on welfare estimates.

Recent studies have looked at segmenting true and protest zero responses. Jones et al. (2008), based on an open-ended question, segmented true and protest zero responses and then analyzed the data using a Tobit model after removing protest zero responses. Strazzera et al. (2003) provided a more sophisticated econometric approach that accounts for both true and protest zero responses without any ex ante segmenting. Meyerhoff and Liebe (2006) examined protest beliefs more generally throughout their data and found that a variable scaling the level of protest was associated with a lower likelihood of holding a nonzero value and with lower estimated willingness to pay.

In contrast to treating protest responses as faulty data, Garcia-Llorente et al. (2011) suggested that protest responses can be useful in identifying specific design elements of environmental programs that will engender greater public support. This changes the perspective and dimensionality of a contingent-valuation study from simply estimating a value to support decision-making to providing richer information to decision-makers regarding program design.

Despite the issues discussed above, contingent-valuation study designs should consider including questions to differentiate true $0 responses from potential protest $0s; investigating the presence of data outliers; and including questions to probe respondent acceptance of the change valued, the provision mechanism, the payment vehicle, the decision rule, the time frame of payment, and the actual payment amount in payment-card and dichotomous-choice questions. While the issue of misleading responses to contingent-valuation questions deserves consideration, it is a tough conceptual and empirical issue; greater consideration is needed, at a conceptual level, of what actually constitutes a protest response such that the observation should be excluded from the data analyses used to compute value estimates. Further, additional robustness studies are needed to determine whether there is a systematic effect on estimates of central tendency and dispersion.

4.1.7 Develop Auxiliary Questions for Statistical Analyses of Contingent-Valuation Responses

Step 7 calls for development of auxiliary questions designed to collect data to be used in the analyses of responses to the contingent-valuation questions, and auxiliary questions can provide information to support decision-making beyond what is provided by the valuation component of the study. These are in addition to probing for potential protest responses, which was discussed in the preceding section. The most obvious candidates are income (Schläpfer 2006) and other variables that might influence value estimates (e.g., Dupont 2004; Farreras et al. 2005).

Income, sex, and other demographic questions that have objective responses are commonly placed at the end of a questionnaire, following standard survey practice. Opinion questions whose responses are to be used in analyzing responses to the valuation question might include a question asking respondents to rate the current condition of the item being valued. Such a question should be placed before the valuation question(s) in a survey; if it is placed after respondents have answered the valuation question, the responses cannot be assumed to be exogenous to responses to the valuation question. Data on these types of variables are commonly used in the estimation of econometric models of payment-card and dichotomous-choice data.

Secondary data may also be incorporated in econometric analyses. For example, in a study of lost value from contamination of an aquifer, Poe and Bishop (1999) argued that well-specific water quality data are needed in the valuation question. Spatial data can also be useful to develop variables that describe proximity of respondents’ households to wells with known levels of contamination or proximity to the source of the contamination. It is important to have questions in the survey that will help in matching the respondent’s answers to valuation questions to the auxiliary data that will be merged with the survey response data.

Finally, it is important to consider whether existing surveys (e.g., the U.S. Census or the NORC General Social Survey) have similar questions. Using the same framing of questions as existing surveys allows for direct comparisons of the data for assessing whether sample selection has occurred, merging data sets for richer statistical analyses, and filling in missing data due to item nonresponse. Cameron et al. (1999) used such an approach to address nonresponse bias in a mail survey.

4.1.8 Pretest and Implement the Survey

Chapter 3 provides details of survey design and administration (Step 8). This section will briefly discuss the use of cognitive interviews, focus groups, and pilot surveys in the design of a contingent-valuation survey.

Surveys are pretested through an iterative process of one-on-one interviews, focus groups, and/or field trials (pilot surveys). Here again, there is no one-size-fits-all approach. Some studies might use one focus group, while others use multiple focus groups. Some studies might use focus groups and cognitive interviews, while others use only focus groups. These choices can depend on the available budget and the complexity of the valuation topic. For studies with a large budget and a complex valuation topic, the design process could be the following:

  • Focus groups to learn about potential respondents’ knowledge and beliefs about the item being valued.

  • Focus groups to test the valuation scenario.

  • Cognitive interviews to learn what people think about the valuation scenario in the absence of group effects and to learn more deeply about how potential respondents are reacting to the survey information.

  • Focus groups to pretest a complete draft of the survey instrument.

  • A field pilot to develop information on the survey response rate and item responses to individual questions in the survey with particular interest in responses to the valuation question.

How qualitative research tools and field pilots are used in the survey design process varies from study to study, but there is a common feature. The design process typically starts with a simple structure to learn from potential respondents, and the complexity of the survey instrument is built through what is learned in each focus group, cognitive interview, and pilot. While this qualitative research is common to survey design, the unique component here is the design of the contingent-valuation scenario.

The give-and-take of the focus group process can be useful for learning, for example, that information about the item being valued that is important to one person is not very important to the rest of the group. Learning how different people in focus groups respond to different features of the valuation scenario is a crucial feature of this survey design research. Cognitive interviews allow for more in-depth probing of potential respondents’ reactions to the survey materials in the absence of group dynamics. The pilot survey is particularly useful for developing information on potential values to be used to develop bids for payment-card and dichotomous-choice questions.

This qualitative pretesting of the survey instrument and administration process ensures that survey questions are understandable to respondents and are actually eliciting the information they are designed to elicit, and that the survey will yield adequate responses to support statistical analyses of the resulting data.

Following the pretesting, the survey should be implemented using best practices for survey administration as discussed in Chap. 3.

4.1.9 Data Analysis

The process of data analysis (Step 9) varies with the response format used for the contingent-valuation question. We start with responses to open-ended questions, move to payment-card responses, and close with dichotomous-choice responses, thereby covering the three main types of contingent-valuation response formats.

Responses to open-ended questions are mathematically the easiest to analyze in terms of computing the arithmetic mean:

$$\overline{\text{WTP}} = \sum\limits_{i = 1}^{N} {\frac{{{\text{WTP}}_{i} }}{N}},$$
(4.4)

where WTP_i is the open-ended response for the ith respondent, and N is the number of observations (completed surveys). The responses to an open-ended question (WTP_i) are individual statements of value, WTP or op in Eqs. (4.1) and (4.2), respectively. If open-ended responses are analyzed as a function of variables that explain WTP, a theoretical specification would be based on a definition of the value as in Eqs. (4.1) and (4.2), solved for WTP or op as the dependent variable.
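A minimal Python sketch of Eq. (4.4), using fabricated responses; reporting the standard error of the mean alongside the mean anticipates the discussion of dispersion later in this section.

```python
import numpy as np

wtp = np.array([0., 5., 10., 15., 15., 20., 25., 40.])  # hypothetical open-ended responses
mean_wtp = wtp.mean()                                   # Eq. (4.4)
se_mean = wtp.std(ddof=1) / np.sqrt(len(wtp))           # standard error of the mean
print(f"mean WTP = ${mean_wtp:.2f} (se = {se_mean:.2f})")
```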

Analyses of payment-card and dichotomous-choice data require econometric analyses and the estimated equations are used to derive estimates of WTP. These analyses start with the specification of a function based on theoretical definitions of value, like the examples presented in Eqs. (4.1) and (4.2). Willingness to pay can be expressed as

$$\log ({\text{WTP}}_{i} ) = x_{i}^{\prime} \alpha + e_{i} ,$$
(4.5)

where \(x_{i}^{\prime}\) is a vector of arguments that affect the magnitude of the individual’s WTP, α is a vector of preference coefficients to be estimated, and e_i is a random error that might be assumed to be normally distributed with mean zero and standard deviation σ. The function x′α is specified as the solution to equations such as (4.1) or (4.2) that define the value being estimated. The vector x_i logically includes variables that describe the change valued. As with most econometric analyses, the explanatory variables are often chosen based on economic theory and previous research and are hypothesized to affect the magnitudes of respondents’ WTP.

Analysis of payment-card data proceeds by modeling the interval within which respondents have revealed that their values reside. These intervals are bounded by the bid amount each respondent circled and the next highest amount on the payment card. Following Cameron and Huppert (1989), respondents’ true values \(\left( {{\text{WTP}}_{i}^{t} } \right)\) reside in the interval ($B_li, $B_ui], where “l” denotes the lower bid the respondent circled and “u” denotes the next highest bid amount on the payment card. Footnote 20 Since WTP_i is not actually observed, the probability that WTP_i falls into the chosen interval on the payment card is modeled as

$$\Pr \left( {{\text{WTP}}_{i} \in \left( {\$ {\text{B}}_{1i} ,\$ {\text{B}}_{ui} } \right]} \right) = { \Pr }\left( {\frac{{\log \$ {\text{B}}_{1i} - x_{i}^{\prime} \alpha }}{\sigma } < t_{i} < \frac{{\log \$ {\text{B}}_{ui} - x_{i}^{\prime} \alpha }}{\sigma }} \right),$$
(4.6)

where t_i is a standard normal variable. Using the estimated coefficients \(\left( {\hat{\alpha }} \right)\), willingness to pay can be derived as Footnote 21

$$E\left( {{ \log }({\text{WTP}})} \right) = x^{\prime} \hat{\alpha },$$
(4.7a)

or

$$E\left( {\text{WTP}} \right) = \exp \left( {x^{\prime} \hat{\alpha }} \right)\exp \left( {\frac{{\hat{\sigma }^{2} }}{2}} \right).$$
(4.7b)

The point estimate of WTP that results from Eq. (4.7b) is a function of chosen levels of the x vector, and several computational approaches are generally followed. The first is to insert mean values for each element of x to predict WTP. The second is to insert the unique values of the elements of x for each person in the sample, predict individual-specific WTP, and then compute mean WTP from the predictions. To compute WTP for a specific change in quality (or quantity), the variable(s) in x representing the change need to be set to the appropriate level(s).
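The interval likelihood in Eq. (4.6) and the retransformation in Eq. (4.7b) can be implemented directly by maximum likelihood. The sketch below uses toy data and an arbitrary intercept-plus-income specification; the data, variable names, and starting values are assumptions for illustration, not Cameron and Huppert’s (1989) application.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# toy payment-card data: lower bound = bid circled, upper bound = next
# highest bid on the card (np.inf when the top bid was circled)
lower = np.array([10., 10., 20., 1., 20., 30.])
upper = np.array([20., 20., 30., 10., 30., np.inf])
X = np.column_stack([np.ones(6), [45., 60., 80., 30., 75., 90.]])  # intercept, income ($1,000s)

def neg_loglik(theta):
    alpha, sigma = theta[:-1], np.exp(theta[-1])   # exp() keeps sigma positive
    mu = X @ alpha
    # Eq. (4.6): probability the value falls in the revealed interval
    p = norm.cdf((np.log(upper) - mu) / sigma) - norm.cdf((np.log(lower) - mu) / sigma)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize(neg_loglik, x0=np.array([2.0, 0.0, 0.0]), method="Nelder-Mead")
alpha_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
x_bar = X.mean(axis=0)
mean_wtp = np.exp(x_bar @ alpha_hat) * np.exp(sigma_hat**2 / 2)    # Eq. (4.7b)
print(f"estimated mean WTP at mean x: ${mean_wtp:.2f}")
```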

Again following Cameron and Huppert (1989), Footnote 22 and continuing with the notation established for the analysis of payment-card data, analysis of dichotomous-choice data proceeds as follows:

$$\begin{aligned} \Pr \left( {{\text{yes}}_{i} } \right) & = \Pr \left( {\log ({\text{WTP}}_{i} ) > \log \$ {\text{B}}_{i} } \right) \\ & = \Pr \left( {\frac{{e_{i} }}{\sigma } > \frac{{\log \$ {\text{B}}_{i} - x^{\prime} \alpha }}{\sigma }} \right) \\ & = 1 - \Phi \left( {\frac{{\log \$ {\text{B}}_{i} - x^{\prime} \alpha }}{\sigma }} \right), \\ \end{aligned}$$
(4.8)

where \(\Phi\) is the standard normal cumulative distribution function. The computation of mean and median values proceeds as described above for payment-card data.
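Eq. (4.8) amounts to a probit on the log bid and can be estimated with the same machinery as the payment-card model. A minimal sketch with fabricated responses and an intercept-only specification follows; under this specification, median WTP is exp(α̂) because the median of a lognormal distribution is the exponential of its log-scale mean.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

bid = np.array([5., 5., 10., 10., 20., 20., 40., 40.])  # toy bid assignments
yes = np.array([1,  1,  1,  0,  1,  0,  0,  0])         # toy yes/no responses
X = np.ones((8, 1))                                      # intercept-only x vector

def neg_loglik(theta):
    alpha, sigma = theta[:-1], np.exp(theta[-1])
    p_yes = 1 - norm.cdf((np.log(bid) - X @ alpha) / sigma)   # Eq. (4.8)
    p = np.where(yes == 1, p_yes, 1 - p_yes)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize(neg_loglik, x0=np.array([np.log(15.), 0.]), method="Nelder-Mead")
alpha_hat = res.x[0]
print(f"estimated median WTP: ${np.exp(alpha_hat):.2f}")   # median of the lognormal
```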

The issue of a spike in the probability distribution at $0, for people who do not value the change, has received limited attention in the literature. To address zero values in the analysis of contingent-valuation data, this issue needs to be considered early in the study design so that people who hold values of $0 can be identified. With this information, open-ended responses can be analyzed using a Tobit model (Jones et al. 2008; Strazzera et al. 2003), and alternative specifications have been considered for payment-card and dichotomous-choice data (Kriström 1997; Bohara et al. 2001).

In addition to the parametric approach described above, semiparametric and nonparametric approaches have been used to analyze dichotomous-choice data (Araña and León 2005; Creel and Loomis 1997; Crooker and Herriges 2004; Fernández et al. 2004; Haab and McConnell 1997; Kriström 1990; Li 1996). A nonparametric approach reduces the distributional and functional-form assumptions that must be imposed on the data when estimating WTP. A lower-bound nonparametric estimator is often used to compute WTP, and this estimator is specified as

$$\overline{{{\text{WTP}}_{l} }} = \sum\limits_{i = 1}^{k} {(\$ {\text{B}}_{i} - \$ {\text{B}}_{i - 1} )} *p(\$ {\text{B}}_{i} ),$$
(4.9)

where \(\overline{{{\text{WTP}}_{l} }}\) is a lower-bound estimator of WTP, k is the number of bids used in the study design, and p(•) is the proportion of respondents who answered “yes” to bid amount $B_i. Here it is assumed that \(\$ {\text{B}}_{i - 1} = 0 \,{\text{for}}\, i = 1\). Footnote 23 This is a lower-bound estimator because anyone who might hold a value greater than the highest bid amount is credited with no value above that bid, and the proportion of people who answered “yes” within each bid interval is assigned the lower proportion observed at the higher bid. For discussion of the econometric analysis of this nonparametric lower-bound estimator, see Lewbel et al. (2011) and Watanabe (2010).
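Eq. (4.9) reduces to a few lines of arithmetic, as the sketch below shows with fabricated bids and “yes” shares. If the observed shares were not weakly decreasing in the bid, adjacent cells would typically be pooled before applying the formula.

```python
import numpy as np

bids = np.array([5., 10., 20., 40.])        # the k bids; $B_0 = 0 by assumption
p_yes = np.array([0.85, 0.70, 0.45, 0.20])  # hypothetical shares answering "yes"
widths = np.diff(np.concatenate(([0.], bids)))
wtp_lower_bound = np.sum(widths * p_yes)    # Eq. (4.9)
print(f"lower-bound mean WTP: ${wtp_lower_bound:.2f}")   # -> $16.25
```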

It is also important to consider the variability of estimated WTP, both for studies that support decision-making and for experiments. Variance can be a function of heterogeneity in values across the sample and can be affected by the study design. That is, not everyone in a sample holds the same value, and larger within-sample variability will lead to larger standard errors of the estimated mean. In addition, the quality of the contingent-valuation scenario (and survey) can influence the estimated standard error; a poorer scenario design might result in a larger standard error of estimated WTP than a better design because respondents understand the description of the item being valued less well. Larger variances can lead to failures to reject null hypotheses that should be rejected in statistical tests, and they can reduce the credibility of WTP estimates for supporting decision-making. Thus, it is important to compute the standard error of estimated WTP and to consider design approaches that minimize undue variability in valuation responses through careful pretesting of valuation scenarios.

A number of approaches have been used to develop confidence intervals for WTP estimates (Cooper 1994). A quick review of recent articles suggests that two approaches are most commonly used. The first is the well-known Krinsky-Robb approach, which Park et al. (1991) introduced to the nonmarket valuation literature. The second is the convolutions approach introduced by Poe et al. (1994). See Cooper (1994) and Poe et al. (2005) for comparisons of the alternative approaches.
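A minimal sketch of the Krinsky-Robb approach for the intercept-only lognormal model used above: parameters are drawn from the estimated sampling distribution of the coefficients, WTP is computed for each draw via Eq. (4.7b), and the confidence interval is read from percentiles of the simulated WTP distribution. The point estimates and covariance matrix below are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_hat = np.array([2.70, 0.60])        # [alpha, sigma], fabricated estimates
vcov = np.array([[0.010, 0.001],
                 [0.001, 0.004]])         # fabricated covariance matrix
draws = rng.multivariate_normal(theta_hat, vcov, size=5000)
wtp_draws = np.exp(draws[:, 0] + draws[:, 1] ** 2 / 2)   # Eq. (4.7b) per draw
lo, hi = np.percentile(wtp_draws, [2.5, 97.5])
print(f"95% confidence interval for mean WTP: (${lo:.2f}, ${hi:.2f})")
```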

There are other approaches that have been used to analyze payment-card and dichotomous-choice response data, and I have simply presented the approaches here for illustrative purposes. Readers seeking more information on the econometric analysis of contingent-valuation data should see Haab and McConnell (2002).

4.1.10 Report Study

The single most important element in reporting a contingent-valuation study that supports decision-making is presenting an aggregate welfare estimate (aggWTP). In its simplest terms, this calculation is

$${\text{aggWTP}} = \overline{\text{WTP}} *N,$$
(4.10)

where \(\overline{\text{WTP}}\) is the estimated mean willingness to pay from the sample and N is the size of the affected population. But this calculation is not simple at all, and a number of questions need to be addressed:

  • Will mean or median willingness to pay be used?

  • Will weighting be used to bring respondent sample characteristics in line with the affected population characteristics?

  • Should individuals who are affected by the change valued and eligible to participate in the study, but are precluded for some reason (e.g., a gated community not allowing access to interviewers) be included in N?

  • How should people who refuse to complete the survey or to answer the valuation question be treated?

There are a variety of other questions in this line that might need to be addressed, and addressing these issues is critically important because they can substantially affect the magnitude of the aggregate value estimate.
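As a toy illustration of how these choices move the aggregate, the sketch below compares aggregating with the mean versus the median and applies a hypothetical adjustment for the share of the population treated as eligible; all numbers are invented.

```python
mean_wtp, median_wtp = 16.25, 12.00   # hypothetical sample statistics ($/household)
population = 250_000                  # affected population, N in Eq. (4.10)
eligible_share = 0.92                 # hypothetical share of N treated as eligible

agg_mean = mean_wtp * population * eligible_share
agg_median = median_wtp * population * eligible_share
print(f"aggregate WTP: ${agg_mean:,.0f} (mean) vs ${agg_median:,.0f} (median)")
```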

Reporting of study results (Step 10) requires an understanding that the findings serve multiple purposes: the current decision analysis; transfers to new decision applications at the same site or to entirely new applications; and advancing the literature by helping future investigators understand the study design, estimation, and testing. Even methodological studies, while not designed to support decision-making, could still be used in benefit transfers (Chap. 11). Thus, clear and detailed study documentation, within the limits of the publication outlet, is crucial for study credibility and for advancing the contingent-valuation literature. The steps in this chapter (Table 4.1) provide a guide to assist in this documentation, such as

  • The study application.

  • The theoretical definition of the estimated value that includes the current condition or baseline and the change valued.

  • The steps in the design process and discussions of why items in the final design worked well, what design items were tried and discarded, and why they were discarded.

  • The sample frame.

  • Survey mode and response rates reported according to the American Association for Public Opinion Research guidelines (www.aapor.org/Standards-Ethics/Best-Practices.aspx).

  • The verbatim commodity description and valuation scenario from the survey.

  • The contingent-valuation format used, including the question wording.

  • Respondents’ demographic characteristics and their use of, or preferences for, the item valued.

  • Methods of data analysis, including treatment of $0 values and protest responses, and econometric analysis.

  • Estimates of central tendency and dispersion, methods used to calculate these sample statistics, and any robustness checks.

This information allows readers to evaluate the content validity of value estimates for the current application and to evaluate the transferability of value estimates to new applications. With limitations on what can be reported in journal articles, best practices should include providing online appendices that fully document studies.

4.2 Reliability and Validity

A reliability study investigates the variance in value estimates, while tests of validity ask whether a contingent-valuation study accurately measures the value concept it is designed to estimate (see Chap. 12). Considerations of reliability and validity collectively constitute what is referred to as a credible, or accurate, contingent-valuation study.

4.2.1 Reliability

Reliability investigates the variance in value estimates. The common investigative approach is test-retest reliability, where a contingent-valuation survey is repeated at two different points in time. This can be a replication with the same respondents or a between-subjects design where two samples are drawn identically from the same sample frame.

The consensus in the literature appears to support a conclusion that contingent-valuation estimates are reliable (Brown et al. 2008; Carson et al. 1997; Kealy et al. 1988, 1990; Loomis 1989, 1990; Onwujekwe and Fox-Rushby 2005; Reiling et al. 1990; Stevens et al. 1994; Teisl et al. 1995). Thus, reliability of contingent-valuation estimates is not an issue of concern.

It is also important to recognize that values can and should change over time. Thus, failure to establish statistical equivalence in values over time does not refute reliability in instances where values have legitimately changed.

4.2.2 Validity

Three types of validity are commonly investigated: content, construct, and criterion (Carmines and Zeller 1979). Criterion validity compares contingent-valuation estimates to a measurement that is external to the contingent-valuation study and is a presumed measure of the true value. The cash transactions in the Bishop and Heberlein (1979) study provided such a criterion against which the parallel contingent-valuation estimate was validated. Convergent validity, a specific type of construct validity, investigates the consistency of contingent-valuation estimates with estimates provided by another nonmarket valuation method; this is what Bishop and Heberlein did when they compared the contingent-valuation estimate with those derived from a travel-cost model. Content validity asks whether the elements in the design of the contingent-valuation survey and data analyses are consistent with economic theory, established practice, and the valuation objective.

Validity assessments are only as good as the counterfactual against which estimates are compared. The criterion must be a true representation of the economic construct being valued. Convergent validity only establishes whether two estimation approaches are comparable; comparability can hold when both approaches have the same bias, and failure to establish convergent validity implies only that one or both estimates are biased. Two types of criterion-validity studies are discussed here: comparisons with cash experiments and comparisons with outcomes of actual referendum votes. Content validity requires a collective protocol for what constitutes best practices; the material provided in this chapter is one such documentation of best practices.

Researchers have conducted meta-analyses of studies using experiments to investigate differences in outcomes between contingent-valuation (or stated preference) treatments and control treatments in which a simulated market involving cash transactions is used (List and Gallet 2001; Little and Berrens 2004; Murphy et al. 2005). List and Gallet reviewed 29 studies that provided 174 validity comparisons. Footnote 24 Note that the comparisons are not all based on contingent-valuation studies, but on experiments where stated preference value estimates are compared to “parallel” values estimated using cash transactions, and the comparisons may be for private goods rather than public-good applications. List and Gallet report a calibration factor for each study, which is the stated-preference value divided by the cash value. They conclude that the “calibration factors for the most prevalent type of study are 1.26 (minimum), 1.28 (median), and 1.30 (maximum), which suggests that the most common type of (WTP) study will tend to produce a slightly (upward) biased estimate of the actual value” (p. 250). This difference (calibration factor) is what is known as hypothetical bias. However, the credibility of this conclusion crucially depends on two features: first, that the experiment is similar to what would be conducted in a contingent-valuation application, and second, that the cash value measures the same underlying Hicksian concept of value as contingent valuation.

Some researchers have compared contingent-valuation study results to those from a parallel referendum vote. While many have asked how contingent-valuation estimates would compare to values if a market existed, comparisons to referendum votes make logical sense: many contingent-valuation applications value public goods for which a majority vote would be a logical decision rule, and the preferred question framing is a dichotomous choice on a referendum vote. Vossler and Kerkvliet (2003) found that “survey responses match the actual voting outcome and WTP estimates based on the two (survey and voting) are not statistically different” (p. 631). Johnston (2006) similarly found no statistical difference between the proportion of “yes” votes in a stated preference study and an actual referendum at the expected program cost per household. Vossler et al. (2003) found a similar result when undecided votes were excluded or treated as “no” responses in the contingent-valuation portion of the experiment. These results indicate that responses to contingent-valuation questions, framed as votes on a referendum, mimic how people vote when they go to the polls to vote on environmental issues. While there have been far fewer of these studies than of the comparisons with cash transactions discussed above, they indicate that comparing contingent-valuation outcomes with referendum votes is an important and promising line of inquiry. Footnote 25

Finally, the discussion of validity has been extended to investigations of whether contingent-valuation results follow an expected pattern, which has been termed a “test of scope.” A test of scope asks the simple question, “Are people willing to pay more for a larger change in the item being valued than for a smaller change?” but the conduct of these tests is varied (Desvousges et al. 1993). A meta-analysis of scope experiments suggests that contingent-valuation studies do pass a scope test (Ojea and Loureiro 2011).

Some, however, have argued that a scope test should address more than the simple question of whether more of the item being valued is valued more highly than less (e.g., the adding-up test; Diamond 1996; Desvousges et al. 2012). However, for these additional tests to be applied, assumptions about the structure of preferences must be imposed (Haab et al. 2013). Thus, if an empirical test fails, it is impossible to differentiate between failure due to a problem in the structure of a contingent-valuation study and failure due to the wrong assumptions about preferences being imposed. While a scope test can provide insight into the credibility of a contingent-valuation estimate of value, it is a weak test of validity at best (see Heberlein et al. 2005).

More will be said about these points and other considerations of validity in Chap. 12.

4.2.3 Enhancing Validity

Based on the presumption that contingent valuation leads to overestimates of what people would actually pay (List and Gallet 2001), a number of protocols have been proposed to remove overestimation. These include the lower-bound nonparametric estimator of WTP discussed above (Haab and McConnell 1997), insertion of cheap talk into a survey instrument (Cummings and Taylor 1999), use of self-reported uncertainty scales after respondents have answered the contingent-valuation question (Champ et al. 1997), and consequentiality (Carson and Groves 2007).

Each of these approaches leaves questions to be answered. Is it desirable to provide a lower-bound estimate of WTP that might actually underestimate true willingness to pay? Does cheap talk provide the correct incentive to answer a contingent-valuation question truthfully, or does it just challenge people to lower their value responses? What is the basis for the correct interpretation of the uncertainty scale, and does it vary over specific valuation applications? Consequentiality, which requires a binding payment and a nonzero probability that the value estimates will influence decision-making, holds the most promise (Carson et al. 2014; Vossler et al. 2012).

4.3 Conclusions

It is important to recognize that the design of a contingent-valuation study involves constraints and trade-offs. The constraints could come from the actual change being valued, where the realism challenges the design (e.g., a tax will be used to fund the provision of the item to be valued, but pretesting reveals people value the item and oppose a tax increase). Trade-offs could arise in an attempt to develop a clear scenario while also seeking brevity. These decisions are part of the “art” of contingent valuation and have provided opportunities for critics to question the credibility of contingent-valuation estimates. This criticism is not all bad as it has led to better-designed studies and more-focused validity research. The outcome has been improved contingent-valuation studies that are in common use to support public and private decision-making.

It is also important to realize that the influence the survey design has over the outcome of a contingent-valuation study is no different from the influence of design in any other line of research or empirical analysis. Simply put, contingent valuation, like any other empirical method, requires a high degree of skill and considerable art that must be combined with careful pretesting and validity checks.

Finally, the next chapter (Chap. 5) discusses choice experiments, which are a close cousin to contingent valuation in the family of stated preference valuation methods. There are some unique differences between these valuation approaches:

  • Contingent-valuation scenarios define the change to be valued in a written scenario, while choice-experiment scenarios define the change using specific levels of attributes.

  • Contingent valuation, using a dichotomous-choice question, asks respondents to choose between the change and a status quo condition, while choice experiments typically ask respondents to choose between two or more changes (alternatives) and a status quo condition.

  • Contingent-valuation studies often include one valuation question, while choice experiments typically include multiple valuation questions.

The key distinction between contingent valuation and choice experiments is the scenario presentation; in a choice experiment, respondents are presented with alternatives to choose among (usually three or more) that are described in terms of attributes (again, usually three or more). Whether contingent valuation or a choice experiment should be employed is not clear-cut. However, if multiple alternatives are not plausible and the item being valued cannot be logically divided into attributes from a design, policy, or consumer-preference perspective, contingent valuation is the appropriate choice. If multiple alternatives and attributes are plausible, the policy question seeks marginal values for individual attributes, and such segmentation into attributes is plausible to respondents, a choice experiment would be the appropriate method. A choice experiment with one alternative, two attributes (including cost), and one valuation question varies from a dichotomous-choice question only in the presentation of the change to be valued; the two are conceptually and analytically equivalent. Thus, many of the design insights discussed in this chapter also relate to choice experiments, and vice versa for much of the design-feature discussion in Chap. 5.