1 Introduction

There is substantial variation in the ecological indicators used to represent similar outcomes within stated preference (SP) surveys (Schultz et al. 2012). Unanticipated sensitivity of welfare estimates to these variations could threaten the perceived reliability of SP welfare estimation for policy analysis. Specifically, if estimated willingness to pay (WTP) for a particular ecological outcome were to vary depending on the specific indicator used to characterize that outcome, the appropriate value to use within cost benefit analysis could be unclear. Moreover, the resulting welfare estimates could be viewed as dependent mostly on researcher discretion (i.e., the particular set of indicators used to characterize a given outcome within a survey instrument).

Despite the relevance of these concerns for welfare analysis, the literature provides little theoretical or empirical guidance regarding whether and how welfare estimates should respond to variations in ecological indicators within SP survey design. This paper models and evaluates the typically unacknowledged implications of these variations for SP welfare estimation, addressing the common situation in which alternative indicators may be chosen by the researcher to communicate change in the same presumptive ecological outcome. We consider two primary questions related to the consequences of such choices: (1) Conceptually, when should SP welfare estimates be robust to changes in ecological indicators used to characterize the same policy outcomes? (2) Empirically, when they are expected to be so, are estimates indeed robust to indicator choice?

1.1 Ecological Indicators and Valued Outcomes in Stated Preference Valuation

The ecological literature has long accepted that many outcomes such as wildlife abundance, biodiversity and water quality cannot be directly observed or adequately quantified using a single metric, and must be communicated via indicators.Footnote 1 Multiple indicators can often be used to communicate the same or similar ecological outcomes, or changes in valued ecosystem goods and services. For example, the condition of a wildlife species might be represented using indicators of species abundance (e.g., population size, number of breeding pairs), frequency (e.g., likelihood of encountering the species in a given area, frequency of successful breeding), habitat distribution, population viability, or classified status (e.g., threatened, endangered).Footnote 2 Because cognitive limitations constrain the number of attributes that respondents can consider simultaneously within a survey scenario (DeShazo and Fermo 2002), SP survey designers must choose a small set of such indicators to communicate policy outcomes. Even after substantial attention to survey design and testing (e.g., within focus groups and cognitive interviews; Johnston et al. 1995; Kaplowitz et al. 2004; Powe 2007), multiple indicators representing identical or similar outcomes may remain under consideration.Footnote 3

The resulting diversity in survey design possibilities prompts concern over whether alternative indicators communicate the status of identical or related but distinct ecological outcomes. That is, respondents’ assumed relationship between indicators and outcomes is crucial to understanding what is being valued within a SP survey, and how this relates to policy outcomes in question and the potential robustness of welfare estimates. In some cases, for example, indicators such as species abundance and frequency might represent alternative means to communicate the same underlying outcome, e.g., species condition. In other cases, such as when values are driven by preferences for recreational encounters, these indicators might represent distinct outcomes. In the latter case, the outcome valued by respondents—and hence WTP—will likely depend on which of these indicators is used to characterize policy outcomes. Such challenges are particularly germane for valuation of policies affecting ecological systems, as these policies often affect dozens of commonly-used ecological indicators, even when the set of primary welfare-relevant outcomes is small.

To clarify subsequent discussion, we define outcome equivalent indicators as those that communicate the status of identical underlying outcomes, as perceived by respondents. When two ecological indicators are outcome equivalent, they may in principle be used interchangeably within a SP survey, subject to other indicator properties such as simplicity and ease of communication. That is, one should be able to interchange outcome equivalent indicators within a SP survey without significant effects on welfare estimates. To our knowledge no prior research has demonstrated robustness of welfare estimates to indicator choice in such circumstances.

To the contrary, some prior research suggests that WTP is sensitive to alternative ecological indicators within SP scenarios. Jacobsen et al. (2008), for example, find that WTP estimates in biodiversity valuation differ depending on whether outcomes are characterized in terms of effects on habitat indicators versus indicator species. When interpreting such results, however, it is important to consider whether alternative indicators are outcome equivalent. Does changing ecological indicators in a SP survey influence the perceived outcome being valued by respondents? If so, WTP variability would be expected. That is, when two or more ecological indicators are not outcome equivalent, exchanging these indicators within a survey scenario will lead to a change in the underlying outcome being valued, and an attendant change in WTP.

This paper proposes a model and linked approach to SP design that formalizes the concept of outcome equivalence and enables the evaluation of empirical consequences (e.g., for welfare estimates). Illustrative models are estimated and hypothesis tests implemented using choice experiments applied to migratory fish restoration in a Rhode Island (USA) watershed. Results demonstrate that SP welfare estimates can be statistically robust to the use of alternative outcome equivalent ecological indicators within survey scenarios. To our knowledge, this is the first published convergent validity Footnote 4 test of independent SP welfare estimates derived using distinct ecological indicators that characterize the same underlying outcomes. Our findings reinforce the reliability of SP models for ecological policy analysis, and suggest that the sensitivity of welfare estimates to the choice of ecological indicators in prior studies may have been due to changes in the outcomes that were valued by respondents.

2 Theoretical Model of Indicator Selection and Outcome Equivalence

Despite the relevance of ecological indicator use and selection for SP welfare estimation, their implications have been largely overlooked by the valuation literature.Footnote 5 Most analyses appear to treat choices among alternative indicators as inconsequential, providing little insight into the process of indicator selection and mapping between indicators and ecological outcomes. This relatively casual treatment of distinctions between indicators and welfare-relevant policy outcomes is somewhat justified, at least in a superficial sense. If indicators within SP surveys are adequately defined and respondents have well-defined preferences, then respondents should be able to make preference-revealing choices; their responses are expected to accurately reflect their interpretation of valuation scenarios (assuming there are no strategic or other biases).

Below the surface of these simple assumptions, however, there may be potential ambiguity about what is being valued by respondents. Scenario interpretation may depart from that which the survey designer intends (Carson 1998; Tonsor 2011), in part because of unclear relationships between indicators and outcomes. For example, researchers presenting a certain set of ecological indicators to respondents might erroneously presume that estimated WTP reflects change to an underlying ecological outcome \(Q_{1}\), whereas it may reflect assumed changes to other outcomes (e.g., \(Q_{2}\) or \(Q_{3})\). Similarly, it may be unclear whether two indicators \(W_{1}\) and \(W_{2}\) represent the same outcome \(Q_{1}\) (in which case only one should likely be used within a SP scenario) or distinct outcomes \(Q_{1}\) and \(Q_{2}\) (in which case both might be included to ensure comprehensive welfare estimates). This lack of clarity can engender uncertainty regarding which welfare estimates should be used in cost benefit analysis and whether SP results are sufficiently reliable for policy evaluation.

In this way, the conceptual issue is somewhat related to the concepts of embedding and scope (e.g., Carson and Mitchell 1995; Carson et al. 2001; Jakobsson and Dragun 2001; Powe and Bateman 2004; Veisten et al. 2004). Embedding refers to the issue of whether alternative goods represent different quantities of the same underlying good (i.e. they are nested subsets) or instead represent related but not formally nested goods (Carson and Mitchell 1995), whereas scope refers to the perceived quantity of a single good (Powe and Bateman 2004). Carson and Mitchell (1995) demonstrate that while formal nesting of goods may be straightforward in theory, in practice it may be unclear whether two different measures indeed represent different quantities (or scopes) of the same good. Here the question is similar: whether different indicators represent the same good, albeit perhaps at different scopes, or instead represent similar but distinct goods. As with the matter of embedding, the issue is straightforward in theory but more opaque in practice.

To formalize these issues, we begin with a simple theoretical framework in which utility, \(U(\cdot )\), is a function of two ecological public goods or outcomes that cannot be directly observed, \(Q_{1}\) and \(Q_{2}\), and a numéraire private good \(X\) (e.g., income). Hence, \(U(\cdot ) = U({Q}_{1},Q_{2}, X)\). We assume that the goal of valuation is to establish marginal WTP for changes in \(Q_{1}\). Respondents infer the quantity or quality of the outcome via a relationship between the outcome and an ecological indicator represented as a simple function \(Q_{1} =f(W_{1A})\), where \(W_{1A}\) is an ecological indicator used to communicate the status of \(Q_{1}\). Alternative indicators \(W_{1B}\) or \(W_{1C}\) also represent changes in \(Q_{1}\), where \(Q_{1} = g(W_{1B})\) and \(Q_{1} = h(W_{1C})\). That is, \(W_{1A}, W_{1B}\) or \(W_{1C}\) may all be used to communicate the status of otherwise unobservable outcome \(Q_{1}\).Footnote 6

While outcome \(Q_{1}\) may be communicated adequately using only one indicator, we assume that outcome \(Q_{2}\) requires two distinct ecological indicators, \(W_{1C}\) and \(W_{2A}\). That is, \(Q_{2} = k(W_{1C}, W_{2A})\), where \(W_{1C}\) is the same indicator identified in \(Q_{1} = h(W_{1C})\) above. We emphasize that \(f(\cdot ), g(\cdot ), h(\cdot )\) and \(k(\cdot )\) are assumed relationships used by respondents to infer change in outcomes \(Q_{1}\) and \(Q_{2}\), based on ecological indicators \(W_{1A}, W_{1B}, W_{1C}\), and \(W_{2A}\). They are not “ecological production functions” in the strict sense (cf. Johnston and Russell 2011), because the outcomes are not produced by the associated indicators.

The use of indicator \(W_{1A}\) in SP scenarios to estimate welfare change associated with a change in \(Q_{1}\) is grounded in the underlying theoretical relationship

$$\begin{aligned} \frac{\partial U(\cdot )}{\partial W_{1A} }=\left( \frac{\partial U(\cdot )}{\partial Q_1 } \right)\left( \frac{\partial Q_1 }{\partial W_{1A} } \right) \end{aligned}$$
(1)

The left-hand side of (1) may be derived from survey responses (e.g., the coefficient estimate on \(W_{1A}\) within a choice experiment). The first term on the right-hand side reflects the underlying marginal influence of \(Q_{1}\) on utility. The second term on the right-hand side represents the relationship between the indicator (\(W_{1A}\)) and the otherwise unobservable outcome (\(Q_{1}\)) assumed by respondents; this may be informed by information provided to respondents within the survey but is also influenced by respondents’ prior perceptions and understanding. A parallel structure applies for indicator \(W_{1B}\). That is, \(W_{1A}\) and \(W_{1B}\) are outcome equivalent; they are alternative means to communicate the same underlying outcome and are linked to \({\partial U(\cdot )}/{\partial Q_1 }\) in the same structural manner. Estimated utility change \(({\partial U(\cdot )}/{\partial Q_1 })\) from a change in ecological outcome \(Q_{1}\), and hence WTP for this change, should be equivalent whether evaluated using \(W_{1A}\) or \(W_{1B}\) (one would not use both indicators simultaneously, because they communicate equivalent information). This provides the answer to the conceptual question presented in the previous section (i.e., when should SP welfare estimates be robust to changes in ecological indicators used to characterize policy outcomes?).

Note that this equivalence does not imply that the corresponding WTP associated with the indicators \(W_{1A}\) or \(W_{1B}\) will be identical, because it will generally be the case that

$$\begin{aligned} {\partial Q_1 }/{\partial W_{1A} }\ne {\partial Q_1 }/{\partial W_{1B} }, \end{aligned}$$
(2)

so that

$$\begin{aligned} {\partial U(\cdot )}/{\partial W_{1A} }\ne {\partial U(\cdot )}/{\partial W_{1B} } \end{aligned}$$
(3)

That is, the functional relationship between \(W_{1A}\) and the utility-relevant but unobservable outcome \(Q_1\) is not necessarily the same as that between \(W_{1B}\) and \(Q_1\), so that the per unit utility gain associated with \(W_{1A}\) and \(W_{1B}\) is not necessarily identical, even though they are both linked to \(Q_1\).Footnote 7
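A minimal numerical sketch may help fix ideas for Eqs. (1) through (3). It assumes, purely for illustration, a linear utility function and linear respondent mappings \(f(\cdot )\) and \(g(\cdot )\); the coefficient values and function names are hypothetical and are not estimates from this study.

```python
# Illustrative sketch of Eqs. (1)-(3). All functional forms and numbers
# below are assumptions chosen for illustration, not study results.

MU_Q1 = 0.8       # assumed marginal utility of outcome Q1 (dU/dQ1)
MU_INCOME = 0.04  # assumed marginal utility of income (dU/dX)


def f(w_1a):
    """Assumed respondent mapping Q1 = f(W1A)."""
    return 0.5 * w_1a


def g(w_1b):
    """Assumed respondent mapping Q1 = g(W1B)."""
    return 2.0 * w_1b


def slope(mapping, step=1.0):
    """dQ1/dW by finite difference (exact for the linear mappings above)."""
    return (mapping(step) - mapping(0.0)) / step


def marginal_utility_of_indicator(mapping):
    """Eq. (1): dU/dW = (dU/dQ1) * (dQ1/dW)."""
    return MU_Q1 * slope(mapping)


def wtp_per_unit_outcome(mapping):
    """Implicit price of the indicator, rescaled to one unit of Q1."""
    wtp_per_unit_indicator = marginal_utility_of_indicator(mapping) / MU_INCOME
    return wtp_per_unit_indicator / slope(mapping)


mu_a = marginal_utility_of_indicator(f)  # differs from mu_b, as in Eq. (3)
mu_b = marginal_utility_of_indicator(g)
wtp_q1_via_a = wtp_per_unit_outcome(f)   # equal to wtp_q1_via_b
wtp_q1_via_b = wtp_per_unit_outcome(g)
```

Under these assumptions the per unit utility gain differs across the two indicators, yet the implied value of a unit change in \(Q_{1}\) is the same once each indicator is rescaled by its own mapping.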

The situation is more complex if \(W_{1C}\) is instead used as an indicator of \(Q_{1}\). In this case,

$$\begin{aligned} \frac{dU(\cdot )}{dW_{1C} }=\left( \frac{\partial U(\cdot )}{\partial Q_1 } \right)\left( \frac{\partial Q_1 }{\partial W_{1C} } \right)+\left(\frac{\partial U(\cdot )}{\partial Q_2 } \right)\left( \frac{\partial Q_2 }{\partial W_{1C} } \right) \end{aligned}$$
(4)

Welfare changes that correspond with changes in \(W_{1C}\) will capture both the relationship between \(W_{1C}\) and \(Q_{1}\) and between \(W_{1C}\) and \(Q_{2}\). If the latter effect is not anticipated, the result will be a misleading perspective of the value of \(Q_{1}\). That is, \(W_{1A}\) and \(W_{1C}\), while both indicators of \(Q_{1}\), are not outcome equivalent because \(W_{1C}\) is also an indicator of outcome \(Q_{2}\). Hence, the interchange of \(W_{1A}\) or \(W_{1C}\) in a survey scenario influences the underlying outcomes that are valued.
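The consequence of Eq. (4) can be illustrated with a short sketch. The marginal utilities and slopes below are hypothetical values chosen only to show the mechanics; they assume locally linear mappings \(h(\cdot )\) and \(k(\cdot )\).

```python
# Illustrative sketch of Eq. (4). All coefficients are assumptions,
# not estimates from the case study.

MU_Q1 = 0.8      # assumed dU/dQ1
MU_Q2 = 0.3      # assumed dU/dQ2
DQ1_DW1C = 1.5   # assumed slope of h(.): dQ1/dW1C
DQ2_DW1C = 2.0   # assumed partial derivative of k(.): dQ2/dW1C


def total_utility_effect_w1c():
    """Eq. (4): effect of W1C on utility through both Q1 and Q2."""
    return MU_Q1 * DQ1_DW1C + MU_Q2 * DQ2_DW1C


def naive_mu_q1():
    """Marginal utility of Q1 implied if the Q2 channel is ignored."""
    return total_utility_effect_w1c() / DQ1_DW1C


# Overstatement of dU/dQ1 when W1C is treated as an indicator of Q1 only.
bias = naive_mu_q1() - MU_Q1
```

In this hypothetical case the analyst who attributes the full effect of \(W_{1C}\) to \(Q_{1}\) overstates the marginal utility of \(Q_{1}\) by the amount that operates through \(Q_{2}\).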

The applied valuation literature largely overlooks such concerns, proceeding as if ecological indicators necessarily represent the outcomes to be valued. This assumption is equivalent to a special case in which \(f(\cdot ), g(\cdot )\), and \(h(\cdot )\) are identity functions, such that \(W_{1A}= W_{1B}=W_{1C}=Q_{1}\). In the general case, the outcomes inferred by respondents when viewing a particular ecological indicator may be ex ante unclear, and can only be revealed through focus groups, pretests and other best practices in survey development that clarify distinctions between ecological conditions (or processes), indicators and values. For example, in the case of threatened wildlife, do respondents truly and directly value a change in official species status (e.g., endangered to threatened), or does this status convey an underlying but unobservable public good over which respondents have preferences (e.g., the likelihood that a species will not become extinct)? The few past analyses that demonstrate sensitivity of WTP to the communication of ecological outcomes (e.g., Jacobsen et al. 2008) may have inadvertently substituted indicators that were not outcome equivalent, highlighting the importance of more formal treatment of such relationships.

It is also important to emphasize that outcome equivalence as defined here, while related to ecological properties, is a function of respondents’ preferences and perceptions. Ecological properties alone cannot determine whether respondents interpret two (or more) indicators as referencing an identical underlying good. Hence, while ecological relationships provide a necessary condition for outcome equivalence (i.e., the indicators in question must have some similarity in terms of the outcomes to which they refer, cf. Schiller et al. 2001), they do not provide a sufficient condition. The latter requires that respondents use the alternative indicators (e.g., \(W_{1A}\) and \(W_{1B})\) to infer change in the same element in their utility functions (i.e., \(Q_{1})\).

3 Formalizing Indicator Selection in Practice

Lacking attention to the problem, the literature also lacks assurance that there is a workable solution. Is there any evidence that WTP estimates are robust to the alternative use of outcome equivalent ecological indicators in SP scenarios? For example, assume that survey designers have identified outcome equivalent indicators that might be used to characterize a policy outcome, and that design constraints prevent the use of both indicators simultaneously. Are such indicators interchangeable within an SP survey without attendant changes in welfare estimates? Or, will any—even seemingly immaterial—change in ecological indicators influence welfare estimates, leading to a potentially problematic sensitivity of WTP?

We address this issue from two perspectives. First, we present a structured process designed to formalize and clarify the selection of ecological indicators within SP design. This process provides a practical means to address relationships between indicators and underlying goods, as well as distinctions between ecological outcomes or processes, indicators of those outcomes, and economic values. Second, we present a case study to illustrate the empirical results of such a process and to test outcome equivalence for the resulting set of indicators.

Like all forms of SP content validity (Mitchell and Carson 1989; Bateman et al. 2002), the adequacy of ecological information within survey design is context specific. For example, the interpretation of an ecological indicator may vary between two valuation contexts.Footnote 8 Because of this, a systematic, context-specific process of indicator selection is required to ensure that indicators reflect utility-relevant outcomes and to evaluate whether alternative indicators are outcome equivalent. Without such a process, there is no way to determine whether any given indicator, regardless of its ecological properties, successfully communicates a particular outcome.

Here, we summarize the process of ecological indicator selection developed for the present case study. While not the only possible means to select indicators within SP design, the proposed methods provide a formal approach that links ecological science to public preferences, and informs linkages between valued outcomes and indicators. The process included four steps:

  1. Focus groups and cognitive interviews (Kaplowitz et al. 2004) were first used to identify primary, welfare-relevant outcomes (here resulting from fish restoration in Rhode Island). These initial focus groups and interviews were designed to elicit ways in which respondents valued potential policy outcomes. These outcomes were grouped into categories of welfare-relevant goods and services (e.g., effect on migratory fish, quantity of river habitat restored, etc.), using ethnographic focus group methods outlined by Johnston et al. (1995).

  2. Based on focus group and interview results, these goods and services were further disaggregated into final and intermediate outcomes (goods and services). Final goods and services are defined as outcomes that directly enhance respondents’ utility, whereas intermediate goods and services are defined as inputs into the biophysical production of final goods and services; they have no direct influence on utility (Johnston and Russell 2011).Footnote 9 Following Johnston et al. (2012), only final goods and services were considered for inclusion in survey scenarios. These are akin to assessment endpoints in the ecological literature: often unobservable policy goals that influence well-being or affect utility (US EPA 1998).

  3. Following Schiller et al. (2001), each assessment endpoint was formally linked to one or more measurement endpoints; these are observable ecological indicators used within formal frameworks to communicate, infer, or predict changes in assessment endpoints (US EPA 1998). The result was a conceptual model relating each assessment endpoint (final good or service) to a set of ecologically-linked measurement endpoints (indicators). This step required review of the ecological literature and extensive input from experts in fish restoration ecology. Guidance described in Johnston et al. (2012) was also applied to ensure that candidate indicators had appropriate empirical properties.

  4. A sequence of preliminary SP surveys was then developed, grounded in information provided by steps one through three above. Multiple survey variants were tested, each including a different set of measurement endpoints (indicators) for relevant assessment endpoints. These surveys were evaluated and pretested in a second round of focus groups and interviews. Within these evaluations, direct questions assessed whether respondents perceived alternative indicators as communicating identical or distinct underlying outcomes (e.g., whether each indicator provided unique information concerning valued goods and services, or whether multiple indicators provided the same information in different ways). When indicators were perceived as communicating identical outcomes, subsequent questions were used to select indicators most easily understood by respondents. Following Schiller et al. (2001) and Johnston et al. (1995), focus groups and interviews were also used to identify the shared, common language best able to communicate indicators.

This four-step process provided justification for the set of indicators included in final survey scenarios, providing evidence (albeit qualitative) of outcome equivalence. Although this process provided unambiguous guidance regarding most indicators, it also revealed a case in which two indicators (i.e., of effects on migratory fish) appeared to be outcome equivalent, and in which respondents did not express a clear preference for either indicator. The final sections of this paper evaluate whether welfare estimates are robust to the alternative use of these two seemingly outcome equivalent indicators within survey scenarios.

4 Case Study of Outcome Equivalence and Welfare Estimation

We address the empirical question of whether SP welfare estimates are robust to choices between putatively outcome equivalent indicators using a case study of migratory fish restoration in the Pawtuxet Watershed of Rhode Island, USA. At the time of this study the watershed provided no spawning habitat for migratory fish; access to all 4,347 acres of potential habitat was blocked by 22 dams (Erkan 2002).Footnote 10 Restoration of fish passage would not only affect fish populations but also other ecosystem outcomes that rely on the presence or abundance of migratory fish. Species that directly benefit from fish passage restoration in this area are alewife (Alosa pseudoharengus), blueback herring (A. aestivalis), shad (A. sapidissima), and American eel (Anguilla rostrata). The choice experiment questionnaire (Rhode Island Rivers: Migratory Fishes and Dams) estimated the WTP of Rhode Island residents for options that would restore fish passage to between 225 and 900 acres of historical habitat.

The description of the choice experiment is condensed from Johnston et al. (2011, 2012). The theoretical model was adapted from a standard choice experiment specification in which household \(h\) chooses among three policy plans, \((j=A,B,N)\), including two multi-attribute restoration options \((A, B)\) and a status quo \((N)\) of no restoration. Choice scenarios and restoration options were informed by data and restoration priorities in the Strategic Plan for the Restoration of Anadromous Fishes to Rhode Island Coastal Streams (Erkan 2002). Consistent with the strategic plan, the restoration methods included fish ladders and lifts (Schilt 2007) that would neither require dam removal nor cause appreciable changes in river flows. Information on the ecological roles of the migratory species targeted for restoration (e.g., Loesch 1987) provided the basis for conceptual models linking restoration to valued outcomes identified in focus groups.

The questionnaire was developed and tested over 2\(\tfrac{1}{2}\) years in a collaborative process involving economists and ecologists. This included meetings with resource managers, natural scientists, and stakeholder groups, and 12 focus groups. We conducted cognitive interviews (Kaplowitz et al. 2004), verbal protocols (Schkade and Payne 1994) and other pretests, both to collect information prior to survey design and to gain insight into respondents’ interpretation of the questionnaire. Survey language and graphics were pretested carefully to ensure respondent comprehension. Particular attention was given to the definition and interpretation of ecological indicators. Prior to presenting choice questions, the survey provided information (1) describing the status of Rhode Island river ecology and migratory fish compared to historical baselines, (2) characterizing affected ecological systems and linkages, (3) describing fish passage restoration, and (4) providing definitions, derivations and interpretations of ecological indicators used in survey scenarios. Information was conveyed via a combination of text, graphics including Geographic Information System (GIS) maps and ecosystem representations, and photographs, all of which were subject to in-depth pretesting. The survey included a number of elements to promote incentive compatibility, including an emphasis on consequentiality, hypothetically binding payments, and instructions to consider each question as an independent choice.Footnote 11

4.1 Ecological Indicators of Fish Passage Outcomes

Based on the process of indicator selection outlined above, choice options were characterized by five ecological indicators, one attribute characterizing public access, and one attribute characterizing annual household cost (Table 1). As described above, ecological indicators serving as attributes in the choice model were selected based on a conceptual model that coordinated ecological science with findings from focus groups. The initial direct ecological effect of restoration is to provide migratory fish with access to additional habitat for spawning and is quantified by the attribute acres, based upon restorable Pawtuxet watershed habitat acreage (Erkan 2002). The consequences of greater habitat acreage include increases in both migrating populations and the probability that fish runs will exist in a given area at some future time period. Alternative attributes used to characterize these impacts are the focus of this analysis; these are discussed below. Other indirect impacts include effects on (1) the abundance of non-migratory fish suitable for recreational harvest (catch), reflecting abundance measures from statewide sampling; (2) the abundance of fish-dependent wildlife (wildlife), reflecting the appearance of identifiable species within restored areas; and (3) overall ecological condition (IBI), reflecting the output of a multimetric aquatic ecological condition score [i.e., index of biotic integrity; see Johnston et al. (2011) and Johnston et al. (in press) for additional discussion].

Table 1 Choice experiment attributes and descriptive statistics

The issue of outcome equivalence arose in survey design when evaluating alternative ecological indicators that could be used to characterize the impacts of restoration on migratory fish. Focus groups revealed that respondents valued, and were willing to pay for, enhancements to migratory fish species in the Pawtuxet Watershed, ceteris paribus. The specific indicator(s) best suited to quantify this effect, however, were not clear. The ecological literature provides multiple indicators that might be used to quantify effects on migratory fish populations. One indicator is the estimated probability that the restored fish run will exist in 50 years (or an alternative time period), reflecting results calculable through applications of population viability analysis (Lee and Rieman 1997). Another indicator is the number of individual fish passing through fishways (i.e., the channels through which fish pass over or around dams), observed using electronic or visual counts of migrating fish (Erkan 2002). These alternative attributes are entitled PVA and migrants, respectively (Table 1).

Survey pretests with both individuals and focus groupsFootnote 12 supported the inclusion of either PVA or migrants in survey scenarios, but not both. This preference among respondents was due to two factors. First, PVA and migrants were seen to provide similar information (i.e., effects on migratory fish). Second, choice questions including both of these indicators were more difficult to answer due to the increased number of attributes. While expressing a preference for surveys including either PVA or migrants, however, respondents did not have a clear preference for one of these indicators over the other. Both were viewed as satisfactory ways to communicate effects on migratory fish, and focus group pretests suggested little effect on the choice frame when these indicators were alternated within the survey design.

Given these pretest results, we implemented two independent versions of the final survey. These were identical in all regards, except for the indicator used to characterize effects on migratory fish and associated explanatory text. One survey version included PVA, and the other included migrants. That is, each respondent received only one version of the survey, presenting information on either PVA or migrants, but not both. This enables a direct test of outcome equivalence—whether results are unaffected by this choice of indicators, both of which were deemed satisfactory by survey pretests. To be clear, there is no expectation that implicit prices (marginal WTP for attribute changes, ceteris paribus) will be equal for PVA or migrants, even in the presence of outcome equivalence. This is because the function linking these attributes to the putative underlying outcome (migratory fish) differs [see Eqs. (2) and (3)]. Rather, hypothesis tests consider a more comprehensive set of results from the choice experiment. These include evaluations of (1) potential differences in implicit prices for other model attributes, (2) compensating surplus (CS) measures for restoration programs considered as a whole, and (3) welfare measures calibrated using known ecological relationships between the two indicators.

4.2 Choice Experiment Implementation

Attribute levels within the experimental design (Table 2) were grounded in feasible restoration outcomes identified by ecological models, field studies and expert consultations. Choice scenarios represented each ecological attribute in relative terms with regard to upper and lower reference conditions (i.e., best and worst possible in the Pawtuxet) as defined in survey materials. Relative scores represented percent progress toward the upper reference condition (100 %), starting from the lower reference condition (0 %). Scenarios also presented the cardinal basis for relative scores where applicable. Sample choice questions are illustrated in Fig. 1 for the survey version including migrants, and in Fig. 2 for the survey version including PVA.
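The relative scoring described above amounts to a linear rescaling between the two reference conditions. The following sketch shows that arithmetic; the function name and the example values are hypothetical and do not correspond to attribute levels in Table 2.

```python
def relative_score(value, lower_ref, upper_ref):
    """Percent progress from the lower (0 %) to the upper (100 %) reference."""
    if upper_ref == lower_ref:
        raise ValueError("reference conditions must differ")
    return 100.0 * (value - lower_ref) / (upper_ref - lower_ref)


# Hypothetical cardinal values: lower reference 10, upper reference 50.
example_score = relative_score(30.0, lower_ref=10.0, upper_ref=50.0)
```

A cardinal value halfway between the two reference conditions maps to a relative score of 50 %.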

Table 2 Attribute levels in choice experiment design
Fig. 1 Sample choice experiment question (survey version including migrants)

Fig. 2 Sample choice experiment question (survey version including PVA)

A fractional factorial experimental design was generated using a criterion that minimized D-error for a choice model covariance matrix with both main effects and selected two-way interactions (Kuhfeld 2010; Kuhfeld and Tobias 2005).Footnote 13 The final design included 180 profiles blocked into 60 booklets. Each respondent was provided with three choice questions and instructed to consider each as an independent, non-additive choice. Surveys were implemented using a dual wave phone-mail approach during June–August, 2008. An initial random digit dial sample of Rhode Island households was contacted via telephone and asked to participate in a survey addressing Rhode Island “environmental issues and government programs.” Those agreeing to participate were sent the questionnaire via mail, with follow-up mailings to increase response rates (Dillman 2000). Respondents were either sent the survey version including migrants or that including PVA, but not both (i.e., the samples are independent). For the two survey versions analyzed here, a total of 1,200 questionnaires were sent to Rhode Island residents. These yielded 564 usable returns (47 %), providing 1,634 completed responses to choice questions. Response rates were similar across both survey versions.

4.3 Model Specification and Estimation

The random utility models are estimated using simulated likelihood mixed logit (ML) with Halton draws, accounting for correlations in choices from the same respondent. The final model specifications were chosen after the estimation of preliminary models with varying specifications of fixed and random coefficients. Within the final models, coefficients on all non-cost attributes except catch are specified as random with a normal distribution.Footnote 14 The coefficient on (sign-reversed) cost is random with a bounded triangular distribution, ensuring positive marginal utility of income (Hensher and Greene 2003). All variables except access and cost represent percent progress towards the upper reference condition (Table 1), so that the coefficient on ecological attribute \(k\), \(\hat{\beta }_k\), reflects the relative marginal utility given to a one percentage point change in that attribute.Footnote 15
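The draws that underlie this type of simulated likelihood estimation can be sketched as follows. This is a minimal illustration and not the estimation code used in the paper. It assumes a triangular distribution whose spread is constrained to equal its mean, which is one common way to bound the cost coefficient away from negative values, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def halton(index, base):
    """Return element `index` (index >= 1) of the Halton sequence for a prime base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def normal_coefficient(u, mean, sd):
    """Map a uniform draw u in (0, 1) to a normally distributed coefficient."""
    return mean + sd * NormalDist().inv_cdf(u)


def bounded_triangular_coefficient(u, mean):
    """Map u in (0, 1) to a triangular coefficient on [0, 2 * mean].

    Assumption: the spread is set equal to the mean, so draws are never negative.
    """
    if u < 0.5:
        t = math.sqrt(2.0 * u) - 1.0
    else:
        t = 1.0 - math.sqrt(2.0 * (1.0 - u))
    return mean + mean * t


# First three base-2 Halton points: 0.5, 0.25, 0.75
points = [halton(i, 2) for i in (1, 2, 3)]
```

Because the triangular draw is bounded below by zero, the implied marginal utility of income keeps the same sign in every draw, which is the property the bounded specification is meant to guarantee.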

Two final, jointly estimated models are illustrated; these are referenced by subscripts \(p\) (pooled) and \(u\) (unrestricted).Footnote 16 The first model pools observations from the two choice experiments, imposing identical coefficient estimates over all attributes, including the same coefficient on PVA and migrants. This is accomplished through the creation of a single variable, fish, that pools observations on PVA and migrants across the two choice experiments (Table 1). The pooled model thereby ignores differences in the relative scale of change across the two fish attributes,Footnote 17 imposing an identical coefficient and marginal utility \((\hat{\beta }_{fish,p} )\). Other attributes are identical across the two choice experiments and are pooled directly.

The second model is an unrestricted model that allows systematically varying coefficient estimates on all non-cost attributes between the migrants and PVA choice experiments. This is accomplished through inclusion of multiplicative interactions between each attribute and a dummy variable d_mig that identifies observations from the migrants choice experiment.Footnote 18 Hence, within the unrestricted model, the marginal utility of attribute \(k\) in the PVA choice experiment is given by \(\hat{\beta }_{k,u}\), whereas the marginal utility of the same attribute in the migrants choice experiment is given by \((\hat{\beta }_{k,u} +\hat{\beta }_{k\times d\_mig,u} )\). The marginal utility of PVA itself is given by \(\hat{\beta }_{fish,u} \), and the marginal utility of migrants is given by \((\hat{\beta }_{fish,u} +\hat{\beta }_{fish\times d\_mig,u} )\).
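The interaction structure of the unrestricted model can be illustrated with a short sketch. The coefficient values are hypothetical and are not estimates from Table 3. The function names are illustrative only, and the sketch uses point values rather than random coefficient distributions.

```python
def systematic_utility(x, d_mig, beta, beta_x_dmig):
    """Systematic utility of one alternative in the unrestricted specification.

    x           -- attribute levels keyed by attribute name
    d_mig       -- 1 for the migrants choice experiment, 0 for the PVA experiment
    beta        -- base coefficients (marginal utilities in the PVA experiment)
    beta_x_dmig -- coefficients on the attribute-by-d_mig interactions
    """
    return sum((beta[k] + beta_x_dmig.get(k, 0.0) * d_mig) * x[k] for k in x)


def marginal_utility(k, d_mig, beta, beta_x_dmig):
    """Marginal utility of attribute k in the experiment identified by d_mig."""
    return beta[k] + beta_x_dmig.get(k, 0.0) * d_mig


# Hypothetical coefficients for illustration only.
beta = {"fish": 0.020, "catch": 0.005}
beta_x_dmig = {"fish": 0.010, "catch": -0.001}

mu_pva = marginal_utility("fish", 0, beta, beta_x_dmig)       # beta_fish
mu_migrants = marginal_utility("fish", 1, beta, beta_x_dmig)  # beta_fish + beta_fish_x_dmig
```

Setting every interaction coefficient to zero recovers the pooled specification, which is why the two models are nested and can be compared with a likelihood ratio test.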

5 Results

Results for the two models are reported in Table 3. Both models are statistically significant at \({p}<0.0001\), with pseudo-\(R^{2}\) statistics in excess of 0.31. Coefficient estimates on all ecological attributes, with the exception of catch, are statistically significant at \({p}<0.01\) in both models. Signs of statistically significant coefficients match prior expectations. Prior works discuss the general properties and policy relevance of similar model results (Johnston et al. 2011; Johnston et al. 2012); these topics are not repeated here. Rather, we focus on an evaluation of outcome equivalence and implications for policy analysis.

Table 3 Mixed logit results: Pawtuxet restoration choice experiments

A likelihood ratio test comparing the unrestricted and pooled models fails to show the statistical significance of the imposed restrictions (\(\chi ^{2} = 4.51, df = 7, p = 0.72\)). As a result, we cannot reject the pooled model that imposes identical coefficient estimates across the two choice experiments.Footnote 19 Similarly, none of the individual coefficient estimates distinguishing the two models \((\hat{\beta }_{k\times d\_mig,u} )\) are statistically significant at \({p}<0.10\). Hence, at first glance, the choice between migrants and PVA to represent impacts on migratory fish does not appear to influence model results. Such findings provide initial support for an outcome equivalence hypothesis, and are robust across a wide range of alternative model specifications (including models with alternative specifications and distributions of fixed and random parameters).
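The reported \(p\) value can be checked from the test statistic and its degrees of freedom. The sketch below evaluates the upper tail of the chi-square distribution using closed-form series for integer degrees of freedom. It relies only on the Python standard library, and the function name is illustrative.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Likelihood ratio statistic and degrees of freedom as reported in the text.
p_value = chi2_upper_tail(4.51, 7)  # approximately 0.72
```

A value this far above conventional significance thresholds means that the seven restrictions imposed by pooling are not rejected.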

As implied by Eqs. (2) and (3) above, however, direct comparison of the coefficient estimates on migrants and PVA is not necessarily informative; there is no a priori reason that these estimates should be identical, even if the two attributes are outcome equivalent. Moreover, the potentially confounding role of the logit scale parameter prevents naïve comparison of individual coefficient estimates across models (Swait and Louviere 1993). Due to these limitations, we continue the analysis through an evaluation of implicit prices and CS estimates (which are more directly relevant for policy and not confounded by scale), combined with welfare comparisons that calibrate for differences in ecological scale between migrants and PVA.

5.1 Test of Outcome Equivalence: Implicit Prices

Drawing from unrestricted model coefficients, implicit prices are estimated using the welfare simulation described by Johnston and Duke (2007), following the general approach of Hensher and Greene (2003).Footnote 20 Presented WTP estimates reflect the mean over the parameter simulation of mean WTP calculated over the coefficient simulation. Given the model specification, implicit prices are calculated for the PVA choice experiment using the general form \({\hat{\beta }_{k,u}}/{\hat{\beta }_{cost,u} }\), and for the migrants choice experiment using \({(\hat{\beta }_{k,u} +\hat{\beta }_{k\times d\_mig,u} )}/{\hat{\beta }_{cost,u} }\).
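The nested structure of this simulation (a coefficient simulation inside a parameter simulation) can be sketched as follows. This is a simplified illustration and not the authors' procedure. It assumes independent normal sampling distributions for the estimated parameters, ignoring the off-diagonal elements of the covariance matrix, and a triangular cost coefficient with spread equal to its mean. All numerical inputs are hypothetical.

```python
import math
import random


def triangular_coefficient(u, mean):
    """Triangular draw on (0, 2 * mean] for u in (0, 1]; spread equals mean (assumption)."""
    t = math.sqrt(2.0 * u) - 1.0 if u < 0.5 else 1.0 - math.sqrt(2.0 * (1.0 - u))
    return mean * (1.0 + t)


def simulate_implicit_price(est, n_param=200, n_coef=200, seed=1):
    """Return (mean of means, list of per-parameter-draw mean implicit prices)."""
    rng = random.Random(seed)
    outer = []
    for _ in range(n_param):
        # Parameter simulation: draw the estimated parameters from their
        # (here, independent normal) sampling distributions.
        mean_k = rng.gauss(est["mean_k"], est["se_mean_k"])
        sd_k = abs(rng.gauss(est["sd_k"], est["se_sd_k"]))
        mean_cost = abs(rng.gauss(est["mean_cost"], est["se_mean_cost"]))
        inner = []
        for _ in range(n_coef):
            # Coefficient simulation: draw from the random coefficient distributions.
            beta_k = rng.gauss(mean_k, sd_k)
            beta_cost = triangular_coefficient(1.0 - rng.random(), mean_cost)
            inner.append(beta_k / beta_cost)
        outer.append(sum(inner) / len(inner))
    return sum(outer) / len(outer), outer


# Hypothetical estimates for one attribute and for (sign-reversed) cost.
hypothetical_estimates = {
    "mean_k": 0.020, "se_mean_k": 0.004,
    "sd_k": 0.030, "se_sd_k": 0.006,
    "mean_cost": 0.010, "se_mean_cost": 0.001,
}
mean_price, price_draws = simulate_implicit_price(hypothetical_estimates)
```

The list of outer draws forms an empirical distribution for the implicit price. For the migrants choice experiment, the same logic would apply with the interaction coefficient added to the attribute coefficient before the ratio is taken.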

Results are shown in Table 4. For all attributes except access, implicit price results are interpreted as WTP for a marginal, one percentage point increase in the attribute, holding all else constant. For access, results indicate WTP for the provision of public access in the restored area, relative to the default of no access. The rightmost column of Table 4 illustrates the difference between implicit prices estimated from the PVA and migrant models. We calculate statistical significance levels (\(p\) values for two-tailed tests) for each of the implicit prices and differences. Significance levels are determined through percentiles on the empirical welfare distributions (Poe et al. 2005), with these distributions accounting both for sampling variation reflected in the estimated covariance matrix for model parameters and the estimated distribution of random coefficients (Hensher and Greene 2003).
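A comparison of two empirical welfare distributions in the spirit of Poe et al. (2005) can be sketched as a complete combinatorial calculation over all pairwise differences. The doubling rule used for the two-tailed \(p\) value is one common convention and is an assumption here, as are the function name and the toy inputs.

```python
from itertools import product


def two_tailed_difference_p(draws_a, draws_b):
    """Two-tailed p value for H0: no difference between two empirical distributions.

    Every draw in draws_a is differenced against every draw in draws_b; the share
    of non-positive differences gives a one-sided p value, which is then doubled
    (taking the smaller tail).
    """
    diffs = [a - b for a, b in product(draws_a, draws_b)]
    share_nonpositive = sum(d <= 0 for d in diffs) / len(diffs)
    return 2.0 * min(share_nonpositive, 1.0 - share_nonpositive)


# Toy draws for illustration only (not results from Table 4).
toy_p = two_tailed_difference_p([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])
```

In the toy example, six of the nine pairwise differences are negative, so the smaller tail has share 1/3 and the two-tailed value is 2/3.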

Table 4 Implicit price differences: PVA and migrants choice experiments

Results in Table 4 demonstrate that (a) implicit prices vary across distinct ecological effects within each choice experiment,Footnote 21 but (b) corresponding implicit prices are statistically indistinguishable across the two choice experiments. For all directly comparable implicit prices across the two choice experiments, we fail to reject the null hypothesis of equivalent implicit prices.

Comparisons involving implicit prices for the attributes PVA and migrants themselves require additional ecological assumptions. The relationship between the number of migratory fish (migrants) in the Pawtuxet watershed and population viability (PVA) is conditional on other ecological factors, such as conditions at sea for this migratory species. This caveat aside, attribute levels in the experimental design are conditioned upon an expected 2.37 to 1 ratio, on average, between improvements in population viability and the number of migratory fish.Footnote 22 This expectation reflects a monotonically additive relationship (Maunder 2004), based on ecological data for the watershed and consultations with experts in regional migratory fish restoration.

Based on this relationship, and assuming that these indicators are outcome equivalent, one would expect a corresponding but inverse ratio in the implicit prices, with respondents willing to pay approximately 2.37 times more for marginal changes in migrants than for parallel changes in PVA, in percentage point terms. That is, if these two indicators reflect alternative ways of measuring underlying effects on migratory fish, albeit at different scales, one would expect the WTP ratio for these attributes to be the inverse of the ecological ratio. Empirical results cannot reject this expectation. Based on a comparison of empirical distributions, we fail to reject the null hypothesis (\(p=0.28\)) that the implicit price ratio between migrants and PVA is equal to 2.37.
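The logic of the inverse ratio can be made concrete with a short calculation. Only the 2.37 ratio comes from the text. The PVA implicit price and the size of the ecological change are hypothetical values chosen for illustration.

```python
ECOLOGICAL_RATIO = 2.37  # percentage points of PVA per percentage point of migrants


def expected_migrants_price(pva_price, ratio=ECOLOGICAL_RATIO):
    """Implicit price per point of migrants implied by outcome equivalence."""
    return ratio * pva_price


# Hypothetical inputs for illustration only.
pva_price = 1.00        # dollars per percentage point of PVA
delta_migrants = 10.0   # percentage point change in migrants from a restoration action
delta_pva = ECOLOGICAL_RATIO * delta_migrants  # corresponding change in PVA

wtp_via_pva = pva_price * delta_pva
wtp_via_migrants = expected_migrants_price(pva_price) * delta_migrants
```

Both routes give the same total WTP for the same restoration action, which is what outcome equivalence requires even though the per-point implicit prices differ.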

5.2 Test of Outcome Equivalence: Compensating Surplus

Compensating surplus (CS) measures reflect per household WTP for entire restoration programs that combine outcomes for different attributes. As cost benefit analyses commonly rely on these measures, they arguably provide the most relevant perspective on WTP robustness (Morrison et al. 2002). To assess CS differences, we follow common practice and select a small set of illustrative policy alternatives from thousands of feasible scenarios that can be generated using attribute levels of the experimental design. We choose three illustrative scenarios representing: (1) the smallest positive increase in each ecological outcome; (2) a median increase in each outcome; and, (3) the largest increase in each ecological outcome. CS for each of these three alternatives is calculated both with and without public access, for a total of six policy scenarios. Following Morrison et al. (2002), all CS estimates incorporate (the negative of) the status quo ASC. When specifying comparable levels of migrants and PVA across the two models, we assume the same ecological relationship between these attributes that underpins attribute levels in the experimental design. This enables a comparison of CS for equivalent policies across the two models. Empirical distributions are simulated following an approach parallel to that used for implicit prices above.
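A point calculation of CS for a single policy scenario can be sketched as follows, assuming linear-in-parameters utility and a sign-reversed cost coefficient. All coefficient values and attribute changes are hypothetical. The sketch uses point values, whereas the reported estimates are simulated over empirical distributions.

```python
def compensating_surplus(delta_x, beta, beta_cost, asc_status_quo):
    """CS = (sum_k beta_k * delta_x_k - ASC_status_quo) / beta_cost.

    delta_x        -- attribute changes relative to the status quo
    beta           -- marginal utilities of the non-cost attributes
    beta_cost      -- coefficient on sign-reversed cost (must be positive)
    asc_status_quo -- alternative specific constant on the status quo option
    """
    if beta_cost <= 0:
        raise ValueError("beta_cost must be positive for sign-reversed cost")
    delta_v = sum(beta[k] * delta_x[k] for k in delta_x) - asc_status_quo
    return delta_v / beta_cost


# Hypothetical values for illustration only (not estimates from Table 3 or Table 5).
beta = {"fish": 0.02, "access": 0.5}
scenario = {"fish": 40.0, "access": 1.0}
cs = compensating_surplus(scenario, beta, beta_cost=0.01, asc_status_quo=-0.3)  # about 160
```

Subtracting the status quo ASC means that any systematic preference for or against the status quo, beyond what the attributes explain, enters the welfare measure.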

Results are shown in Table 5. In addition to the attribute levels defining each policy scenario, Table 5 presents estimated CS for both the PVA and migrants choice experiments, as well as the difference between these estimates both in cardinal (dollar) and absolute value percentage terms. In all cases, we fail to reject the null hypothesis of zero CS difference. Absolute value percentage differences vary from 1.89 to 7.83 %, with a mean of 4.71 %, indicating nearly identical CS estimates across the two choice experiments. These findings again suggest statistical robustness across the two independent choice experiments, and support the hypothesis of outcome equivalence. Average CS estimates do not vary significantly when alternating between indicators of population size (migrants) and viability (PVA) within the choice experiment, suggesting that respondents viewed and valued policy scenarios similarly regardless of which indicator was used to characterize direct effects on migratory fish. This occurs despite the fact that the two indicators are measured at different scales, with PVA effects ranging from 30 to 70 %, and migrant effects ranging from 12 to 33 % (approximately 150,000–395,000 fish) from equivalent restoration actions.

Table 5 Compensating surplus (CS) estimates and differences: Pawtuxet restoration scenarios

6 Conclusion

Compared to prior works in the ecological valuation literature, this analysis promotes a more formal perspective on ecological indicator selection within SP design, specifically invoking the concept of outcome equivalence to characterize cases in which ecological indicators provide alternative means to communicate an identical underlying outcome. The robustness of welfare estimates to ecological indicators is critically important for the use of SP analysis to guide policy: if not robust, welfare estimates could be regarded (or disregarded) as idiosyncratic to the particular set of indicators chosen to communicate a policy outcome. Indeed, when indicators are not outcome equivalent, welfare estimates are unlikely to converge. Previous findings that WTP differs when ecological attributes differ are not particularly surprising given that these studies employed indicators that are not prima facie outcome equivalent.

We demonstrate that welfare estimates converge when outcome equivalent indicators are used. In contrast to previous studies, the present analysis evaluates welfare implications using independent choice experiments in which effects on an ecological outcome are characterized using two indicators that are expected to be outcome equivalent based on input during survey design. Our findings provide a promising message for SP analysis, and to our knowledge are the first published results demonstrating the convergent validity of SP welfare estimates to alternative uses of outcome equivalent ecological indicators. Welfare estimates are statistically robust whether one characterizes impacts in terms of the size of migrating fish populations (migrants) or population viability (PVA). We find no significant differences in either implicit prices or CS estimates. Moreover, implicit prices for migrants and PVA are consistent with the underlying ecological relationship between the two attributes implied by the experimental design. As a result, WTP for a given change in migratory fish—estimated using independent choice experiments—is similar whether the change is measured in terms of population size or viability. This suggests that respondents were able to use either ecological indicator to make equivalent assessments of the welfare gains from migratory fish restoration. If future research establishes similar findings elsewhere, the result should be greater confidence that SP welfare estimates are not overly sensitive to the particular set of ecological indicators used in the questionnaire.

This paper addresses only a few of the many challenges involved in comprehensive coordination of economics and ecology for welfare estimation. Although results here suggest that SP welfare estimates can be robust to the use of alternative ecological indicators within survey scenarios, greater attention is needed to relationships between indicators in survey scenarios and policy outcomes for which welfare estimates are desired. Despite an awareness of challenges involved in communicating ecological changes within SP surveys, the scrutiny given to indicator use and interpretation within the ecological literature has not been matched in the economics literature. The process used to select indicators within survey design is often opaque, and implications for welfare estimates remain obscured. The result can be a lack of transparency in both the ecological outcomes that respondents are being asked to value and ways in which these relate to ecological information presented on the survey page.