Introduction

Stated choice experiments have proven popular as a framework for eliciting respondent preferences for various alternatives, especially those that are not currently available in the market. With discrete choice methods, respondent preferences are ‘revealed’ through their responses to a number of scenarios, also known as choice tasks. Each choice task may ask respondents to choose the most preferred alternative, provide a full rank ordering of all alternatives offered, or choose the most and least preferred options (best–worst; see Marley and Flynn, in press). Within the setting of utility maximisation (or regret minimisation), these various response metrics are all candidate derivatives of the full rank order.

It has become common practice in stated choice studies to preserve all offered alternatives in a choice set when it comes to model estimation. This includes the recent development of exploded best–worst methods, which use the rank order of alternatives in each choice task to create pseudo observations that are then added to the dataset, hence the term ‘exploded’. One way to create pseudo observations from full rank order (best–worst) data is to remove the most preferred (best) alternative from each choice task, redefine the least preferred (worst) alternative as the chosen alternative for that choice task, and reverse the attribute sign (i.e., replace X with –X) (Rose 2014). The selection of the number of alternatives in a choice set for model estimation is typically based on some arbitrary assumption linked to notions of the comprehensibility of the choice experiment.
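To make the explosion step described above concrete, the following is a minimal sketch of how pseudo observations could be constructed from best–worst responses in a three-alternative task. The data layout and column names are illustrative assumptions only, not those of any particular software package.

```python
import pandas as pd

# Illustrative choice-task records: one row per alternative, with the
# respondent's best and worst selections marked (hypothetical column names).
task = pd.DataFrame({
    "alt":   [1, 2, 3],
    "X":     [0.4, 1.2, 2.1],   # a generic attribute
    "best":  [1, 0, 0],
    "worst": [0, 0, 1],
})

# Pseudo-observation 1: the 'best' choice among all three alternatives.
best_obs = task.assign(chosen=task["best"], obs_id=1)

# Pseudo-observation 2: drop the best alternative, treat the worst as
# 'chosen', and reverse the attribute sign (X -> -X), as described above.
worst_obs = (task[task["best"] == 0]
             .assign(chosen=lambda d: d["worst"],
                     X=lambda d: -d["X"],
                     obs_id=2))

exploded = pd.concat([best_obs, worst_obs], ignore_index=True)
print(exploded[["obs_id", "alt", "X", "chosen"]])
```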

Although we cannot formally identify the optimal number of alternatives that the respondents might consider and the way in which they process the alternatives presented,Footnote 1 we can attempt to gain a better understanding of the extent to which all alternatives are value adding in preference revelation. Conversely, we can investigate whether a subset of the alternatives is all that really matters for an individual when we develop a practical model to predict the chosen (most preferred) alternative. This raises many questions of behavioural curiosity, including whether all alternatives shown in a stated choice experiment should be preserved for each sampled respondent in estimation, or whether some other paradigm of choice set formation might be more reflective of a processing rule adopted by the respondents. This paper is aligned with the broader literature on context effects, which examines how preferences change depending on the availability of other options. These context effects include compromise, attraction and similarity (see Rieskamp et al. 2006; Trueblood et al. 2013; Rooderkerk et al. 2011; Leong and Hensher 2012a, b, 2014 for details). Our research begins with recognising the nature of stated choice (SC) experiments, which typically predefine a set of alternatives offered in a choice scenario. As such, choices are partly driven by the context provided by the set of ‘designed’ alternatives. Given the constructed rank order data collected through a best–worst survey, we investigate the role that a specific alternative might have in influencing respondent preferences using a number of testable model specifications (as implied processing strategies, detailed in the following sections). We are interested in establishing which model specification delivers the best performance in terms of predicting the most preferred alternative as chosen in each choice task.

The paper is structured as follows. We begin by presenting a number of model specifications designed to represent how an individual might consider or ignore an alternative such that the most preferred alternative in each of the four choice tasks has the highest predicted choice probability (referred to as the ‘compliance test’). We draw on a choice experiment studying car user preferences for alternative road pricing reform packages as a way of empirically identifying the incidence of compliance. The findings obtained from a series of mixed logit models are then presented, followed by interpretation and an inquiry into the socioeconomic and contextual factors that may explain, in part, the relative incidence of compliance across the model specifications. The paper concludes with a summary of the main findings and what this means for ongoing discrete choice applications. We include an appendix (Appendix 2) that provides a theoretical perspective drawn from the broader literature on choice under uncertainty. The underlying theory is centred on the Axiom of Irrelevance of Statewise Dominated Alternatives (ISDA), which provides an appealing basis for testing each model specification's compliance with utility maximisation.Footnote 2

Alternative model specifications

Our focus on choice set processing begins with two sequenced choice response questions that elicit the best and worst alternatives in a three-alternative choice scenario. This is a typical forced-choice response format, with no option to reject all alternatives through a null choice response. With only three alternatives, the analyst can construct full rank data from such responses. There are a number of possible choice sets that an individual might process, depending on the relevance of all or a subset of the alternatives presented in a choice scenario. A choice set containing all alternatives would typically assume that respondents consider all offered alternatives when indicating their preferences. Apart from this full choice set, other possible choice sets can be formed for model estimation, depending on an a priori processing strategy adopted by each individual. Given the full rank order of three alternatives considered in this paper,Footnote 3 there are at least four candidate modelling specifications (MSs) that describe processing strategies worthy of consideration:

  1. MS1: consider the best alternative in the full choice set of 3 alternatives;

  2. MS2: an exploded best–worst choice set. This specification considers the best alternative in the full choice set and the worst alternative of the remaining two options—see further explanation below;

  3. MS3: consider the best of the most and second most preferred alternatives; and

  4. MS4: consider the best of the most and least preferred alternatives.

The four models associated with the four processing rules differ in the number of alternatives used to form a choice set and in which alternatives those are; they do not differ in their utility specification.

We can align these MSs with the best–worst literature and use Marley and Flynn's (in press) notation to express the four models as follows.Footnote 4 Let Y denote a full choice set with 3 alternatives, and let \(\rho = (\rho_{1}, \rho_{2}, \rho_{3})\) be a typical rank order of the alternatives in Y from most preferred (best; \(\rho_{1}\)) to least preferred (worst; \(\rho_{3}\)). Note that the respondent is only asked to indicate the best and worst alternatives (in two questions) and is not forced to rank the three alternatives (the full ranking being constructed through simple deduction by the authors). Given the two questions asked of respondents, MS3 corresponds to a behaviourally appealing processing rule in which respondents implicitly choose between the best and the second best and ignore the worst. Similarly, MS4 assumes that respondents processed the best and the worst alternatives while ignoring the second best alternative. In selecting a particular alternative as the worst they might actually be rejecting that alternative as relevant, even though they are ‘forced’ to indicate the worst alternative.Footnote 5 The four modelling assumptions are expressed mathematically as follows (noting that \(\rho_{1}\) is best and \(\rho_{3}\) is worst, \(B_{Y}(y)\) denotes the probability that alternative y is chosen as best in Y, and \(W_{Y}(y)\) is the probability that alternative y is chosen as worst in Y)Footnote 6:

MS1:

$$B_{Y}(\rho_{1}) = \frac{\exp(\beta X_{1})}{\sum\nolimits_{j = 1,2,3} \exp(\beta X_{j})}$$

MS2:

$$B_{Y}(\rho_{1})\, W_{Y - \{\rho_{1}\}}(\rho_{3}) = \frac{\exp(\beta X_{1})}{\sum\nolimits_{j = 1,2,3} \exp(\beta X_{j})} \cdot \frac{\exp(-\beta X_{3})}{\sum\nolimits_{j = 2,3} \exp(-\beta X_{j})}$$

MS3:

$$B_{Y - \{\rho_{3}\}}(\rho_{1}) = B_{\{\rho_{1},\rho_{2}\}}(\rho_{1}) = \frac{\exp(\beta X_{1})}{\sum\nolimits_{j = 1,2} \exp(\beta X_{j})}$$

MS4:

$$B_{Y - \{\rho_{2}\}}(\rho_{1}) = B_{\{\rho_{1},\rho_{3}\}}(\rho_{1}) = \frac{\exp(\beta X_{1})}{\sum\nolimits_{j = 1,3} \exp(\beta X_{j})}$$

Importantly, each of the four choice set specifications is an assumed information processing strategy, aligned to the growing literature on attribute processing, which includes consideration of relevant alternatives (see Hensher 2010 for an overview; also Rieskamp et al. 2006; Leong and Hensher 2012a, b; Hensher and Ho 2014). Hence, the task is to establish, under each specific treatment of alternatives (MS1–MS4), how the preference ordering (and the resulting predicted choice probability ordering) is affected by adding (e.g., MS1) or removing (e.g., MS3) an alternative.Footnote 7 Critically, the alternatives deemed relevant can vary between choice tasks for an individual as a consequence of the combination of attribute levels describing each alternative, or for other less clear (possibly idiosyncratic) reasons.Footnote 8 Separate (mixed logit) models corresponding to the different MSs are estimated to obtain not only the overall goodness of fit of each model (as one basis of determining the ‘best’ model in statistical terms) but also the predicted choice probabilities for each alternative under consideration.Footnote 9
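To make the mapping from a ranked response to these four likelihood forms concrete, the following is a minimal sketch in Python. It uses simple fixed-parameter logit kernels rather than the mixed logit models estimated in the paper, and the attribute values, parameter values and function names are illustrative assumptions only.

```python
import numpy as np

def logit_prob(v, chosen):
    """P(chosen) under a simple MNL kernel with utilities v."""
    expv = np.exp(v - v.max())          # subtract max for numerical stability
    return expv[chosen] / expv.sum()

def task_likelihood(beta, X, rank, spec):
    """
    Likelihood contribution of one 3-alternative task.
    X    : (3, k) attribute matrix, rows ordered as presented
    rank : indices of (best, second, worst), e.g. (0, 2, 1)
    spec : 'MS1', 'MS2', 'MS3' or 'MS4'
    """
    b, s, w = rank
    v = X @ beta
    if spec == "MS1":                    # best out of all three
        return logit_prob(v, b)
    if spec == "MS2":                    # exploded best-worst
        p_best = logit_prob(v, b)
        rem = [s, w]                     # best removed
        p_worst = logit_prob(-v[rem], rem.index(w))
        return p_best * p_worst
    if spec == "MS3":                    # best vs. second best
        pair = [b, s]
        return logit_prob(v[pair], 0)
    if spec == "MS4":                    # best vs. worst
        pair = [b, w]
        return logit_prob(v[pair], 0)
    raise ValueError(spec)

# Hypothetical example: 3 alternatives, 2 attributes, alternative 0 ranked best.
X = np.array([[1.0, 0.5], [0.8, 1.1], [0.2, 2.0]])
beta = np.array([0.6, -0.3])
for spec in ("MS1", "MS2", "MS3", "MS4"):
    print(spec, round(task_likelihood(beta, X, (0, 1, 2), spec), 3))
```

Estimation under any one specification would then maximise the sum of the log of these contributions over respondents and tasks (with simulation over random parameter draws in the mixed logit case).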

In recent years there has been growing interest within the discrete choice framework in asking respondents to select both the best and worst options from a set of alternatives. This literature recognises the additional behavioural information in the best and worst response mechanism (e.g., Flynn et al. 2007; Marley and Pihlens 2012; Collins and Rose 2011; Vermeulen et al. 2010). Various authors (see Louviere et al. 2013) argue that best–worst scaling delivers more efficient and richer discrete choice elicitation than other approaches.

It is expected that the predicted choice probabilities associated with MS3 will be more similar to each other in the pairwise comparison, and thus the model corresponding to MS3 will increase the risk of non-compliance (i.e., a reversal of the preference ranking when the worst alternative is removed from the choice set). In contrast, the choice probability ‘gap’ is expected to be greater when the middle alternative is removed a priori, as in MS4, which will increase the chance of compliance. Where the exploded best–worst formulation fits within this spectrum is unclear; however, we speculate (and test below) that it is inferior to both binary formsFootnote 10 and to the model for the selection of the best alternative in the full choice set (MS1).

The empirical setting

To investigate the compliance of the four MSs, we use a recent data set collected in Sydney that focussed on investigating car users' preferences for a number of alternative road pricing reform packages. Respondents were shown three alternatives and asked to indicate which one was the best and which one was the worst.Footnote 11 Given these two responses and a choice task of size three, we are able to identify the full preference ranking of the alternatives. The survey instrument was an online computer-assisted personal interview accessed via laptops. Interviewers sat with the respondents to provide any advice required in working through the survey, while not offering answers to any of the questions. The data have been used in other papers (e.g., Hensher et al. 2013a, b) but not with the current focus.

The choice experiment consisted of three alternatives: the status quo and two labelled alternatives. The two labelled alternatives represent a cordon-based charging scheme and a distance-based charging scheme, randomly assigned to road pricing schemes 1 and 2. An illustrative choice screen, together with the boundaries of the proposed cordon-based charge area, is presented in Fig. 1. Each alternative was described by attributes representing the average amounts outlaid weekly on tolls and fuel, the annual vehicle registration charge, and the allocation of the revenues raised to improving public transport, improving and expanding the existing road network, reducing income tax, contributing to general government revenue, and compensating toll road companies for loss of toll revenue. The cordon-based and distance-based charging schemes were also described by a peak and an off-peak charge, and by the proposed year in which the scheme would commence.

Fig. 1 An illustrative choice screen and the location of the cordon-charge area

A Bayesian D-efficient experimental design was implemented for the study. The cost-related attribute levels for the status quo were first acquired from respondents during preliminary questions in the survey, whilst the associated attributes for the cordon-based and distance-based charging schemes were pivoted off these values as percentage reductions in such costs. Pivoted attributes included average fuel costs and annual registration fees. Fuel costs were reduced by between zero and 50 per cent of the respondent-reported values, representing anywhere from no reduction in fuel tax up to a potential 100 per cent reduction in fuel taxes. Registration fees were reduced by between zero and 100 per cent of the respondent-reported values (see Rose et al. 2008 for a description of pivot-type designs). The toll cost was only included in the status quo alternative, being set to zero for the non-status quo alternatives since it is replaced by the road pricing regime.
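As a simple illustration of the pivoting step, the sketch below generates cost levels as percentage reductions from respondent-reported reference values. The reference values and reduction levels are hypothetical; in the actual study the levels were allocated through the Bayesian D-efficient design, not in this ad hoc way.

```python
def pivot_levels(reported_value, reductions):
    """Return pivoted attribute levels expressed as percentage reductions
    from a respondent-reported reference value."""
    return [round(reported_value * (1 - r), 2) for r in reductions]

# Hypothetical respondent-reported weekly fuel cost and annual registration fee.
fuel_reported, rego_reported = 60.0, 700.0

# Illustrative reduction levels consistent with the ranges described above.
fuel_levels = pivot_levels(fuel_reported, [0.0, 0.25, 0.50])
rego_levels = pivot_levels(rego_reported, [0.0, 0.50, 1.00])
print(fuel_levels, rego_levels)
```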

The allocation of the revenues raised was fixed for the status quo alternative but varied across choice tasks for the cordon-based and distance-based charging schemes. The allocation to a given revenue stream category was varied from zero to 100 per cent, subject to the constraint that, within a charging scheme, the allocations summed to 100 per cent across all revenue categories.

The peak cordon charge varied between $2 and $20, whilst the off-peak cordon charge varied between $0 and $15. The per kilometre distance-based charge for the peak period ranged from $0.05 to $0.40, whilst the off-peak distance-based charge varied between $0 and $0.30 per kilometre. The ranges were selected to contain the levels we believe would be most likely if a scheme were implemented (i.e., all reasonable ‘states of nature’). The design was generated in such a way that the peak charges were always equal to or greater than the associated off-peak charges. Finally, the cordon-based and distance-based charging schemes were described by the year in which the scheme would be implemented, varied between 2013 (one year from the survey) and 2016 (a four-year delay from the time of the survey).

The attributes and the relevant attribute levels for all alternatives are shown in Table 1. Priors for the design of the choice experiment were obtained from a pilot study of nine respondents conducted prior to the main field phase. The final design consisted of 60 choice tasks blocked into 15 blocks. The blocking was accomplished using an algorithm designed to minimise the maximum absolute correlation between the design attributes and the blocking column. Each respondent was shown four choice scenarios sequentially (60/15 = 4), each comprising three alternatives, and asked to indicate their preferences in terms of the best and worst scheme. This enabled us to construct a full rank order of the three alternatives.

Table 1 The choice experiment attribute levels and ranges
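The blocking criterion described above (minimising the maximum absolute correlation between the design attributes and the blocking column) can be illustrated with a simple random-search sketch. The attribute matrix is simulated and the search strategy is an assumption for illustration only, not the algorithm actually used to generate the design.

```python
import numpy as np

rng = np.random.default_rng(42)

def max_abs_corr(design, blocks):
    """Maximum absolute correlation between any design attribute
    and the blocking column."""
    return max(abs(np.corrcoef(design[:, j], blocks)[0, 1])
               for j in range(design.shape[1]))

def block_design(design, n_blocks, n_iter=5000):
    """Random search for a block assignment minimising the criterion."""
    n_tasks = design.shape[0]
    base = np.repeat(np.arange(n_blocks), n_tasks // n_blocks)
    best_blocks, best_crit = None, np.inf
    for _ in range(n_iter):
        cand = rng.permutation(base)
        crit = max_abs_corr(design, cand)
        if crit < best_crit:
            best_blocks, best_crit = cand, crit
    return best_blocks, best_crit

# Hypothetical 60-task design with 6 attribute columns, split into 15 blocks of 4.
design = rng.normal(size=(60, 6))
blocks, crit = block_design(design, n_blocks=15)
print("max |corr| between attributes and blocking column:", round(crit, 3))
```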

The evidence

Mixed logit models (summarised in Appendix 1) were estimated in which the parameters associated with the road pricing attributes were specified as random parameters with constrained triangular distributions. The overall goodness of fit, expressed as the log-likelihood at convergence (with the pseudo-R2 and the Akaike information criterion (AIC) in brackets; an illustrative calculation of these measures follows the list), is as follows for each of the four models:

  • MS1: −648.77 (0.254, 1.667) for the selection of the best alternative in the full choice set;

  • MS2: −1353.63 (0.525, 1.712) for the exploded best–worst model;

  • MS3: −432.76 (0.496, 1.127) for the most and second most preferred alternatives; and

  • MS4: −338.70 (0.606, 0.892) for the best (most preferred) and worst (least preferred) alternatives.
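For readers who wish to relate these figures to the log-likelihoods, the sketch below shows the conventional calculations. We assume an equal-shares base model for the pseudo-R2 and a per-observation normalisation of the AIC; both conventions, and the parameter count shown, are assumptions on our part rather than details reported in the text.

```python
import numpy as np

def fit_measures(loglik, n_obs, n_params, n_alts):
    """Pseudo-R2 against an equal-shares base model and per-observation AIC.
    Both conventions are illustrative assumptions."""
    ll_base = n_obs * np.log(1.0 / n_alts)          # equal-shares log-likelihood
    pseudo_r2 = 1.0 - loglik / ll_base
    aic_per_obs = (2 * n_params - 2 * loglik) / n_obs
    return round(pseudo_r2, 3), round(aic_per_obs, 3)

# MS1 example: 200 respondents x 4 tasks, 3 alternatives; parameter count illustrative.
print(fit_measures(loglik=-648.77, n_obs=800, n_params=18, n_alts=3))
```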

The evidence suggests that the simple two-alternative choice set formed from the best and worst alternatives (MS4) has a statistically better fit than all of the other processing forms. While this might have an element of ‘obviousness’, it is nevertheless an important piece of reinforcing evidence. What is important is that the best and worst alternatives vary across the sample (indeed, even between choice sets for each respondent), and hence a choice set studied for a sample might contain many alternatives of which two specific ones are what really matters to a sampled individual evaluating a stated choice set.

Turning to the incidence of compliance, or more generally to an unconditional preservation of the preference ordering of alternatives: in our application with four choice tasks per respondent, full compliance is satisfied if the alternative chosen as the most preferred in each of the four choice tasks has the highest predicted choice probability.Footnote 12 The evidence is summarised in Table 2, a sample-wide summary based on the choice probabilities predicted by each of the four models corresponding to the four MSs. If SP data are to be the basis of application work, then we want to know which MS delivers the ‘best prediction’, in the sense that the alternative stated as the most preferred has the highest predicted choice probability. The comparison of one MS against another is undertaken using the compliance test to establish which MS preserves the order (of most preferred) when alternative choice set formations are considered.

Table 2 Average number of times (out of a max of 4) the alternative with highest, second highest and third highest probabilities is ranked first by each model specification
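The compliance counts underlying Table 2 can be computed directly from the predicted probabilities. The sketch below counts, for each respondent, the number of tasks (out of four) in which the stated most preferred alternative also has the highest predicted probability; the data layout and column names are hypothetical.

```python
import pandas as pd

# Hypothetical long-format predictions: one row per alternative per task,
# with the model's predicted probability and the stated most-preferred flag.
pred = pd.DataFrame({
    "resp":        [1, 1, 1,  1, 1, 1],
    "task":        [1, 1, 1,  2, 2, 2],
    "alt":         [1, 2, 3,  1, 2, 3],
    "prob":        [0.52, 0.31, 0.17,  0.28, 0.45, 0.27],
    "stated_best": [1, 0, 0,  0, 0, 1],
})

def task_complies(g):
    """A task complies if the stated best alternative has the highest
    predicted probability."""
    return int(g.loc[g["prob"].idxmax(), "stated_best"] == 1)

compliance = (pred.groupby(["resp", "task"]).apply(task_complies)
                  .groupby(level="resp").sum())
print(compliance)   # number of compliant tasks (0-4) per respondent
```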

The empirical evidence suggests that the specification that best satisfies the compliance test (though not 100 per cent across the sample) is the MS4 binary best–worst form. The compliance rate of this MS is not perfect: for the full sample of 200 car users, the average compliance rate is 3.24 out of 4 choice tasks. The MS1 full choice set model (2.55/4) is an improvement over MS2, the exploded best–worst model (2.36/4). As might be expected, the best–second best form (MS3) does not satisfy the compliance test to the same extent (2.90/4) as the binary best–worst (MS4). This is intuitively plausible because MS3 narrows the difference in the choice probabilities, for whatever behavioural reason (e.g., similarity, attraction), with the expectation that there will be more cases in which the most preferred alternative is no longer the one with the greatest predicted choice probability (i.e., a higher propensity for preference order switching). What this suggests is that a binary form based on the extremes of the ranks is more likely to satisfy the compliance test than any other form assessed (as promoted in Gourville and Soman 2007).

The findings in Table 2 can also be represented in terms of the average predicted probability of the best alternative, summarised in Fig. 2. As expected, given the evidence in Table 2, the mean choice probability associated with the best (most preferred) alternative varies from a high of 0.730 for MS4 to a low of 0.437 for the exploded best–worst regime (MS2). The range, close to 0.30 probability units, is large and raises concerns about the behavioural validity of the exploded best–worst choice model form (noting, however, that the evidence is based on only one data set). The second-best performing regime is the best–second best form (MS3), which has diluted part of the probability gain associated with the best alternative relative to the best–worst form, as expected, since the best and second best alternatives are clearly closer in preference space (i.e., in utility terms). What is of special interest is the evidence suggesting that the model for the selection of the best alternative in the full choice set (MS1) performs much better than the exploded best–worst regime (MS2).

Fig. 2 Average predicted probability of the most preferred alternative by model specification

A strong message from this evidence, if it can be replicated on a number of data sets, is that a simple binary form (with the pair of alternatives varying across the sample) aligns best with the compliance test, with a consequent significant improvement in the ability of the choice model to predict a respondent's most preferred alternative. Whether there are exogenous influences at work that might improve our understanding of the differences in the evidence on the incidence of compliance is investigated in the following section.

Before investigating sources of influence on the incidence of compliance, a summary of an informative behavioural output, namely the mean estimates of the direct and cross elasticities for the road pricing reform attributes associated with random parameters, is presented in Table 3. Even though the estimates are very small (highly inelastic), the important point is that the mean estimates are numerically very different between the four choice processing regimes. The highest mean direct elasticity estimates are for the model of the best alternative in the full choice set (MS1) and the preferred best–worst regime (MS4), although the peak period cordon-based charge has similar direct elasticities under MS3 and MS4. The main message is not the specific numerical estimates, but that the mean estimates (which are commonly used by practitioners) differ markedly between the four alternative processing regimes. We would argue that compliance might be an appealing guide in selecting the preferred model, and hence the associated elasticity estimates.

Table 3 Direct and cross elasticities under alternative choice set specifications
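For completeness, the sketch below shows how direct and cross point elasticities of the choice probabilities are computed under the standard MNL formulae. The paper's estimates are derived from mixed logit models (averaged over random parameter draws), so this fixed-parameter version, with hypothetical attribute and parameter values, is a simplification offered purely for illustration; sample means would be obtained by averaging such task-level elasticities over respondents and tasks.

```python
import numpy as np

def mnl_probs(X, beta):
    v = X @ beta
    e = np.exp(v - v.max())
    return e / e.sum()

def point_elasticities(X, beta, k):
    """Direct and cross point elasticities of the choice probabilities with
    respect to attribute k, for one choice task (standard MNL formulae)."""
    P = mnl_probs(X, beta)
    n = len(P)
    E = np.empty((n, n))
    for i in range(n):          # probability of alternative i ...
        for j in range(n):      # ... w.r.t. attribute k of alternative j
            if i == j:
                E[i, j] = beta[k] * X[j, k] * (1 - P[j])   # direct
            else:
                E[i, j] = -beta[k] * X[j, k] * P[j]        # cross
    return E

# Hypothetical task: 3 alternatives, 2 attributes (e.g. weekly cost, peak charge).
X = np.array([[35.0, 0.0], [20.0, 8.0], [22.0, 0.15]])
beta = np.array([-0.03, -0.05])
print(np.round(point_elasticities(X, beta, k=0), 3))
```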

Systematic factors linked to compliance

Having identified the extent of compliance with ISDA across the sample, it is of value to see whether there are contextual and respondent-specific characteristics, as well as choice experiment features, that have a systematic link to the incidence of compliance. To identify candidate influences, we ran a series of ordered logit models in which the dependent variable was the discrete compliance rate for each respondent, taking values from 0 to 4 (see the frequency distribution in Fig. 3). The results are summarised in Table 4, with partial (or marginal) effects available on request.
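A minimal version of this step could be set up as follows, using the ordered (proportional-odds) logit implementation in statsmodels. The covariates are simulated and purely illustrative, so the sketch shows the modelling step rather than reproducing Table 4.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 200   # sample size used in the paper; the covariates below are simulated

# Hypothetical respondent-level data: compliance count (0-4) and covariates.
df = pd.DataFrame({
    "compliance": rng.integers(0, 5, size=n),   # would come from the compliance step
    "aware":      rng.integers(1, 6, size=n),   # awareness of the road pricing debate
    "age":        rng.normal(45, 12, size=n),
    "male":       rng.integers(0, 2, size=n),
    "income":     rng.normal(60, 25, size=n),   # personal income ($000s)
})

model = OrderedModel(df["compliance"],
                     df[["aware", "age", "male", "income"]],
                     distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```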

Fig. 3 Incidence of compliance by model specification

Table 4 Sources of explanation of incidence of compliance: ordered logit (parameter estimates with t-values in brackets)

A number of variables were identified as statistically significant influences on the incidence of compliance. A particularly interesting result relates to the awareness of the road pricing debate. The positive parameter associated with awareness suggests that the incidence of compliance increases as the perceived awareness response increases. This highlights the role that increased knowledge of the context of the choice experiment plays in ensuring that the stated preference ordering is compliant.

Three socioeconomic characteristics (age, gender and personal income) are statistically significant in one, two or all of the four processing rule forms. As the age of the respondent increases we find, ceteris paribus, that the incidence of compliance increases under the exploded best–worst (MS2) specification but decreases under the best–second best choice processing rule (MS3). Personal income is only a statistically significant influence for the best–worst processing rule (MS4), with a negative sign, suggesting, ceteris paribus, that the incidence of compliance decreases as income increases. Gender has a negative influence in all models, suggesting that compliance is lower for males than for females. Overall, the socioeconomic effects suggest that females on relatively low incomes tend to be associated with a preference ordering in choice experiments that complies better than that of other socioeconomic classes, with the age effect ambiguous.

The remaining six variables are trip-related attributes. Overall, with one exception (the annual registration fee under the best–worst model), the parameter estimates for all cost attributes are positive, suggesting, ceteris paribus, that as the cost of road use in the alternatives in the choice set increases, the compliance of the preference ordering increases. This is, however, tempered when the road pricing reform regime involves a cordon-based charge rather than a distance-based charge.

Although these influences are statistically significant, they explain only a very small amount of the variation in the incidence of compliance. Thus one might be tempted to conclude, given the potential sources of systematic influence available in the data, that compliance of the preference ordering is aligned strongly with individual-specific idiosyncratic effects which are not revealed.

Conclusions

This paper has investigated a somewhat neglected issue in stated choice experiments, namely the relevance or otherwise of subsets of the alternatives pre-specified in the design of choice experiments, given an individual's processing strategy. A number of authors, such as Hensher (2006), have investigated the implications of the number of alternatives offered in a choice set for attribute processing, concluding that “As we increase the ‘number of alternatives’ to evaluate, ceteris paribus, the importance of considering more attributes increases, as a way of making it easier to differentiate between the alternatives.” The current paper takes a different approach to identifying the role of alternatives: it focuses on the role that each alternative plays in establishing the preference ordering (and the associated choice probability estimated ex post) for an alternative.

Fundamentally, the question of interest is whether there is some redundancy in the offered set of ‘statistically designed’ alternatives,Footnote 13 which may ‘get in the way’ of improving the capability of a model to predict the choice of the alternative stated as the most preferred (or best). The empirical analysis used subjective rank ordered preference data on road pricing reform alternatives obtained from a sample of respondents, together with a predefined stated choice experiment and predefined information processing rules.

Under the four choice making rules (or model specifications) considered, we have identified the extent to which compliance is satisfied. While all model specifications violate a full compliance condition to some extent, the binary best–worst processing strategy produces the most consistent rank ordering. Specifically, we find overwhelming empirical evidence to support a simple best–worst binary form, in contrast to the other forms investigated; namely preservation of the (constructed) full rank choice set, the best versus second best, and the exploded best–worst form, the latter promoted in the broader best–worst modelling literature. The evidence aligns with the contribution of Gourville and Soman (2007), who argue that respondents display an increased tendency to choose one of the extreme alternatives as the size of the choice set increases (in our application, admittedly, of only up to three alternatives). Respondents are posited to increasingly rely on an all-or-nothing strategy.

The approach proposed and empirically assessed in this paper might be a strong candidate for establishing the most appropriate processing rule in choice making in future choice studies which can, ex post, be used to jointly estimate process and outcome choices. We encourage further testing using other data sets, especially where there are more than three alternatives offered in a choice experiment.

The findings are of immense relevance to transportation research, especially given the dominant role of SP or SC experiments, which in general show a lack of consideration of the process heuristics used by individuals when making choices amongst alternatives. In the existing literature, there is a strong (almost excessive) focus on the niceties of the statistical design of choice experiments (the first author can claim some amount of guilt on this from past contributions). If we are to continue to use SP experiments, then some awareness of process heuristics is really important, and if we can show repeatedly that a specific heuristic is dominant (e.g., best and worst only), then we can at least extract the set of parameter estimates that align closely with it.Footnote 14 This is not, we acknowledge, a basis for establishing the relevant choice set to use in model application for prediction and forecasting, since that remains a major research and practical challenge when moving from the sample used in modelling to the population as a whole; but it is a good start, and it throws out the challenge for ongoing research to establish possible segments of the population that adopt specific processing strategies. Current practices are far from ideal, and hence the signals offered by the evidence herein should be taken on board in research focussed on defining choice sets for population-based forecasting.

There is growing interest in the topic of this paper.Footnote 15 Alberini (2014) has also investigated the gains in statistical efficiency of considering the first and second best alternatives versus the best and worst, and has investigated choices in the presence and absence of a status quo alternative. While that paper is of interest in its own right as a contribution to exploring the role of subsets of alternatives (and a contrast of a single choice vs. offering five choice sets), it differs from our study in that we begin with full rank order data on three alternatives and focus on investigating inferred processing strategies as guided by a number of model specifications. Hawkins et al. (2014) is another paper with a similar focus. They examined best and worst choices using discrete choice tasks in which participants selected either the best option from a set, the worst option, or both the best and the worst option. They found that the task (best, worst, or best and worst) does not alter the preferences expressed for the best (respectively, the worst) option, and also observed that the choice probabilities were consistent with a single latent dimension; that is, options that were frequently selected as best were infrequently selected as worst, and vice versa, both within and between respondents. They concluded that the diverse types of best and worst choices studied can be conceived as opposing ends of a single continuous dimension rather than distinct latent entities.

In ongoing research, using a random regret mixed logit model, we are jointly estimating a probabilistic decision rule model that can account for the role of each of the four MSs, to see if we can further support the dominance of a specific processing heuristic. The joint estimation approach can replace the separate models for each model specification as well as condition the probability allocation on socioeconomic and other contextual influences. The ongoing joint estimation is in part an econometric refinement, but it provides an additional basis for establishing the role of each MS in choice set definition at the respondent level.