Introduction

The focus of the majority of discrete choice modelling over the last 10 years has been centred on the random parameter (or mixed logit) model, and more recently on extensions of mixed logit to incorporate scale heterogeneity and estimation in willingness to pay space (see for example Train and Weeks 2005; Fiebig et al. 2010; Greene and Hensher 2010). The family of mixed logit models assume a continuous distribution for the random parameters across the sampled population.

Latent class models offer an alternative perspective, replacing the continuous distribution with a discrete distribution in which preference heterogeneity is captured by membership in distinct classes of utility description (see Greene and Hensher 2003; Shen 2009). Within each class, preference homogeneity is usually assumed, although interactions with observed contextual effects are permissible. A natural extension of the fixed parameter latent class model is a random parameter latent class model which allows for another layer of preference heterogeneity within a class (e.g., Greene and Hensher 2013; Bujosa et al. 2010; Vij 2012; Wasi and Carson 2011). Vij (2012) also introduce error components. Wasi and Carson (2011) use the mixture of normals approach from Train (2008).

A further extension is to overlay attribute processing rules (APR) in a random parameter latent class model that allows for attribute non-attendance (ANA) and aggregation of common-metric attributes (ACMA). Collins et al. (2012) and Hess et al. (2011) have recently implemented this model for ANA. Given that each latent class can be defined by a set of parameter restrictions that represent a particular ANA or ACMA processing rule, we refer to the model as a probabilistic decision process model instead of a latent class model, even though the use of a latent class function (with restrictions) provides the econometric method to identify the role, up to a probability, of each assessed APR, including full attribute attendance (FAA).

The focus in the current paper is on a very specific situation according to which respondents to a stated choice experiment adopt an APR under which specific attributes are either fully attended to, ignored (or “non-attended to” to use the terminology of some environmental economists), and/or are added up where they exhibit a common metric, for all manner of reasons. Earlier efforts of Hensher et al. (2005), Hensher (2006), and Layton and Hensher (2010) highlighted the real possibility that such attribute processing strategies do make a difference in estimates of willingness of pay for specific attributes (e.g., the value of travel time savings). Subsequent research by Hensher and Rose (2009), Hess and Hensher (2010), Scarpa et al. (2009, 2010), Campbell et al. (2008, 2011), Carlsson et al. (2010), and Puckett and Hensher (2008) amongst others, has reinforced the view that accounting for “attribute non-attendance” does impact significantly on key behavioural outputs.

A number of the stated choice studies cited above based their identification of ANA or ACMA on supplementary questions designed to establish whether a respondent had ignored an attribute or not, or added up attributes: they could be asked either after each choice set as in Puckett and Hensher (2008) or after completing all choice scenario assessments, as is more common. However, numerous papers have provided evidence that responses to such supplementary questions may not be reliable (Hess and Rose 2007; Hess and Hensher 2010; Carlsson et al. 2010). Although the jury is still out on this issue, there is growing interest in the approach taken in this paper, i.e., identifying the role of ANA or ACMA through model inference, rather than directly asking each respondent. Some recent examples are Scarpa et al. (2009), Hensher and Greene (2010), Hole (2011), Campbell et al. (2011) and Hensher et al. (2012b); all utilising a latent class structure with fixed parameters.

This paper uses ideas from the studies cited above to investigate the potential value adding contribution of integrating latent class models with random parameter overlays within each class under the assumption of FAA in contrast to mixtures of FAA, ANA and ACMA. Although Hess et al. (2011) comes closest to investigating similar issues (although focussing only on ANA), Hess et al. impose strong restrictions on the parameters identified, namely equality constraints across classes. In the current paper we impose no such constraints and seek out the role that random preference heterogeneity plays in the presence and absence of accounting for specific APRs. The findings provide evidence on whether additional random parameter layering within a latent class framework that accounts for ANA and ACMA is significant, or whether it is a potentially confounding source. This paper also adds to the growing evidence that the handling of ANA with latent classes and fixed parameters is susceptible to exaggerating the ANA rate by capturing low taste sensitivities as ANA (Campbell et al. 2011; Hess et al. 2011; Collins et al. 2012).

We illustrate its application using a stated choice data set in the context of car commuters and non-commuters choosing amongst alternative packages of travel times and costs (including tolls) pivoted around a recent trip in Australia. Although we investigated more than one data set, we focus herein on one very substantive data set which represents the evidence obtained using other data sets.

A random parameter probabilistic decision process model with FAA, ANA and ACMA

Latent class modelling provides an alternative approach to accommodating heterogeneity in contrast to models such as multinomial logit and mixed logit (see Everitt 1988). The natural approach assumes that parameter vectors, β i are distributed among individuals with a discrete distribution, rather than the continuous distribution that lies behind the mixed (or random parameter) logit model. In our adaptation of the latent class model, it is assumed that the population consists of a finite number, Q, of groups of individuals, where each class is pre-defined by an APR. The groups are heterogeneous, with common parameters, β q , for the members of the group, but the groups themselves are different from one another in line with the APR. We assume that the classes are distinguished by the different parameter vectors (which can, but do not necessarily have to be, equality constrained),Footnote 1 though the fundamental data generating process, the probability density for the variable under study, is the same.

The analyst does not know from the data which observation is in which APR class, even if the specific attribute processing rule is imposed on a class, hence the term latent classes. The model assumes that individuals are distributed heterogeneously with a discrete distribution in a population. The full specification of the latent class structure for a generic data generating process is (Greene and Hensher 2003):

$$ f\left( {y_{i} |{\mathbf{x}}_{i} ,{\text{class }} = q} \right) = g(y_{i} |{\mathbf{x}}_{i} ,\;\varvec{\beta}_{q} ) $$
(1)
$$ {\text{Prob}}\left( {{\text{class}} = q} \right) = \pi_{q} (\varvec{\theta}),\quad q = 1 ,\ldots ,Q. $$
(2)

θ is the vector of latent class parameters attached to candidate sources of systematic influence on class membership. The unconditional probability attached to an observation is obtained by integrating out the heterogeneity due to the distribution across classes, given in (3).

$$ f\left( {y_{i} |{\mathbf{x}}_{i} } \right) = \Upsigma_{\text{q}} \;\pi_{\text{q}} (\varvec{\theta})g(y_{i} |{\mathbf{x}}_{i} ,\;\varvec{\beta}_{q} ). $$
(3)

To extend the latent class model to allow for two layers of heterogeneity (and the imposition of an APR within a class) both within and across groups, we allow for continuous variation of the parameters within classes (Greene and Hensher 2013). The latent class feature of the model is given as (4) and (5).

$$ f\left( {y_{i} |{\mathbf{x}}_{i} ,{\text{class }} = q_{APR} } \right) = g(y_{i} |{\mathbf{x}}_{i} ,\,\varvec{\beta}_{{i|q_{APR} }} ) $$
(4)
$$ {\text{Prob}}\left( {{\text{class }} = q_{APR} } \right) \, = \;\;\pi_{q} (\varvec{\theta}),\quad q_{APR} = { 1}, \ldots ,Q_{APR} . $$
(5)

The within-class heterogeneity, \( {\mathbf{w}}_{i} , \) is structured as

$$ \varvec{\beta}_{{i|q_{APR} }} = \;\;\varvec{\beta}_{{q_{APR} }} + {\mathbf{w}}_{{i|q_{APR} }} $$
(6)
$$ {\mathbf{w}}_{{i|q_{APR} }} \sim E[{\mathbf{w}}_{{i|q_{APR} }} |{\mathbf{X}}\left] {\; = \; {\mathbf{0}}, {\text{Var}}} \right[{\mathbf{w}}_{{i|q_{APR} }} |{\mathbf{X}}] = \Upsigma_{{{\text{q}}_{APR} }} $$
(7)

where the X indicates that \( {\text{w}}_{{i|q_{APR} }} \) is uncorrelated with all exogenous data in the sample. We will assume below that the underlying distribution for the within-class heterogeneity is normal, with mean 0 and covariance matrix Σ. In a given application, it may be appropriate to further assume that certain rows and corresponding columns of \( \Upsigma_{{q_{{_{APR} }} }} \) equal zero, indicating that the variation of the corresponding parameter is entirely across classes. With random parameters the estimator does a lot of simulation. There is one random parameter model in each class, so, for example, if the model is specified with three random parameters in the model and three latent classes, there are nine random parameters; consequently computation time increases significantly. Overlaying this specification is the APR for each class.

The extension to incorporate ANA and ACMA is relatively simple, in a conceptual sense. To account for possible heuristics defined in the domain of ANA and ACMA, we impose restrictions on the mean and standard deviation parameters defining each random parameter within each latent class, each class representing a particular process heuristic. For example, to impose the condition of non-attendance of a specific attribute, we set its parameter to zero; to impose ACMA we constrain the parameters of two or more attributes to be equal.

The contribution of individual i to the log likelihood for the model is obtained for each individual in the sample by integrating out the within-class heterogeneity associated with continuous random parameters and ANA and ACMA restrictions on the mean and standard deviation of the relevant random parameters, and then the class heterogeneity. We allow for a stated choice data setting, hence the observed vector of outcomes is denoted y i and the observed data on exogenous variables are collected in \( {\mathbf{X}}_{i} = \, [{\mathbf{X}}_{i 1} ,..,{\mathbf{X}}_{iTi} ] \). An individual is assumed to engage in Ti choice situations, where Ti ≥ 1. The generic model is given in (8).

$$ \begin{gathered} f({\mathbf{y}}_{i} |{\mathbf{X}}_{i} ,\;\varvec{\beta}_{ 1} , \ldots ,\;\varvec{\beta}_{{Q_{APR} }} ,\;\varvec{\theta}\;,\varvec{\Upsigma}_{ 1} , \ldots ,\varvec{\Upsigma}_{{Q_{APR} }} ) \hfill \\ = \sum\limits_{{q_{APR} = 1}}^{{Q_{{_{APR} }} }} {\pi_{{q_{APR} }} (\theta )\int\limits_{{w_{i} }} {\mathop {\mathop \prod \limits^{{T_{i} }} }\limits_{t = 1} f[\varvec{y}_{it} |(\varvec{\beta}_{{q_{{_{APR} }} }} + \varvec{w}_{i} ),\varvec{\text{X}}_{it} ]h(\varvec{w}_{i} |\varvec{\Upsigma}_{{q_{APR} }} )d\varvec{w}_{i} } \,} \hfill \\ \end{gathered} $$
(8)

We parameterise the class probabilities using a multinomial logit formulation to impose the adding up and positivity restrictions on π\( _{\text{qANA}} ({{\uptheta}}) \). Thus, the class probability model becomes:

$$ \pi_{{q_{APR} }} (\varvec{\theta}) = \frac{{\exp ({{\uptheta}}_{{q_{APR} }} )}}{{\Upsigma_{{q_{APR} = 1}}^{{Q_{APR} }} \exp (\theta_{{q_{APR} }} )}}{\kern 1pt} ,\quad q_{APR} = { 1}, \ldots ,Q_{APR} ; \theta_{{Q_{APR} }} = \, 0. $$
(9)

The model employed in this application is a probabilistic decision process latent class, mixed multinomial logit model with FAA, ANA and ACMA. Individual i chooses among J alternatives with conditional probabilities given as (10).

$$ f[\varvec{y}_{it} |(\varvec{\beta}_{{q_{{_{APR} }} }} +\varvec{w}_{i} ),\varvec{X}_{it} ] = \frac{{\exp \left[\Upsigma_{j= 1}^{J} y_{it,j} (\varvec{\beta}_{{q_{{_{APR} }} }} +\varvec{w}_{i} )'\varvec{x}_{it,j} \right]}}{{\sum_{j = 1}^{J}\exp \left[(\varvec{\beta}_{{q_{{_{APR} }} }} + \varvec{w}_{i})'\varvec{x}_{it,j} \right]}}\quad ,\;j = { 1}, \ldots ,J, $$
(10)

\( y_{it,j} = { 1} \) for the j corresponding to the alternative chosen and 0 for all others, and x it,j is the vector of attributes of alternative j for individual i in choice situation t. We use maximum simulated likelihood to evaluate the terms in the log likelihood expression. The contribution of individual i to the simulated log likelihood is the log of equation (11).

$$ \begin{gathered} f^{S} ({\mathbf{y}}_{i} |{\mathbf{X}}_{i} ,\varvec{\beta}_{ 1} , \ldots ,\;\varvec{\beta}_{{Q_{APR} }} ,\varvec{\theta},\varvec{\Upsigma}_{ 1} , \ldots ,\varvec{\Upsigma}_{{Q_{APR} }} ) \hfill \\ = \sum\limits_{{q_{APR} = 1}}^{{Q_{{_{APR} }} }} {\pi_{{q_{APR} }} (\varvec{\theta})\frac{1}{R}\sum\limits_{r = 1}^{R} {\prod\limits_{t = 1}^{{T_{i} }} {f[\varvec{y}_{it} |(\varvec{\beta}_{{q_{APR} }} + \varvec{w}_{i,r} ),{\mathbf{X}}_{it} ]} } } \hfill \\ \end{gathered} $$
(11)

\( {\mathbf{w}}_{i,r} \) is the rth of R random draws on the random vector \( {\mathbf{w}}_{i} \). Collecting all terms, the simulated log likelihood is given as (12).

$$ \log L^{S} = \sum\limits_{i = 1}^{N} {\log \left[ {\sum\limits_{{q_{{_{APR} }} = 1}}^{{Q_{{_{APR} }} }} {\pi_{q} (\varvec{\theta})\frac{1}{R}\sum\limits_{r = 1}^{R} {\prod\limits_{t = 1}^{{T_{i} }} {f[\varvec{y}_{it} |(\varvec{\beta}_{{q_{{_{APR} }} }} + \varvec{w}_{i,r} ),{\mathbf{X}}_{it} ]} } } } \right]} $$
(12)

In the presence of ANA and ACMA, it differs from more familiar formulations of latent class models in that the nonzero elements in β q can be allowed to be the same or free across the classes, and the classes have specific behavioural meaning, as opposed to merely being groupings defined on the basis of responses, as in the strict latent class formulation, hence the reference to a probabilistic decision process model. Estimation of the probabilistic decision process model is straightforward as a latent class MNL model with linear constraints on the coefficients, as suggested above.

Willingness to pay (WTP) estimates for the value of total travel time savings ($/hour) are computed using the familiar result, WTP = −β time/β cost. Since there is heterogeneity of the parameters within the classes as well as across classes, the result is averaged to produce an overall estimate. The averaging is undertaken for the random parameters within each class then again across classes using the posterior probabilities as weights. Collecting the results, the procedure is

$$ \begin{aligned} {WTP} & = \frac{1}{N}\sum\limits_{i = 1}^{N} {\sum\limits_{{q_{APR} = 1}}^{{Q_{APR} }} {\left\{ {\pi_{{q_{APR} }} \left( {\hat{\theta }} \right)|i} \right\}} } \left[ {\frac{{\frac{1}{R}\sum\nolimits_{r = 1}^{R} {L_{{ir|q_{APR} }} \frac{{ - \hat{\beta }_{{time,ir|q_{APR} }} }}{{\hat{\beta }_{{\cos t,ir|q_{APR} }} }}} }}{{\frac{1}{R}\sum\nolimits_{r = 1}^{R} {L_{ir|q} } }}} \right] \, \\ \, & { = }\frac{1}{N}\sum\limits_{i = 1}^{N} {\sum\limits_{{q_{APR} = 1}}^{{Q_{APR} }} {\left\{ {\pi_{{q_{APR} }} \left( {\hat{\theta }} \right)|i} \right\}} } \frac{1}{R}\sum\limits_{r = 1}^{R} {W_{{ir|q_{APR} }} {{WTP_{{time,ir|q_{APR} }} }}} , \\ \end{aligned} $$
(13)

where R is the number of draws in the simulation and r indexes the draws, \( \hat{\beta }_{{time,ir|q_{APR} }} = \hat{\beta }_{{time|q_{APR} }} + \hat{\sigma }_{{time|q_{APR} }} w_{{time,ir|q_{APR} }} \) and likewise for \( \hat{\beta }_{{\cos t,ir|q_{APR} }} , \, L_{{ir|q_{APR} }} \) is the contribution of individual i to the class specific likelihood - this is the product term that appears in (11) and (12), and \( \pi_{{q_{APR} }} \left( {\hat{\theta }} \right)|i \) is the estimated posterior class probability for individual i,

$$ \pi_{{q_{APR} }} \left( {\hat{\theta }} \right)|i = \frac{{\pi_{{q_{APR} }} (\hat{\varvec{\theta }})\frac{1}{R}\sum\nolimits_{r = 1}^{R} {\prod\nolimits_{t = 1}^{{T_{i} }} {f[\varvec{y}_{it} |(\hat{\varvec{\beta }}_{{q_{{_{APR} }} }} + \varvec{w}_{i,r} ),{\mathbf{X}}_{it} ]} } }}{{\sum\nolimits_{{q_{{_{APR} }} = 1}}^{{Q_{{_{APR} }} }} {\pi_{{q_{APR} }} (\hat{\varvec{\theta }})\frac{1}{R}\sum\nolimits_{r = 1}^{R} {\prod\nolimits_{t = 1}^{{T_{i} }} {f[\varvec{y}_{it} |(\hat{\varvec{\beta }}_{{q_{{_{APR} }} }} + \varvec{w}_{i,r} ),{\mathbf{X}}_{it} ]} } } }}. $$
(14)

Our model bears some resemblance to Hess et al. (2011), and Collins et al. (2012), but there are some key differences. These papers impose equality constraints on the fixed and random parameters across the classes, and allow for between class variation solely through constraining these parameters to zero, to represent ANA. This goes some way to reducing the proliferation of parameters as the number of classes increases.Footnote 2 The number of class allocation parameters will increase exponentially as the number of classes increases. Hess et al. (2011) ameliorates this problem through an alternative, more parsimonious class allocation model, first employed by Hole (2011) in the context of fixed parameters. Collins et al. (2012) present a more flexible class allocation model, with the conventional latent class allocation model and the Hole (2011) approach serving as extremes. The intermediate specifications allow the independence of ANA assumption on which the Hole (2011) approach relies to be selectively relaxed, albeit at a cost in parameters.

Application

The data are drawn from a study undertaken in Australia in the context of toll versus free roads, which utilised a stated choice (SC) experiment involving two SC alternatives (i.e., route A and route B), which are pivoted around the knowledge base of travellers (i.e., the current trip). The trip attributes associated with each route are summarised in Table 1. The data have been used in a number of other applications (e.g., Li et al. 2010; Hensher et al. 2012a), but not in the methodological context of the current study.

Table 1 Trip attributes in stated choice design

Each alternative has three travel scenarios—‘arriving x minutes earlier than expected’, ‘arriving y minutes later than expected’, and ‘arriving at the time expected’. Each is associated with a corresponding probabilityFootnote 3 of occurrence to indicate that travel time is not fixed but varies from trip to trip. For all attributes except the toll cost, minutes arriving early and late, and the probabilities of arriving on-time, early or late, the values for the SC alternatives are variations around the values for the current trip. Given the lack of exposure to tolls for many travellers in the study catchment area, the toll levels are fixed over a range, varying from no toll to $4.20, with the upper limit determined by the trip length of the sampled trip.

In the choice experiment, the first alternative is described by attribute levels associated with a recent trip; with the levels of each attribute for Routes A and B pivoted around the corresponding level of the actual trip alternative with the probabilities of arriving early, on time and late provided. Commuters and non-commuters in a Metropolitan area in Australia were sampled (with employer-related business trips also included in the full data set but excluded from the analysis in this paperFootnote 4). A telephone call was used to establish eligible participants from households. During the telephone call, a time and location were agreed for a face-to-face Computer Aided Personal Interview (CAPI). In total, 588 commuters and non-commuters (with less than 120 min’ trip length) were sampled for this study, each responding to 16 choice sets (games), resulting in 9,408 observations for model estimation. The experimental design method of D-efficiency used herein is specifically structured to increase the statistical performance of the models with smaller samples than are required for other less-efficient (statistically) designs such as orthogonal designs (see Rose et al. 2008). An illustrative choice scenario is given in Fig. 1.

Fig. 1
figure 1

An illustrative choice scenario

Results

The findings are presented in Table 2 for four models. The first and third models assume full attribute attendance, while the second and fourth models allow for mixtures of FAA, ANA and ACMA. The first two models assume fixed parameters for all attributes, while the third and fourth models include random parameters for the travel time and cost attributes. The random parameters are defined by a constrained triangular distribution with scale parameter equal to the mean estimate.Footnote 5

Table 2 Summary of models

The development of the models follows a natural sequence from Model 1 through to Model 4. Under the assumption of FAA, selecting the number of classes is explained below, along lines well recognised in the latent class literature. We estimated model 1 under alternative numbers of classes (ranging from two through to seven classes), with five classes having the best overall goodness-of-fit (including AIC). When we move to Model 2, which assumes fixed parameters, but introduces ANA and ACMA, we have to define the ANA and ACMA classes and investigate how many classes should remain as FAA. In earlier studies (e.g., Scarpa et al. 2009; Campbell et al. 2011; McNair et al. 2012), users of latent class models (including Hensher and Greene 2010) imposed only one FAA class when investigating attribute processing rules; however there are good reasons why a number of classes might be considered (just as in Model 1) given that taste heterogeneity can continue to exist between FAA classes in the presence of elements of attribute processing. The introduction of multiple FAA classes may also go some way to reducing the chance that the attribute processing classes end up capturing taste heterogeneity as well as attribute processing, which is a risk when the taste coefficients are not constrained across classes. Model 2 is the final model under fixed parameters, varying the number of FAA classes and specific ANA and ACMA classes. Model 3 overlays Model 1 with random parameters on the four time and cost attributes of the choice experiment; however the number of classes is reduced to four on overall goodness-of-fit. Estimating this model takes many hours. Finally we introduce Model 4, which builds on all previous models, and is also freely defined in terms of the number of FAA classes, the ANA and ACMA classes and the distributional assumptions imposed on each random parameter.Footnote 6 Estimation of Model 4 also takes many hours with many models failing to converge (see footnote 5). The number of FAA classes under random parameters is reduced to one compared to three under fixed parameters (Model 2), possibly suggesting that some amount of preference heterogeneity that was accommodated through three classes under fixed parameters in model 2 has been captured in a single class in model 4 through within class preference heterogeneity. The final four models reported below are the outputs of this process.

A question naturally arises, how can the analyst determine the number of classes, Q? Since Q is not a free parameter, a likelihood ratio test is not appropriate, though, in fact, log L will increase when Q increases. Researchers typically use an information criterion, such as AIC, to guide them toward the appropriate value. For Model 1, the AIC was the lowest for five classes, at 1.035 (log-likelihood of −4817.72); whereas for Model 3 with random parameters, we found four classes had the lowest AIC (1.033 and a log-likelihood of −4803.2, slightly better than Model 1). Heckman and Singer (1984) also suggest a practical guidepost in selecting the number of classes; namely that if the model is fit with too many classes, estimates will become imprecise, even varying wildly. Signature features of a model that has been overfit include exceedingly small estimates of the class probabilities, wild values of the structural parameters, and huge estimated standard errors. For the models that account for ANA and ACMA (Models 2 and 4), the number of classes is pre-defined by the number of restrictions on parameters that are imposed to distinguish the attribute processing strategies of interest; however the number of classes with full attribute attendance is free and can be determined along the same lines as Models 1 and 3.

With respect to the ANA and ACMA conditions that might be imposed, authors have suggested that responses to supplementary questions on whether a respondent claims they ignore specific attributes and/or added them up may be useful to signal the possibility of specific attribute processing strategies. For the sample of 588 observations, the following incidence of reported ANA is obtained: free flow time (28 %), slowed down and stop start time (27 %), running cost (17 %), and toll cost (11 %). The incidence of ACMA is as follows: total time (60.5 %) and total cost (80 %). The reliability of such data has been questioned in many papers (see e.g. Hess and Hensher 2010); however, despite the concerns about the believability of such evidence, there is a case to support the presence of heterogeneity in attribute processing. The above incidence rates of ANA motivate the selection of the restrictions imposed on the models that account for ANA and ACMA; although the final classes were based on extensive investigation of alternative restrictions and show that the link with the stated supplementary question responses is tenuous (in line with the general trend in the literature). The number of classes in Models 2 and 4 are determined by the number of attribute processing rules of interest plus an assessment of the number of FAA classes using AIC as a statistical guide, as well as the suggestions of Heckman and Singer.

The incorporation of the ANA and ACMA attribute processing rules (Model 2 and 4) increases the overall goodness-of-fit compared to the model without allowance for attribute processing. The overall log-likelihood improves from −4817.72 (Model 1) to −4711.59 (Model 2) under fixed parameters, and from −4803.2 (Model 3)Footnote 7 to −4705.2 (Model 4) under random parameters. On the AIC test, adjusted for sample size, it improves from 1.035 to 1.013 from Model 1 to Model 2 and 1.033 to 1.010 respectively for Models 3 and 4. Model 4 is only a marginal improvement over Model 2 on overall fit, and after extensive investigation of possible reasons (including changing the number of Halton draws from 250 up to 1500 with 250 increments), we could not find any circumstance under which Model 4 performs considerably better than Model 2. What this suggests to us is that the added layer of behavioural complexity to allow for taste heterogeneity may indeed be adding little once attribute processing is accounted for. This finding is reinforced by the worse fit for Model 3 with random parameters in the absence of attribute processing (−4803.2) in contrast to Model 2 with attribute processing under fixed parameters (−4711.59). It is noteworthy that including three FAA classes in Model 2 is also a way of capturing (discrete) random preference heterogeneity in a probabilistic decision process model, which introduces useful behavioural information that aligns to a comparison of two alternative specifications of random preference heterogeneity. What this indicates is an expectation that Model 2 may be capturing some amount of the random preference heterogeneity that Model 4 is seeking to reveal, which was not obtained in Model 4 through multiple classes of FAA.

When we consider the class allocation, up to a probability, some interesting findings emerge. Class membership probabilities are statistically significant in all models, with a good spread of membership in all models. A comparison of Models 2 and 4 is especially interesting, given the difference in the treatment of the parameters (fixed or random) under a common set of attribute processing rules. Introducing random parameters to account for taste heterogeneity with a class of probabilistic decision rules reduces the probability of membership of the ACMA class from 0.322 to 0.116. A closer look at the classes representing ANA shows an increase in a move back to full attribute attendance under a random parameter specification from 0.571 (the sum of three FAA classes) to 0.656, and an increase in membership probabilities for the ANA classes. This might suggest that an amount of attribute processing (both ANA and ACMA) is being accommodated through random parameters embedded in FAA. Specifically, one implication is that a low marginal disutility associated with attributes that are otherwise assigned a zero value in ANA are instead associated with a very low marginal disutility and appear in FAA; and small differences in marginal disutility are revealed under ACMA in contrast to equal marginal disutility. There does however remain a sizeable (but smaller) incidence of ANA and ACMA.

Willingness to pay (WTP) estimates for the value of total travel time savings ($/hour) obtained for all four models are summarised in Table 3 and in Fig. 2. The averaging is undertaken for the random parameters as per formula (13). We find that the mean estimates increase as we account for attribute processing. On a test of statistical differences of VTTS estimates, the z values are greater than 1.96 (ranging from 13.72 for M1 vs. M3 to 31.7 for M3 vs. M4), except for the comparison of Models 2 and 4 (z = 0.85). Thus we can conclude that adding a layer of random parameters to the model that accounts for FAA, ANA and ACMA does not result in a statistically significant difference in the mean estimate of VTTS (M2 vs. M4); however this is not the situation for comparisons between the fixed parameter models M1 and M2, or between the random parameters models M3 and M4, where attribute processing clearly influences VTTS in an upward direction.

Table 3 Willingness to pay estimates weighted average VTTS (2008 AUD$/person hour) using weights for components of time and components of cost (standard deviations in brackets)
Fig. 2
figure 2

Distribution of VTTS for all models

We also observe that the incorporation of attribute processing reduces the standard deviation of the VTTS quite considerably for both the fixed and random parameter models, as well as increasing the mean estimate of VTTS. What this suggests is that it is the allowance for attribute processing, and not the allowance for preference heterogeneity within classes through random parameters, that is the key influence on the higher mean estimate of VTTS and accompanying lower standard deviation. Model 3 is of particular interest, since it suggests that in the absence of allowance for FAA, ANA and ACMA, the mean estimate of VTTS is significantly deflated but with an inflated standard deviation when preference heterogeneity through random parameters is accommodated.

Conclusions

This paper has introduced a generalisation of the fixed parameter latent class model through a layering of random parameters within each class, and the redefinition of classes as probabilistic decision rules associated with two specific attribute processing rules. We implement this extended model structure in the context of a toll verses free road choice setting and estimate four models as a way of seeking an understanding of the role of attribute processing in the presence of fixed or random parameters within each probabilistic decision rule class.

What we find, for the data set analysed, is that if attribute processing is handled through discrete distributions defined in a sufficiently flexible way, adding an additional layer of taste heterogeneity through random parameters within a latent class delivers only a very small improvement in the statistical and behavioural contribution of the model. The flexibility is achieved by not equality constraining coefficients across classes, and, crucially, allowing multiple FAA classes to be specified. Compared to the random parameter approach, this model is simple, fast to estimate, and in the empirical setting presented herein, very close in model fit to the model that included continuously distributed random parameters.

One implication is that a random parameter treatment in this setting may be confounding with attribute processing; and that including attribute processing in the absence of continuously distributed random parameters is preferred to including continuously distributed random parameters in the absence of attribute processing. This is an important finding that might suggest the role that attribute processing rules play in accommodating attribute heterogeneity, and that random parameters within class are essentially a potential confounding effect. We offer this finding as a potential concern, conditioned on evidence from one data set, about the possible presence of an identification problem when attribute processing and random parameters within-class are simultaneously considered.

Despite the marginal influence of preference heterogeneityFootnote 8 in the overall fit of the models, we find potentially important behavioural evidence to suggest that inclusion of random parameters may be a way of accommodating small marginal disutilities (in contrast to ANA set equal to zero marginal disutility), and small differences in marginal disutilities (in contrast to equal marginal disutilities under ACMA), as observed by a ‘move back’ to FAA when fixed parameters become random parameters under attribute processing. If this argument has merit, we may have identified one way of recognising what the broader literature (e.g., Hess et al. 2011; Campbell et al. 2011) refers to as low sensitivity in contrast to zero sensitivity.

The findings in this paper are specific to the data set being analysedFootnote 9; however like any empirical study there is a need to assess the findings and conclusions on a number of data sets. We encourage the research community to undertake this task, not only for the attribute processing strategies we have assessed, but a broader set of heuristics on how attributes are processed. There is the possibility that our findings might be different for different data sets; this is however not a concern about our evidence, but rather a reminder that behavioural processes are often context dependent. If additional studies support the evidence herein on many occasions, then there is a case for recognising the practical value of selecting a latent class framework with fixed parameters for attribute processing, given that inclusion of random parameters adds very little in terms of predictive performance while adding significant complexity in estimation.

Other authors have recently used the latent class structure to compare processing heterogeneity with regard to several types of behavioural processes, with other types of heterogeneity (e.g., scale see Thiene et al. 2012 and taste, see Hess et al. 2012). Although they deal with different decision processes and use different model specifications, they offer general findings on the confounding issue that is discussed in this paper. They propose like us, a latent class (or probabilistic decision process) approach with some conditions imposed on classes to reflect a decision process. They then layer additional heterogeneity on top (random taste or scale) to establish the robustness of both the specifications of heterogeneity, and the alternative model specifications that represent the different decision processes. They conclude that the LC approach has great merit as a framework within which to represent multiple decision processes with and without a random parameter treatment.