Introduction

Travel behavior varies due to various factors, some of which are unobservable. For example, it is usually difficult to observe psychological factors such as motives, attitudes and perceptions, which are particularly influential to the decision-making process, even for variations caused by widely addressed individual attributes. Accordingly, it becomes important to explore how many variations due to individual attributes can be observed and how many can not be observed, from both data collection and modeling perspectives. Such a discussion can also be applied to other types of variations, including household attributes, spatial attributes and temporal factors.

In fact, a number of studies have already focused on different kinds of variations, and the importance of distinguishing between these variations has been argued (Kitamura 2003). In particular, the distinction between inter-individual variation and intra-individual variation has been intensively examined (e.g., Pas 1987; Hanson and Huff 1988; Kitamura et al. 2006). In addition to inter-individual and intra-individual variations, Pas and Sundar (1995) and Goulias (2002) empirically showed the existence of inter-household variations, which account for around 1/3 of all inter-individual variations in trip generation and time use. Here, the three types of variation (intra-individual, inter-individual and inter-household) hold a nesting hierarchical structure. Concretely speaking, an observed behavior in a certain situation (intra-individual) is nested within an individual (inter-individual), and an individual is nested within a household (inter-household). Since a model usually tries to structuralize variations in data, such a hierarchical point of view is quite important to identify whether the variations come from situational differences (i.e., intra-individual variations) or from individual differences (i.e., inter-individual variations). To capture such different levels of variations simultaneously, a multilevel modeling approach (Goldstein 2003) has become more and more popular, and several applications can be found in the field of transportation (e.g., Bhat and Zhao 2002; Goulias 2002; Weber and Kwan 2003; Kim et al. 2004; Khandker et al. 2006; Habib and Miller 2008; Susilo and Dijst 2009). A random coefficient model with panel data can also be regarded as a special kind of multilevel model in which intra-individual and inter-individual variations are usually distinguished. Cirillo and Axhausen (2006) distinguished between these two variations in a mode choice context, and Cherchi and Cirillo (2008) extended it to different hierarchical levels, including inter-tour, inter-day, inter-day of week, inter-individual and inter-household variations.

At the same time, there are many cases when a certain level unit can be nested within more than one higher-level unit, i.e., some variation types do not have a hierarchical structure. For example, Bhat (2000) focused on work-travel mode choice and the fact that the choice is nested not only within the home location (i.e., the origin), but also within the work location (i.e., the destination). Obviously, home and work locations do not have a hierarchical structure. Here we call such a non-hierarchical structure a cross-classified structure. In other examples, spatial variations and systematic day-to-day variations (i.e., day effects at an aggregate level) may also have a cross-classified structure. An individual can move in and out of different origin and destination pairs, while an origin and destination pair can be moved by different individuals; the relationship between individuals and space therefore also follows a cross-classified structure. Such complicated cross-classified variation structures have been examined simultaneously by Chikaraishi et al. (2009, 2010), with a focus on departure time choice and time use behavior. However, in the standard multilevel cross-classified model, we have to assume that all variations at the different cross-classified levels are independent of each other. When this assumption is not true, co-variations, which are variations jointly caused by two or more unobserved factors, would occur. Note that co-variations only occur in cross-classified structures (i.e., not in hierarchical structures). For example, travel choice behavior might depend not only on a person’s individual attributes (such as gender, age and work style), but also on the origin and destination (with respect to travel time and travel cost), and these two effects might be correlated in some cases. For example, responses to travel time and travel cost may differ across individuals. Variations caused by interactions between different cross-classified levels are called co-variations. Co-variations can neither be captured by purely spatial variations, nor by purely inter-individual variations, but they may correspond to individual-specific spatial effects.

The importance of understanding the above-mentioned co-variations might be analogous to the importance of understanding individual-specific action (activity) space. In fact, recent studies have emphasized that action space should be described for each individual by considering biographical elements in the mental map of each individual, rather than assuming that all individuals have the same perceptions of space (Schönfelder and Axhausen 2003). Differences between social networks would lead to different perceptions of space among individuals. For example, those individuals who have babies might choose supermarkets where they can buy baby-related products. Thus, the degree of attractiveness of spaces would differ across individuals for various reasons, and co-variations might be the terms which could capture such individual-specific spatial effects by taking unobserved effects into account. Although understanding co-variations could provide much deeper insights for policy design, they have not yet been well examined.

On the basis of the above-mentioned considerations, this study proposes a model to detect co-variations as well as other types of variations. The model described in this study is a straightforward extension of the multilevel cross-classified logit (MCL) model (Bhat 2000). The MCL model is similar to the mixed logit (MXL) model with panel data, the difference being that the MCL model can deal with several hierarchical or cross-classified random variables to capture various unobserved variations. In this study, we additionally introduce interaction term(s) related to different pairs of cross-classified levels to capture co-variations. The result is a multilevel cross-classified logit model with co-variations (MCLCV). The advantage of the MCLCV model is that it can detect the influence of interaction(s) between different cross-classified levels before introducing explanatory variables. In other words, this allows us to flexibly decompose behavioral variations (variations of utility) into the variation types of interesting aspects, including co-variations. Such decomposition could be used to figure out where the variations of utility come from. By applying the MCLCV model with explanatory variables, it is possible to identify which explanatory variable contributes to the reduction of which parts of the variations. In other words, it is possible to examine whether the introduced explanatory variables suffice to describe the behavior or not.

After describing the MCLCV model, we shall provide an empirical analysis with respect to travel mode choice, which has been widely addressed in the literature. The total of all behavioral variations will be classified into intra-individual variations, inter-individual variations, spatial variations, and co-variations between inter-individual and spatial variations. In the empirical analysis, spatial variations are defined as variations caused by differences between origin–destination (O–D) pairs and residential locations. Accordingly, the relationship between intra-individual and inter-individual/spatial variations has a hierarchical structure (i.e., a mode choice decision is exactly nested within a certain individual and space), while the relationship between inter-individual and spatial variations has a cross-classified structure (i.e., the same individual could travel between different O–D pairs). We first compare model estimation results with and without the co-variation terms and then introduce a number of explanatory variables to capture the variations. For the empirical analysis we used data from Mobidrive, a continuous six-week travel survey, collected in Karlsruhe and Halle in Germany in 1999 (Axhausen et al. 2002). These data allow us to distinguish between the above-mentioned different types of variations.

This paper is organized as follows: In the next section, we shall describe the MCLCV model and explain how to determine its variation properties. We shall then summarize the data used in the empirical analysis, focusing on mode choice behavior. After that, we shall compare the estimation results of models with and without co-variation terms and evaluate how many variations can be captured by introducing a number of explanatory variables. In the final section, some major findings and avenues of future research will be summarized.

Methodology

This section describes a method for capturing various kinds of variations, including co-variations, by developing a multilevel cross-classified logit model with co-variations (called the MCLCV model). We shall describe the model with regard to both random (unobserved) and non-random (observed) effects of spatial, inter-individual, and intra-individual variations as well as co-variations between spatial and inter-individual variations (see Fig. 1). As for spatial variations, it can be defined in various ways, including home locations, origins, destinations, and/or origin–destination pairs. The choice would depend on what we want to measure. In fact, spatial variations are one of the most traditional variation types that have been widely addressed, because it can directly connect with policy discussions especially in the context of infrastructure construction. For example, it has been recognized that the accessibility of residential locations has certain impacts on mode choice decisions, and it might be better to employ home location as a unit of spatial variation if our purpose is to identify such home-based accessibility impacts. The travel time and travel cost have also been recognized as important influential factors on the mode choice decisions, and, in this case, employing origin–destination pair as a unit of spatial variations would be preferable. On the other hand, rather than spatial factors, a number of researchers have pointed out that individual socio-demographic factors have great impacts on the decisions of mode choice (Kitamura et al. 1997; Susilo and Maat 2007; Pinjari et al. 2007). Such impacts would be reflected in inter-individual variations, including car ownership, household structure, psychological attitude, and so on. Additionally, it can also be expected that the situational attributes such as traveling with friends, time pressure, etc., which can easily change even within an individual, may be important influential factors on mode choice decisions as mentioned in Introduction. Such impacts would be reflected in intra-individual variations. Note that, since it is usually difficult to capture such situational attributes, it could be expected that many of them would remain as unobserved variations. Finally, co-variations can be regarded as variations derived from individual-specific spaces, as mentioned in the previous section. Here, it should be noted that co-variations between intra-individual and spatial variations and co-variations between intra-individual and inter-individual variations do not theoretically exist, since these relations follow hierarchical structure (please refer to the discussions in Introduction). In this study, we attempt to examine the above mentioned four different types of variations.

Fig. 1
figure 1

The variation structure assumed in this study

The MCLCV model

The MCLCV model is a utility-maximization model in the context of discrete choice behavior, based on the multinomial logit (MNL) model. Consider the situation that an individual i (i = 1, 2,…, I) who travels in the space s (s = 1, 2,…, S) on a t-th trip (t = 1, 2,…, T i ) chooses alternative j (j = 1, 2,…, J) based on the following utility U jist (please refer to Table 1 for the concrete example of utility function):

$$ U_{jist} = {{\upbeta}}_{\text{j}} {\text{x}}_{\text{jist}} + \gamma_{ji} + \gamma_{js} + \gamma_{jis} + \gamma_{jist} + \varepsilon_{jist} , $$
(1)

where xjist indicates explanatory variables including level-of-service variables, mobility tool ownership, decision maker’s attributes and situational variables. Let βj be a coefficient vector associated with xjist. Let ε jist represent an unobserved intra-individual variation which is assumed to be independently and identically Gumbel-distributed with a variance of σ 2 π 2/6 (the scale parameter σ is a fixed value). Let γ jist be introduced to capture unobserved intra-individual variations in a manner analogous to the MXL model (Ben-Akiva et al. 2001). The terms γ ji and γ js are also random; these are introduced to capture unobserved inter-individual variations (i.e., variations across individuals) and unobserved spatial variations (i.e., variations across travel space), respectively. Let γ jis be a co-variation term which is introduced to capture the interaction effects between inter-individual and spatial variations (i.e., the unobserved variations caused by the two cross-classified factors jointly). Here we assume that γ ji , γ js , γ jis and γ jist are normally distributed as follows:

$$ {{\upgamma}}_{\text{i}} = \left( {\gamma_{ 2i} ,\ldots ,\gamma_{Ji} } \right) \sim MVN {\left( 0 ,\sum_{i}\right)} ,\;{{\upgamma}}_{\text{s}} = \left( {\gamma_{ 2s} ,\ldots ,\gamma_{Js} } \right) \sim MVN\left( { 0 ,\sum_{s} } \right) ,\;{{\upgamma}}_{\text{is}} = \left( {\gamma_{ 2is} ,\ldots ,\gamma_{Jis} } \right) \sim MVN {\left( 0 ,\sum_{is} \right)} ,\;{{\upgamma}}_{\text{ist}} = \left( {\gamma_{ 2ist} ,\ldots ,\gamma_{Jist} } \right) \sim MVN\left( { 0 ,\sum_{ist} } \right) , $$
(2)

where Σ i , Σ s and Σ is are variance–covariance matrices for each level of variation and Σ ist corresponds to the variance–covariance matrix for intra-individual variation, adding σ 2 π 2/6 in the diagonal elements. Note that only differences in utility matter in discrete choice modeling, and thus one of the utilities has to be fixed as a base (Train 2003). As this needs to be done for all variation types, we fixed γ 1i , γ 1s , γ 1is and γ 1ist as zeros. Additionally, for intra-individual variations, one of the rests of variance needs to be fixed as well to normalize the scale of utility. Thus, the full variance–covariance parameters which can theoretically be estimated are [(J − 1)J/2] − 1 for intra-individual variation Σ ist and (J − 1)J/2 for other higher-level variations Σ i , Σ s and Σ is . Based on the above-mentioned model specifications, the following conditional probability of the choice of mode j can be written in the MNL form

$$ P\left( {y_{jist} = j|\beta , {{\upgamma}}_{\text{i}} , {{\upgamma}}_{\text{s}} , {{\upgamma}}_{\text{is}} , {{\upgamma}}_{\text{ist}} } \right) = {\frac{{\exp \left( {{{\left( {\beta_{\text{j}} {\text{x}}_{\text{jist}} + \gamma_{ji} + \gamma_{js} + \gamma_{jis} + \gamma_{jist} } \right)} \mathord{\left/ {\vphantom {{\left( {\beta_{\text{j}} {\text{x}}_{\text{jist}} + \gamma_{ji} + \gamma_{js} + \gamma_{jis} + \gamma_{jist} } \right)} \sigma }} \right. \kern-\nulldelimiterspace} \sigma }} \right)}}{{\sum\nolimits_{{j^{\prime} = 1}}^{J} {\exp \left( {{{\left( {\beta_{\text{j}} {\text{x}}_{{j^{\prime}{\text{ist}}}} + \gamma_{{j^{\prime}i}} + \gamma_{{j^{\prime}s}} + \gamma_{{j^{\prime}is}} + \gamma_{jist} } \right)} \mathord{\left/ {\vphantom {{\left( {\beta_{\text{j}} {\text{x}}_{{j^{\prime}{\text{ist}}}} + \gamma_{{j^{\prime}i}} + \gamma_{{j^{\prime}s}} + \gamma_{{j^{\prime}is}} + \gamma_{jist} } \right)} \sigma }} \right. \kern-\nulldelimiterspace} \sigma }} \right)} }}} $$
(3)

When we specify the model with the above-mentioned full variance–covariance parameters (with small value for the scale parameter σ), the complete variation properties of all pairs of alternatives can be obtained. However, we found that it is quite difficult in empirical analysis to estimate full variance–covariance parameters (i.e., the estimation results are not stable: the estimation procedure will be mentioned later), especially in the estimation of parameters of intra-individual variations and in the simultaneous estimation of covariance parameters at different levels. This means, the model could formally and theoretically be identified yet it may exhibit very small variation in the objective function from its maximum over a wide range of parameter values, resulting in being empirically non-identifiable (Keane 1992; Grilli and Rampichini 2007). Note that, since this is not theoretical identification problem, the model with full variance–covariance parameters is potentially identifiable: there is a possibility that employing different estimation algorithm and/or different parameterizations of variance–covariance matrix could solve such informal or non-theoretical identification problem. For example, Train (2001) compared theoretical and empirical properties of classical estimation procedure (maximum simulated likelihood estimation with Halton draws) and Bayesian estimation procedure (a hierarchical Bayesian procedure), and concluded that Bayesian approach has theoretical advantage from both classical and Bayesian perspectives and the estimation results are essentially the same. Expanding such comparison analysis (especially focusing on the stability of the estimation) would be helpful to choose a proper estimation method, and it certainly remains as a big issue to be solved; however, in this paper we will only focus on the development of choice model with co-variation effects, and the discussion on estimation methods lies outside the scope of this paper.

Table 1 An example of data structure

To avoid the above mentioned identification problem, in this study γ jist is omitted from the model, and we only estimated variances in (J − 1) alternatives (with the covariances fixed as zero) for spatial variations and co-variations, while the full variance–covariance matrix was given for inter-individual variations. Thus, it is assumed that all sources of unobserved correlation over alternatives are derived from inter-individual matters such as the differences of attitudes and perceptions across individuals. There are several reasons for putting correlation terms into only inter-individual level. First, to assure stable model estimation, as discussed above, we had to eliminate some correlation terms. Second, based on the preliminary analysis which is conducted to identify the magnitude of unobserved variation for each level, we decided to give the highest priority to introducing correlation terms into inter-individual level.

Note that even when we exclude Σ ist from the model, higher-level variations are automatically rescaled according to the fixed scale of intra-individual variations (the unknown parameters β are also rescaled), and the ratio of each variation to the total variation can be safely calculated (Grilli and Rampichini 2003, 2007). However, such rescaling in the utility U jist (j = 2, 3,…, J) might occur only with respect to the utility of the first alternative U 1ist, in which all random components are fixed for normalization. Therefore, the variation properties of all pairs of alternatives could not be obtained, and we could only determine the variation properties of the utility U jist (j = 2, 3,…, J) with respect to the utility of the first alternative, given the restricted variance–covariance matrices.

In particular, with the scale parameter σ set to 1 (which also leads to rescaling the other fixed and random parameters), this study employs the following model:

$$ U_{jist} = {{\upbeta}}_{\text{j}} {\text{x}}_{\text{jist}} + \gamma_{ji} + \gamma_{js} + \gamma_{jis} + \varepsilon_{jist} , $$
(4)
$$ P\left( {y_{jist} = j|\beta , {{\upgamma}}_{\text{i}} , {{\upgamma}}_{\text{s}} , {{\upgamma}}_{\text{is}} } \right) = {\frac{{\exp \left( {\beta_{\text{j}} {\text{x}}_{\text{jist}} + \gamma_{ji} + \gamma_{js} + \gamma_{jis} } \right)}}{{\sum\nolimits_{{j^{\prime} = 1}}^{J} {\exp \left( {\beta_{\text{j}} {\text{x}}_{{j^{\prime}{\text{ist}}}} + \gamma_{{j^{\prime}i}} + \gamma_{{j^{\prime}s}} + \gamma_{{j^{\prime}is}} } \right)} }}}, $$
(5)

where

$$ \begin{gathered} {{\upgamma}}_{\text{i}} = \left( {\gamma_{ 2i} ,\ldots ,\gamma_{Ji} } \right) \sim MVN\left( { 0 ,\sum_{i} } \right) ,\; \hfill \\ \gamma_{js} \sim N\left( { 0 ,\sigma_{js}^{2} } \right){ ,}\;\gamma_{jis} \sim N\left( { 0 ,\sigma_{jis}^{2} } \right) \, j = 2,3, \ldots J \hfill \\ \end{gathered} $$
(6)

Henceforth, the diagonal elements (i.e., variances) and non-diagonal elements (i.e., covariances) in Σ i are described as \( \sigma_{ji}^{2} \) and \( \sigma_{{jj^{\prime}i}} \), respectively (j = 2, 3,.., J). As mentioned above, all random components in the first alternative are set to zero. All of the following descriptions are based on this model, with limited variance–covariance matrices shown in Eqs. 5 and 6.

Note that if the \( \sigma_{jis}^{2} \) is close to zero, the MCLCV model becomes an MCL model (Bhat 2000), and if the \( \sigma_{jis}^{2} \) and \( \sigma_{js}^{2} \) are equal to zero, the MCLCV model becomes an MXL model with panel data. Thus, the MCLCV model is a straightforward extension of such mixed-logit-type models. Although the MCLCV model itself is just a simple extension of a mixed logit model, it can capture various kinds of co-variation effects. We could quantitatively identify not only individual-specific spatial effects, but also social interaction effects, for example by considering co-variations as interaction terms across different reference groups’ effects on the behavior of a target person. The proposed method could be a powerful tool to check the existence of co-variation effects in various contexts.

Variation properties of utility difference

Here we shall mention a way to describe behavioral variations in the above-mentioned model, which we used in the empirical study mentioned in next section. Conventional ways have often focused only on observed variations which can be directly connected to policy discussions. This study follows a somewhat different approach. Concretely speaking, all behavioral variations are first treated as unobserved variations in order to determine what kinds of variations really exist (i.e., βjxjist in Eq. 4 is excluded). In other words, we first decompose all variations into four different types of unobserved variations, including unobserved inter-individual variations, unobserved spatial variations, unobserved intra-individual variations and unobserved co-variations between inter-individual and spatial variations. Using the symbol “~” to represent model estimation results without any explanatory variables (called the Null model), the total variance of utility difference can be calculated as follows:

$$ {\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right) = \tilde{\sigma }_{ji}^{2} + \tilde{\sigma }_{js}^{2} + \tilde{\sigma }_{jis}^{2} + \pi^{2} /3 $$
(7)

This variation decomposition can provide several useful insights. For example, when substantial amounts of intra-individual variations are observed, people’s mode choice behavior would strongly depend on the situational or context-dependent attributes given at the time. Similarly, if almost all behavioral variations are derived from spatial variations, this means that there are no individual or situational differences and spatial aggregate data can be used for the analysis. Also, the more inter-individual variations are observed, the greater the possible impacts of individual attributes (such as age, gender, car ownership and season ticket ownership). Such observations could help us, for example, to identify which levels of influential factors are likely to be more important. Additionally, this variation decomposition allows us to detect pure co-variation effects (i.e., individual-specific spatial effects) before introducing explanatory variables.

In the next step, we shall introduce a set of explanatory variables to provide reasons for the behavioral variations measured in the Null model. Using the symbol “^” to represent model estimation results with a set of explanatory variables (called Full model), the total variance of utility difference can be calculated as follows:

$$ {\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right) = {\text{Var}}\left( {\hat{\beta }_{\text{j}} {\text{x}}_{\text{jist}} - \hat{\beta }_{ 1} {\text{x}}_{{ 1 {\text{ist}}}} } \right) + \hat{\sigma }_{ji}^{2} + \hat{\sigma }_{js}^{2} + \hat{\sigma }_{jis}^{2} + \pi^{2} /3 $$
(8)

Introducing explanatory variables could change some behavioral variations into observed variations while the rest remain unobserved variations. Our purpose here is to evaluate what types and how many of the variations can be captured by introducing explanatory variables. To do this, we compare the variation components in Eq. 8 against those in Eq. 7. Here, although the absolute expected value of \( {\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right) \) may change depending on how many intra-individual variations can be captured by introducing explanatory variables, the component ratio for each variation can be compared between the different models as long as the existence of the same “true” utility can be expected. This is because the scale of \( {\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right) \) is strictly defined by the rest of the unobserved intra-individual variations, and also because the other fixed and random parameters are automatically rescaled. Thus, we can compare the component ratio for each variation between Eq. 7 and Eq. 8. For example, in case of unobserved inter-individual variation, the component ratio in the Null model can be calculated as \({\tilde{\sigma }_{ji}^{2}}\mathord{\left/{\vphantom {\tilde{\sigma}_{ji}^{2}} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} }\right)}} \right.}\) and that in the Full model can be calculated as \( {{\hat{\sigma }_{ji}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{ji}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}} \). Since a part of variation in the Null model is expected to be explained by introduced explanatory variables in the Full model, the component ratio in the Full model would become smaller than that in the Null model, and the difference between them would be the observed variation. Such calculations can show which types and how many of the variations can or cannot be captured by introducing a certain set of explanatory variables, as follows:

For observed inter-individual variations (%):

$$ \left( {{{\tilde{\sigma }_{ji}^{2} } \mathord{\left/ {\vphantom {{\tilde{\sigma }_{ji}^{2} } {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}} - {{\hat{\sigma }_{ji}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{ji}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(9)

For unobserved (or remaining) inter-individual variations (%):

$$ \left( {{{\hat{\sigma }_{ji}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{ji}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(10)

For observed spatial variations (%):

$$ \left( {{{\tilde{\sigma }_{js}^{2} } \mathord{\left/ {\vphantom {{\tilde{\sigma }_{js}^{2} } {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}} - {{\hat{\sigma }_{js}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{js}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(11)

For unobserved (or remaining) spatial variations (%):

$$ \left( {{{\hat{\sigma }_{js}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{js}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(12)

For observed co-variations (%):

$$ \left( {{{\tilde{\sigma }_{jis}^{2} } \mathord{\left/ {\vphantom {{\tilde{\sigma }_{jis}^{2} } {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}} - {{\hat{\sigma }_{jis}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{jis}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(13)

For unobserved (or remaining) co-variations (%):

$$ \left( {{{\hat{\sigma }_{jis}^{2} } \mathord{\left/ {\vphantom {{\hat{\sigma }_{jis}^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(14)

For observed intra-individual variations (%):

$$ \left( {{\kern 1pt} {{3^{ - 1} \pi^{2} } \mathord{\left/ {\vphantom {{3^{ - 1} \pi^{2} } {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\tilde{U}_{jist} - \tilde{U}_{1ist} } \right)}} - {\kern 1pt} {{3^{ - 1} \pi^{2} } \mathord{\left/ {\vphantom {{3^{ - 1} \pi^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100, $$
(15)

For unobserved (or remaining) intra-individual variations (%):

$$ \left( {{\kern 1pt} {\kern 1pt} {{3^{ - 1} \pi^{2} } \mathord{\left/ {\vphantom {{3^{ - 1} \pi^{2} } {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right. \kern-\nulldelimiterspace} {{\text{Var}}\left( {\hat{U}_{jist} - \hat{U}_{1ist} } \right)}}} \right) \times 100 $$
(16)

The variation properties derived from Eqs. 7 and 8 and subsequently from Eqs. 916 could provide many implications for both data collection and model development. For example, if unobserved intra-individual variations are quite high even after introducing available explanatory variables, this indicates that we might need many more situational attributes to provide richer explanations for behavioral variations. In other words, Eqs. 916 can be used to evaluate the model’s performance in greater detail by evaluating the model for each alternative and for each variation type. In the empirical analysis, we introduced four sets of explanatory variables in a sequential manner. This sequential procedure allowed us to evaluate the impacts of each set of explanatory variables. In other words, we could confirm which types of variations could actually be captured by which subset of explanatory variables. Note that although the order in which the variable subsets are introduced would not influence the variation properties if all the variables in the model are independent of each other (and this assumption was already made when the model was estimated), this assumption usually cannot be satisfied completely. For example, the dummy variable “male” (a decision maker’s attribute) might be correlated with the dummy variable “trip purpose (work or not)” (a situational attribute), and as a result, the order in which the variable subsets are introduced may somewhat influence the variation properties. In this study, we decided on the order of introducing variable subsets by considering the importance of each as well as the ease of data acquisition. In the example above, we would give priority to the variable “male” over the variable “trip purpose”, because the former is much easier to observe. Furthermore, if we can capture variations by “male” rather than by “trip purpose”, we could easily extend the application of the model to the entire population of the target area. Thus, after capturing variations by “male”, the variable “trip purpose” just serves to capture the remaining variations which could not be captured by the explanatory variables already introduced. However, how the implications differ according to the order of introducing explanatory variable sets still remains as an important future task to be examined.

Estimation procedure

The likelihood function based on Eq. 4 can be written as follows:

$$ L_{jist} \left( {\beta ,\sum_{i} ,\sigma_{\text{s}} ,\sigma_{\text{is}} |{\text{y,x}}} \right) = \int\limits_{{\gamma_{\text{is}} }} {\int\limits_{{\gamma_{\text{s}} }} {\int\limits_{{\gamma_{\text{i}} }} {\prod\limits_{{N_{ist} }} {\prod\limits_{j} {P\left( {y_{jist} = j|\beta ,\gamma_{\text{i}} ,\gamma_{\text{s}} ,\gamma_{\text{is}} } \right)^{{\delta_{j} }} f\left( {\gamma_{\text{i}} |\sum_{i} } \right)f\left( {\gamma_{\text{s}} |\sigma_{\text{s}} } \right)f\left( {\gamma_{\text{is}} |\sigma_{\text{is}} } \right){\text{d}}\gamma_{\text{is}} {\text{d}}\gamma_{\text{s}} {\text{d}}\gamma_{\text{i}} ,} } } } } $$
(17)

where N ist is the number of samples. The dummy variable δ j takes the value 1 if individual i chooses mode j on the t-th trip, and otherwise it takes the value 0. The term fi i ) follows a multivariate normal distribution with a mean of 0 and a variance–covariance of Σ i . The terms fss) and fisis) also follow multivariate normal distributions with a mean of 0 and variances of σ js and σ jis (j = 2, 3,…, J), respectively, but these covariances are fixed as 0 (see Eq. 5).

The likelihood function of Eq. 17 involves a number of integrations and can not be solved analytically. To estimate such a model, some simulation methods are usually adopted, such as a series of Monte Carlo methods or numerical quadrature methods (e.g., Pinheiro and Bates 1995; Bhat 2001; Train 2003). In this study, we used a hierarchical Bayesian procedure based on the Markov chain Monte Carlo (MCMC) method, which has recently become popular and is a promising method for estimating multilevel models with complicated random effects (Train 2003; Gelman et al. 2004). The method incorporates prior distribution assumptions and, based upon successive sampling from the posterior distribution of the model’s parameters, yields a chain which is then used for making point and interval estimations. In particular, the posterior distribution can be written as follows:

$$ \pi \left( {\beta ,\sum_{i} ,\sigma_{\text{s}} ,\sigma_{\text{is}} |{\text{y,x}}} \right) = \prod\limits_{{N_{ist} }} {\prod\limits_{j} {P\left( {y_{jist} = j|\beta , {{\upgamma}}_{\text{i}} , {{\upgamma}}_{\text{s}} , {{\upgamma}}_{\text{is}} } \right)^{{\delta_{j} }} f\left( {\gamma_{\text{i}} |\sum_{i} } \right)f\left( {\gamma_{s} |\sigma_{s} } \right)f\left( {\gamma_{is} |\sigma_{is} } \right)\phi \left( {\sum_{i} } \right)\phi \left( {\sigma_{\text{s}} } \right)\phi \left( {\sigma_{\text{is}} } \right)\phi \left( \beta \right)} } $$
(18)

where we assume an inverted Gamma distribution for φs) and φis) (for each element), an inverted Wishart distribution for φ i ) and a normal distribution for φ(β) (for each element) as prior distributions. These two prior distributions correspond to conjugate prior distributions, which mean that the posterior distributions can be specified within the same distributional families as the prior distributions. The terms fi i ), fss), and fisis) create a so-called hierarchical sampling procedure. In this study, non-informative prior distributions were assumed for all parameters. In this case, the estimated parameters are asymptotically equivalent to the parameters obtained through a simulated maximum likelihood method (Train 2003). Note that, even when we employ the different prior distributions that are not conjugate prior distributions, we may also obtain asymptotically equivalent parameters estimated with conjugate prior distributions, although a conjugate prior distribution is much more algebraic convenience, giving a closed-form expression for the posterior.

Draws from the posterior are obtained using the software WinBUGS (Bayesian inference Using Gibbs Sampling [Lunn et al. 2000]). In the MCMC sampling, draws of each parameter are obtained from its posterior conditional on the other parameters (Gelman et al. 2004). The conditional posterior distribution for each parameter can be found in Train (2003), for example. The stabilities of the estimation results reported in this paper were checked a number of ways, including: checking the trace plot and autocorrelation in each parameter chain; using the Geweke diagnostic (Geweke 1992); and checking the results with different initial values of parameters. All of these indicated that the models reported in this paper are well converged.

Data

The Mobidrive data set (Axhausen et al. 2002) was used in the empirical analysis. The data are from a continuous six-week travel diary survey that was conducted in Karlsruhe and Halle in Germany in 1999. A total of 317 persons from 139 households took part in the main survey.

The mode choice set used in this empirical analysis includes the following options: “Car (passenger)”, “Bicycle”, “Car (driver)” and “Public transport”. After data cleaning with respect to the availability of level-of-service values (only travel time in this study) for both used and non-used alternatives, a total of 18,326 trips made by 309 individuals were selected. The mode choice shares are: “Car (passenger)” (15.6%), “Bicycle” (20.7%), “Car (driver)” (39.6%) and “Public transport” (24.1%).

The spatial variation was defined based on differences in the Origin–Destination (O–D) distributions and the residential locations. Note that the origins, destinations and residential locations are classified into three zones: “CBD (Central Business District)”, “inner city” and “others” for each city. The maximum number of combinations is 54 (= 3[origins] × 3[destinations] × 3[residential locations] × 2[cities]). In the samples used in this study, 53 spaces were observed. Thus, the 18,326 samples are nested within 1,635 individual-specific spaces (i.e., the unit for co-variations), and the 1,635 individual-specific spaces are nested within 53 spaces (i.e., the unit for spatial variations) and within 309 individuals (i.e., the unit for inter-individual variations). A simple example of data structure (in case of two individuals and two O–D pairs) is shown in Table 1. In this example, it can be confirmed that individual 1 tends to choose “Bicycle” compared to individual 2. This difference may generate inter-individual variations. As for the spatial differences, “Car (driver)” seems to be a preferable alternative when they travel in O–D pair 2. This difference may generate spatial variations. Additionally, it can be confirmed that co-variations (i.e., individual-specific spatial effects) could be also generated, since individual 1 tends to use “Bicycle” for traveling in O–D pair 1 while individual 2 tends to use “Public transport” even for traveling in the same O–D pair. The purpose of introducing co-variation term is to capture such variations.

Here it should be noted that, in general, the more complicated variation structures are employed, the more samples are required. This is similar with the case that a model having more explanatory variables requires more samples. For this reason, we employ relatively rough location segmentation as mentioned above. Thus, one should keep in mind that the spatial variations identified here would be quite rough and may not reflect small spatial differences. Also, to distinguish between inter-individual and intra-individual variations, we need a multi-day travel diary survey data.

Model estimation and discussion

In this section, we shall first report the estimation results of the MCLCV model and the MCL model to determine whether co-variation effects really exist or not. The estimations were done without any explanatory variables to detect the pure influence of co-variations. Explanatory variables were then introduced in a sequential manner. We first introduced only Travel time variables. We then sequentially added the variables Mobility tools, Decision-maker’s attributes and Situational attributes.

Table 2 shows the definitions of the introduced explanatory variables. With regard to Travel time, network level-of-service (LOS) values were used for all modes. Theoretically, it can be expected that Travel time would mainly capture spatial variations. Public transport travel time included transfer waiting time. Travel cost might be another important variable in mode choice, but we could not introduce it because the data were unavailable. In addition, attention needed to be paid to the type of LOS variables used to capture behavioral variations. For example, one would expect LOS variables derived from network data to mainly capture spatial variations, while LOS variables based on responses from respondents would partly represent inter-individual variations because of the reflection of the respondents’ perception of travel times and costs. Disparities between different types of LOS variables as well as the effects of other LOS variables such as travel time and the number of transfers would certainly be worth investigating in future studies.

Table 2 Explanatory variables used for model estimations

Mobility tools variables included: license-holder dummy variables, the number of household members who were the main car users, and the interaction term of these two variables for the option “Car (passenger)”; the number of bicycles in a household divided by the number of household members for the option “Bicycle”; a dummy variable of the main car users (obtained from the vehicle question “Who is the main user of this vehicle?”) for the option “Car (driver)”; and the number of season tickets for the option “Public transport”. Decision maker’s attributes included: household income and age (as categorial variables), gender, marital status with child(ren), and employment status. Mobility tools and decision maker’s attributes would mainly capture the inter-individual variations. The following Situational attributes were selected: trip purpose, day of the week, departure time and size of the travel party. Intuitively, these variables may capture intra-individual variations. The above mentioned explanatory variables were introduced in certain alternatives based on a preliminary cross-tabulation analysis.

Note that the introduced explanatory variables are of course not perfect, and omitted variables (including travel cost) should be reflected in the unobserved variations. Although explaining all variations (i.e., there is no unobserved variation) might be impossible, the remaining unobserved variation for each level could be used for a guidepost for the next data collection.

Model comparison with and without co-variations

As mentioned above, only differences in utility matter in discrete choice model, and we thus had to fix one of the utilities as a base. For this case study, we selected “Car (passenger)” as the base alternative.

The estimation results of the model without co-variations (the MCL model) and the model with co-variations (the MCLCV model) are shown in Table 3. The estimations were first done without any explanatory variables (Null models) to detect the pure influence of co-variation terms. The goodness-of-fit measures of DIC (deviance information criteria) indicate that co-variation effects certainly exist at a significant level, i.e., the spatial effects on mode choice behavior differed from individual to individual. The component ratio of each variation type is shown in Fig. 2. The co-variation shares in the Null model are 9.2% in the category “Bicycle”, 1.9% in “Car (driver)” and 12.7% in “Public transport”. Thus, while the utility in “Car (driver)” was neither much affected by individual-specific spatial effects nor by common spatial effects (i.e., the spatial variation was quite small at only 0.4%), individual-specific spatial effects might have to be considered in the utility specifications for the categories “Bicycle” and “Public transport”. Additionally, even after introducing co-variation terms into the model, the spatial variation in the “Public transport” option remained significant (9.0%). These results indicate that there are certainly two different types of spatial effects: spatial effects which are common to all individuals (spatial variations) and those which are different across individuals (co-variations). The quantitative discrimination between these two spatial effects is quite important in terms of data collection and model development, because when we developed behavioral models with cross-sectional data, the model could not distinguish between them. This means, for example, that if all spatial effects are individual-specific, the model just describes differences in perceptions or meanings of space across individuals rather than a certain spatial attribute itself, implying that there is a possibility that the change of spatial attributes has little impact on behavior. To distinguish between these two spatial effects, a multi-day set of survey data is needed, analogous to distinguishing between inter-individual and intra-individual variations (Hanson and Huff 1986).

Table 3 Estimation results of Null models with/without co-variations
Fig. 2
figure 2

Proportions of variations in Null models with/without co-variations

With regard to other variation types, it was found that inter-individual variations dominated, especially in the category “Car (driver)” (88.5%). Also, in the categories “Bicycle” and “Public transport”, the shares reached 71.9 and 58.3% respectively, and the intra-individual variations were only 9.2–20.0%. These results seem to be quite significant compared to the proportions of variations in other behavioral aspects. Concretely speaking, previous studies have shown higher shares of intra-individual variations in many cases: around 50–60% for the number of trips per day, travel time and travel distance (Pas 1987; Pendyala 1999); 35–85% (depending on the activity type) for departure time choice (Kitamura et al. 2006; Chikaraishi et al. 2009); and 17–65% for time use behavior (Goulias 2002; Chikaraishi et al. 2010). Given these results, we could say that mode choice behavior is relatively stable from day to day compared to other behavioral aspects. This might be quite important for data collection: when we are only interested in mode choice behavior, behavioral observations on shorter days might be acceptable compared to other behavioral aspects, although it is of course better to observe behavior across a longer period, since there are certain intra-individual variations. In addition, because the significant variations in travel choice behavior are inter-individual variations, we may have to pay much more attention to individual-specific influential factors such as individual attributes, mobility tool ownership and habitual preferences.

Estimation results with explanatory variables

Next, we sequentially introduced four sets of explanatory variables (Travel time, Mobility tools, Decision maker’s attributes and Situational attributes: see Table 2) into the Null model with the co-variations discussed in the previous subsection. We estimated four more models and compared their variation properties. Here we shall first look at the estimation results with the full set of explanatory variables (called the Full model) in order to check the impacts of the introduced explanatory variables. We shall then focus on sequential changes in the variation properties following the addition of a certain set of explanatory variables.

Table 4 shows the estimation results of the Full model. As expected, the parameter Travel time has a negative sign and is statistically significant. In the category Mobility tools, non-holders of licenses with other household members who were the main car users tended to choose the option “Car (passenger)”, while the main car users were likely to choose “Car (driver)”. The variables of bicycle and season ticket ownership also had positive impacts on the use of “Bicycles” and “Public transport”. In the category Decision maker’s attributes, married males who had at least one child were likely to choose the option “Car (driver)”, while young people (under 24 years of age) and people with high incomes tended to use “Bicycles” and “Public transport”, compared to Car (passengers). At the same time, interestingly, age effects on the options “Bicycle” and “Public transport” seem to follow a quadratic function, i.e., the young and elderly people tended to choose “Bicycle” and “Public transport”. With regard to the effects of trip purpose on mode choice in the category Situational attributes, pick up and drop off behavior of course had a positive impact on the “Car (driver)” share, as did work-related trips. People were not likely to use “Public transport” for daily shopping trips. Also, on weekends they used less “Public transport” and “Bicycles”, while the share of “Car (passengers)” tended to increase. The estimation results of the departure time variables indicate that the share of “Car (passengers)” tended to increase from morning to late evening, while “Public transport” was likely to decrease: there is a possibility that “Car (passengers)” and “Public transport” substitute for each other, depending on the time of day. It can be expected that non-mandatory activities, such as going to restaurant or shopping, might be dominant purposes of night trips, while mandatory activities might be dominant purposes of morning trips. Since non-mandatory activities are often done with families or friends, they could easily be car passengers of their friends or family members. In fact, the results also indicate that increases in the size of travel parties cause the increases of the share of “Car (passengers)”. Overall, the impacts of the introduced explanatory variables are quite consistent with previous studies and are intuitively plausible.

Table 4 Estimation results of the Full model

Next, we shall consider the sequential changes of variation properties along with the addition of a certain set of explanatory variables. Figures 3, 4, and 5 show the results of the sequential changes of proportions of four unobserved variations (spatial variations, inter-individual variations, intra-individual variations and co-variations) as well as observed variations which were captured by introduced explanatory variables. As expected, the unobserved variations basically decreased when a set of explanatory variables was added.

Fig. 3
figure 3

Proportions of variations in “Bicycle”

Fig. 4
figure 4

Proportions of variations in “Car (driver)”

Fig. 5
figure 5

Proportions of variations in “Public transport”

First, by introducing the variable Travel time, 0.0–1.8% of the total variations could be classified as observed variations. For the option “Car (driver)”, the observed variation was 0.0%, because the same travel times are assumed for both “Car (passengers)” and “Car (drivers)”. Amazingly, the contribution of Travel time to the reduction of unobserved variations was rather small. It might be worth investigating the extent to which detailed level-of-service variables which were not introduced in this study (such as travel costs and the number of transfers) could reduce unobserved variations. This may be important for establishing a model to use for future predictions and policy evaluations: we may at least have to confirm whether a model with a general set of level-of-service variables would suffice for making predictions.

Second, we added Mobility tools explanatory variables to the model. The proportions of observed variations to the total amount of variations become 18.7% in the option “Bicycle”, 45.5% in “Car (driver)” and 15.2% in “Public transport”. Compared to the Travel time variable, the Mobility tools variables captured behavioral variations well, especially in the category “Car (driver)”. From a practical perspective, the results seem to suggest that policies related to Mobility tools change people’s mode choice behavior much more effectively than improving Travel time. In this sense, a model of mode choice that simultaneously incorporates bicycle, car and season ticket ownership might be one means of establishing comprehensive policies for reducing car use. As for which types of unobserved variations were reduced by introducing Mobility tools variables, they mainly captured inter-individual variations, as expected. Other types of variations also decreased for the options “Bicycle” and “Public transport”, while they slightly increased for the option “Car (driver)”. There are many possible reasons for such observations: first, when the spatial distributions of mobility tool ownership are non-uniform, Mobility tools can capture spatial variations. Second, when the definition of space is too rough to capture non-uniform spatial distributions of mobility tool ownership, one would expect co-variations and intra-individual variations to be somewhat captured by Mobility tools. These points require further exploration in future research.

Third, we added explanatory variables related to Decision maker’s attributes to the model. The observed variations increased further from 18.7 to 25.4% (+6.7%) for the option “Bicycle”, from 45.5 to 64.8% (+19.3%) for the option “Car (driver)” and from 15.2 to 21.1% (+5.9%) for the option “Public transport”. The results indicate that we should take decision maker’s attributes into account in addition to variables related to Mobility tools, especially in the utility specification for “Car (driver)”. As expected, the variables introduced to Decision maker’s attributes mainly captured inter-individual variations (some other types of variations were also captured, probably due to the possible reasons mentioned in the previous paragraph).

Finally, Situational attributes were added to the model. The observed variations increased further from 25.4 to 39.8% (+14.4%) for the option “Bicycle”, from 64.8 to 70.6% (+5.8%) for the option “Car (driver)” and from 21.2 to 26.6% (+5.4%) for the option “Public transport”. The impacts of Situational attributes were relatively higher on “Bicycle” than on the other modes. Note that although lower-level variations, i.e., intra-individual variations and co-variations, were mainly captured by Situational attributes, as expected, Situational attributes also explain a certain amount of inter-individual variations. This might be because the distributions of trip generation with respect to departure time, day of the week, trip purpose and the size of the party differed from individual to individual; as a result, Situational attributes tended to capture inter-individual variations.

In total, the behavioral variations in the option “Car (driver)” were quite well captured by the introduced explanatory variables, especially by Mobility tools, while those in the option “Bicycle” were much more dependent on Situational attributes. The behavioral variations in the “Bicycle” and “Public transport” options were also partly derived from Mobility tools, but unobserved variations still remained at a high level (60.2% for “Bicycle” and 73.4% for “Public transport”). Because most of the remaining unobserved variations were inter-individual variations, there is a possibility that the choices of “Bicycle” and “Public transport” were largely dependent on the participants’ habits (inertia) or on their positive preferences for the use of these modes, which might be difficult to measure. If so, the question of “how bicycle and public transport users create their mode choice habits” might become crucial for further promoting these environmentally friendly modes, in contrast to the existing important question for reducing car use, i.e., “how habits are broken, that is, how choices become deliberate and rational again” (Gärling and Axhausen 2003). One may also assume that individual-specific spatial effects (i.e., co-variation effects) might be relevant to the process of creating habits, because co-variations might reflect differences in people’s perceptions of a certain space. However, we could not explore this question in depth; it remains a challenging task for future research.

Conclusions

In focusing on the co-variations caused by two or more factors jointly, this study first developed a multilevel cross-classified logit model with co-variations (MCLCV) and then applied the model to travel mode choice behavior. In our empirical analysis, we used Mobidrive data (from a continuous six-week travel survey) to decompose the total variations in mode choice behavior into: spatial variations, inter-individual variations, intra-individual variations and co-variations between inter-individual and spatial variations. Co-variations here represent individual-specific spatial effects.

The proposed MCLCV model is a simple extension of the multilevel cross-classified logit (MCL) model which can detect pure co-variation effects that are interaction effects across different cross-classified levels. Quantitative evaluation of these co-variation effects can provide information on whether the model can accept independent assumptions among random terms at different cross-classified levels. It can also lead to a better understanding of travel behavior by providing useful information for data collection and model development. In the case of co-variation effects defined as interaction effects between inter-individual and spatial variations, we can determine whether individual-specific spatial effects exist or not. This could help us to reconsider how to capture the effects of “space” and to reassess the traditional assumption that all individuals have the same responses to and/or perceptions of space. In another context, intra-individual variations can actually be regarded as co-variations between inter-individual variations and temporal variations, reflecting the fact that responses and/or perceptions may vary considerably over time, even for the same individual. In this sense, we could say that this study has tried to extend the ideas of longstanding research on intra-individual variations to other types of variations, and the proposed method can be used to examine various co-variation effects in various contexts.

Our empirical analysis, which focused on mode choice behavior, has shown that co-variation effects, which are defined as interaction effects between inter-individual and spatial variations, certainly do exist (except for the utility of “Car (driver)”). At the same time, our results also show that spatial variations, which are homogeneous spatial effects, play an important role in choice making. These results indicate that there are both homogeneous and heterogeneous spatial effects across individuals and that these two spatial effects can not be distinguished by the traditional approach (i.e., by using cross-sectional data to analyse spatial effects). This is similar to discriminating between inter-individual variations (i.e., homogeneous individual effects over time) and intra-individual variations (i.e., heterogeneous individual effects over time). With regard to other variation types, our results showed large inter-individual variations compared to other variation types, while substantial intra-individual variations have been reported (e.g., Pas 1987; Pendyala 1999; Goulias 2002; Kitamura et al. 2006) in connection with other behavioral aspects. This result might be important in terms of data collection. In particular, since the information necessary for mode-choice models is mainly related to inter-individual variations, applying cross-sectional data to mode-choice modeling may be relatively acceptable compared to modeling other behavioral aspects in which higher intra-individual variations are observed.

Introducing sets of explanatory variables also led to several interesting findings. First, we found that Mobility tools have substantial impacts on mode choice, especially on the option “Car (driver)”. According to our results, a mode choice model that simultaneously incorporates bicycle, car and season ticket ownership might be a means of establishing comprehensive policies for the reduction of car use. Although the behavioral variations for the options “Bicycle” and “Public transport” were also partly derived from Mobility tools, the remaining unobserved variations were still high. One possible reason is that the choices of “Bicycle” and “Public transport” were largely dependent on the survey participants’ habits (inertia). If so, the process of creating habitual behavior might have to be further explored to promote environmentally friendly modes. It would also be interesting to reveal the role of such individual-specific spatial effects in the process of habit formation.

Although this paper has shown the usefulness of examining co-variation effects and exploring the variation properties of discrete travel choice behavior, a number of unsolved issues and topics remain to be examined further in future research. First, this study has not specified the MCLCV model with full variance–covariance matrices for all variation types, and this kept us from revealing the variation properties of all pairs of alternatives. To obtain stable estimation results with full variance–covariance matrices, we might have to reconsider the estimation procedure. Second, it is quite interesting from a practical perspective how many variations can be captured by introducing detailed level-of-service variables. Third, applying different settings of variations and co-variations structures and comparing the differences among them would be an important future work to check the reliability of the implications obtained from a single model with a certain setting of variations/co-variations. Finally, finding out how to measure the process of habit formation, not only for car users, but also for bicycle and public transport users, would be a very challenging task.