
1 Introduction

Agent-based mobility simulation (ABMS) models can be used to evaluate the effects of changes to the mobility system, such as the introduction of new mobility options, extensions of public transport, or the construction of new roads. Unlike classical prediction models, ABMS models typically allow detailed spatial and temporal impacts to be assessed. To do so, the mobility behavior of the people within a region needs to be represented in the model.

There are several modeling options. Trip-based models [10] model the individual trips of all persons in the region. These models are appropriate for route choice and assignment modeling, as they are geared toward mode and route choices, but they neglect the choice of trip timing. Moreover, trip chains are not included in these models. Therefore, the effects of choices that result in changes of activities are not captured.

Activity-based models (for a recent survey see [2]) view mobility as a demand derived from the execution of activities. These models allow for the incorporation of all trips within a day, including decisions on how many trips to take as well as the destination of every single trip. They are hence appropriate for studies investigating how the mobility demand and its distribution over time depend on the characteristics of the mobility infrastructure.

Activity-based models are typically realized as ABMS models, in which every person in the analysis region is represented by an agent. Decisions result from individual utility optimization. A prototypical example of such an ABMS model is MATSim [4]. MATSim is capable of representing the full mobility system of a whole region or country, including all available modes of transport. The agents' mobility is simulated for a full day. The mobility behavior of the agents is guided by a utility function that trades off the utility of performing activities at the various locations during the day against the disutility of traveling between them.

Like all ABMSs, MATSim requires a large amount of input data. Besides information on the mobility system, a synthetic population needs to be defined that describes the population in the considered region, including all characteristics relevant to the represented decision behavior. The synthetic population needs to be representative of the modeled population, on the one hand in terms of the distribution of key parameters called control variables (such as age, occupation, and sex) and on the other hand in terms of its mobility demand.

To generate such a synthetic population, typically many different data sources are combined [6]. Most frequently, aggregate information on the distribution of some control variables within each zone of the investigated region is used jointly with individual-level information on mobility behavior, obtained, for example, from mobility or activity surveys.

In most population synthesizers [8], the information from these two data sources is combined using iterative proportional fitting (IPF, [3]): the relations between variables are inferred from the individual-level data, while the aggregate data provide the marginal distributions.

For some control variables, the IPF method is not ideally suited. For mode choice, for example, the trade-offs between costs and time are essential. If mode choice is modeled exogenously, then matching aggregated mode choice frequencies (that is, adapting the synthetic population such that its mode choice frequencies equal the observed ones) could be achieved using IPF. This would, however, distort the relative preferences, expressed for example as the value of time.

For this case, this paper proposes an alternative to the IPF procedure. In the marketing literature [7], the similar problem of combining individual-level information from stated preference questionnaires with aggregate information from measured market shares is solved by adjusting the alternative-specific constants (ASCs) of the model estimated on individual-level data so that it matches the aggregated market shares. This approach has also been used in the transportation literature [14]; compare also the model transfer theory surveyed in [10, Sect. 9.5.4, pp. 344–345].

This leaves the relative importance of the regressor variables in the choice model unchanged. The resulting models retain the reactions of individuals to changes in characteristics as inferred from the individual-level data, while reproducing the marginal distributions provided by the aggregate data. In general, this solution differs from the one obtained by IPF.

Second, the synthetic population also needs to be representative in terms of mobility demand. Here, the traditional approach is to draw the mobility demand of the synthetic population by sampling from the observed persons. In this paper, we advocate the use of mobility motifs. Motifs [12] summarize the daily mobility of a person as a directed graph. In [12, 13] it has been shown that the distribution of motif choices is remarkably constant across different regions and also over time.

In this paper, we suggest using motif choice as a central component of a parametric model for mobility demand. This is discussed in close connection to the data structure used in MATSim. We argue that, given motif choice, the remaining components of the data characterizing the agents in MATSim can be obtained from discrete choice models with relatively small choice sets. These models can be fitted to individual-level data using standard methods.

Moreover, using a large German data set, we demonstrate that motif choice has been stable over the past 20 years. Furthermore, modeling motif choice as a function of several socio-demographic variables shows that the socio-demographics explain only a small fraction of the variation in motif choice. In addition, the model coefficients are remarkably stable over time. Taken together, this indicates that motif choice might be a good anchor for population synthesis methods for ABMS models.

The paper is structured as follows. In the next section, the concept of mobility motifs is discussed for the data set used in this paper. Section 3 describes the typical data structure of ABMS models. Section 4 then discusses the framework for population synthesis, and Sect. 5 provides a model for motif choice. Finally, Sect. 6 concludes the paper.

2 Mobility Motifs

Human mobility motifs, which summarize the daily mobility patterns of people, were developed for new data sources such as mobile phone billing data or GPS data [12]. These data sources usually provide no information other than the position of an individual at a given time (with an accuracy that depends on the measurement technology). The trajectories of raw position measurements can be processed to identify the locations that the individual visited and the trips between these locations; see, for example, the algorithm in [1]. Labeling these locations with activity information (work, shopping, leisure) is, owing to the nature of the data source, often difficult and inaccurate, if not impossible.

To analyze daily mobility behavior from such data sources, human mobility motifs were introduced [12]. A motif is a graph containing nodes (representing visited locations) and directed edges (representing the trips between them). A major difference from classical definitions of trip chains is that the locations in motifs are unlabeled, owing to the anonymous nature of the data sources for which motifs were developed. Note that motifs allow the researcher to study daily mobility behavior independently of trip length or travel time, because this information is not encoded in a motif.

The data set used in this paper is from the German Mobility Panel (MOP) [17], a mobility survey with a rotating sample in which each household remains in the sample for three consecutive years. Each household member aged 10 or older is asked to record their mobility during one given week of each year in a mobility diary. The daily motifs are extracted from the data in these diaries. The data set contains information on the mobility of 15,864 individuals with a total of 230,769 recorded daily mobility patterns.

Figure 1 sketches how the motif is extracted from an example mobility diary for one person on one day. Plot (a) shows a possible example of the contents of a one-day mobility diary. Plot (b) abstracts the information from (a) into a movement graph, in which the different locations are represented as nodes and the directed edges are given by the trips from one location to another. Plot (c) yields the motif by aggregating multiple edges, adding vertices for round tours, and removing labels. Since only the purpose of each trip is recorded, only its end point can be inferred directly. Naturally, the end location of the previous trip is assumed to be the starting point of the next trip. Where this is not available (for example, for the first recorded trip), the starting point is assumed to be the "home" location.
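The extraction steps of Fig. 1 can be illustrated with a short Python sketch. It assumes, as our simplification and not the MOP coding, that each distinct trip purpose corresponds to one location (so two shopping trips to different stores would be merged), and it simply drops trips whose origin equals their destination instead of modeling round tours. The unlabeled motif is obtained by relabeling the nodes under every permutation and keeping the lexicographically smallest edge list, which is feasible for the small number of locations visited per day. The function names are hypothetical.

```python
from itertools import permutations

HOME = "home"


def movement_graph(purposes, start=HOME):
    """Build the movement graph from one day's sequence of trip purposes.

    The origin of each trip is the destination of the previous one; the first
    trip is assumed to start at home. Repeated trips between the same pair of
    locations collapse into a single directed edge.
    """
    nodes = {start}
    edges = set()
    origin = start
    for destination in purposes:
        nodes.add(destination)
        if destination != origin:  # simplification: ignore origin == destination trips
            edges.add((origin, destination))
        origin = destination
    return nodes, edges


def canonical_motif(nodes, edges):
    """Return an unlabeled, isomorphism-invariant key (n_nodes, sorted edges)."""
    ordered = sorted(nodes)
    best = None
    for perm in permutations(range(len(ordered))):
        relabel = dict(zip(ordered, perm))
        key = tuple(sorted((relabel[u], relabel[v]) for u, v in edges))
        if best is None or key < best:
            best = key
    return len(ordered), best


# Two different days with the same structure map to the same motif.
day_a = canonical_motif(*movement_graph(["work", "shop", "home"]))
day_b = canonical_motif(*movement_graph(["shop", "work", "home"]))
print(day_a == day_b)  # True: both are a directed cycle through three locations
```

Because labels are discarded only after the edges are aggregated, a day with two identical work trips yields the same motif as a day with one.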

Fig. 1 Stylized example of the motif extraction process

Theoretically, the number of possible motifs is vast. Even when only considering motifs that allow the individual to return to a 'home' location, and limiting them to at most six visited locations within one day, there are already more than one million possible motifs. However, [12] found that the 17 most frequently chosen motifs cover more than 90% of all observed daily motifs in their data set. This result has been confirmed in other studies for data sets from different cities and regions, which also observed that the frequency of occurrence of each of the most frequent motifs is remarkably similar across studies (see [12, 13, 16]).

The same stability of the motif distribution is present in the MOP data set. Moreover, owing to the long record of data spanning two decades, it is also possible to observe the temporal stability of motif frequencies. This stability is shown in Fig. 2, where a time series of the relative frequency in each year is plotted for each of the 11 most common motifs in the MOP data set. The ordering of the seven most frequent motifs is identical over most of the 20-year time span, and their frequencies are almost constant.
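Time series such as those in Fig. 2, and coverage figures such as the 90% reported by [12], reduce to counting motifs per year. A minimal Python sketch, assuming each observed person-day is available as a (year, motif identifier) pair; the function names are hypothetical and the toy records are illustrative, not MOP data.

```python
from collections import Counter, defaultdict


def yearly_shares(records):
    """Relative frequency of each motif per year from (year, motif_id) pairs."""
    counts = defaultdict(Counter)
    for year, motif in records:
        counts[year][motif] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {motif: n / total for motif, n in counter.items()}
    return shares


def top_k_coverage(year_shares, k):
    """Share of all observed days covered by the k most frequent motifs."""
    return sum(sorted(year_shares.values(), reverse=True)[:k])


# Toy records (not MOP data): motif "A" on 6 days, "B" on 3, "C" on 1 in 2000.
toy = [(2000, "A")] * 6 + [(2000, "B")] * 3 + [(2000, "C")] + [(2001, "A")] * 5 + [(2001, "B")] * 5
shares = yearly_shares(toy)
print(round(top_k_coverage(shares[2000], 2), 3))  # 0.9
```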

Fig. 2 Frequency of the 11 most common motifs for all 20 years (1994–2013)

In addition to the mobility diaries, the MOP data set contains a wealth of explanatory variables at the personal and household level, as well as weather data. These include sex, age, employment, car ownership, number of household members, net income, and proximity to public transportation, among various other variables (for more details on the data set see [17] and https://mobilitaetspanel.ifv.kit.edu). This allows modeling the dependence of motif choice on socio-demographic characteristics, as done in Sect. 5.

3 Data Structure for Activity-Based Multi-agent Simulation Models

ABMS models simulate the movements in the studied area based on a representation of agents including their mobility-related characteristics. In this paper, an agent is represented by a vector \(\theta \). Besides socio-demographics of the households and the individual persons (age, sex, and occupation of the individuals; number of cars, location, distance to transportation infrastructure, parking spaces, ...), the factors determining activity plans need to be represented. For MATSim, for example, this includes a list of potential activity locations as well as utility components associated with performing the activities for certain time periods.

This is typical for ABMSs: the data characterizing the agents generally consist of a vector of variables such as \(\theta _c=(a_i,s_i,o_i,nc_i,locx_i,locy_i)\), where for person i, \(a_i\) denotes the age in years, \(s_i\) equals 1 for male and 2 for female, \(o_i\) encodes the occupation status, \(nc_i\) the number of available cars, and \(locx_i, locy_i\) encode the home location. Additional information in the form of actual activities or activity sequences, with or without start and end times, is also included. Here \(\theta _{act} = (a_{i,1},a^{\mathrm{start}}_{i,1},a^{\mathrm{end}}_{i,1}, \ldots )\), where \(a_{i,1}\) encodes the first activity of person i with start time \(a^{\mathrm{start}}_{i,1}\) and end time \(a^{\mathrm{end}}_{i,1}\). While the first part of the characteristics is relatively easy to sample, the second part, related to the activities, is in many cases harder.
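To make the two parts of \(\theta \) concrete, the following Python sketch stores \(\theta _c\) and \(\theta _{act}\) for one agent and flattens them into tuples in the order used above. The field names, the string coding of occupation, and the unit of the times (hours after midnight) are our assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Activity:
    kind: str     # activity a_{i,k}, e.g. "home", "work", "shop"
    start: float  # a^start_{i,k}, hours after midnight (assumed unit)
    end: float    # a^end_{i,k}


@dataclass
class Agent:
    age: int                      # a_i, in years
    sex: int                      # s_i: 1 = male, 2 = female
    occupation: str               # o_i (coding assumed)
    n_cars: int                   # nc_i
    home_xy: Tuple[float, float]  # (locx_i, locy_i)
    activities: List[Activity] = field(default_factory=list)

    def theta_c(self):
        """Socio-demographic part, (a_i, s_i, o_i, nc_i, locx_i, locy_i)."""
        return (self.age, self.sex, self.occupation, self.n_cars, *self.home_xy)

    def theta_act(self):
        """Activity part, (a_{i,1}, a^start_{i,1}, a^end_{i,1}, ...)."""
        return tuple(x for a in self.activities for x in (a.kind, a.start, a.end))


agent = Agent(34, 2, "employed", 1, (3.5, 7.0),
              [Activity("home", 0.0, 8.0), Activity("work", 8.5, 17.0)])
print(agent.theta_c())  # (34, 2, 'employed', 1, 3.5, 7.0)
```

The fixed-length first part maps directly onto a row of a contingency table, whereas the length of the second part varies with the number of activities, which is one reason it is harder to sample.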

Therefore, sampling agents often relies on pre-calculated templates obtained, for example, from the individual-level observations being sampled. This approach could be called nonparametric bootstrapping.

The alternative is parametric sampling, in which a parametric model that can be sampled from is developed for the second part as well. The advantage lies in the flexibility gained, the disadvantage in the potential to destroy dependencies.
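The nonparametric variant can be sketched in a few lines of Python: each synthetic agent receives a copy of the activity template of a randomly drawn surveyed person from the same control-variable cell. The cell definition and the function name are hypothetical. Because only observed templates can be drawn, dependencies within a plan are preserved but no new combinations can arise; a parametric model trades this safety for flexibility.

```python
import random
from collections import defaultdict


def bootstrap_plans(synthetic_cells, observed, seed=0):
    """Nonparametric bootstrap of activity templates.

    synthetic_cells: control-variable cell of each synthetic agent,
                     e.g. (age_group, sex)
    observed:        (cell, activity_template) pairs from the survey
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for cell, template in observed:
        pool[cell].append(template)
    plans = []
    for cell in synthetic_cells:
        if not pool.get(cell):
            raise KeyError(f"no surveyed person in cell {cell!r}")
        plans.append(rng.choice(pool[cell]))  # copy an observed template as a whole
    return plans


survey = [(("18-35", 1), ("home", "work", "home")),
          (("18-35", 1), ("home", "shop", "home")),
          (("61+", 2), ("home",))]
plans = bootstrap_plans([("61+", 2), ("18-35", 1)], survey)
```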

Parametrization of activity-related information is necessarily specific to each ABMS model. MATSim, for example, characterizes the activity-related information by a set of potential activity locations in combination with utility-related information. Correspondingly, a parametric model must allow sampling this information based on parameters estimated from the individual-level data.

To this end, the mobility motifs [12] discussed in Sect. 2 are helpful: they provide the activity locations as nodes of the graph and the trips between the locations as edges. The graph contains no timing information, which can instead be provided by the scheduling capabilities of ABMS models. Labeling the locations, such as "home", "work", "shop", etc., links locations to activities. Destination choice models subsequently provide the spatial dimension of activities to the model. The labels can be inferred from socio-demographic characteristics (e.g., 'work' only applies to persons whose occupational status is employed) with the help of choice models with a small number of alternatives.

Motifs do not specify the number of trips between activity locations, which can thus be determined either endogenously within the simulation or exogenously using a discrete choice model with a small number of alternatives. All of these choice models can be based on mobility surveys and hence fall within the setting of this paper.

Thus, the activity plan of an agent can be characterized by the chosen motif, the labels of the nodes in the motif, their corresponding locations, and the number of trips between the locations. For each of these choices, a model that also depends on socio-demographic characteristics allows agents to be sampled parametrically.

4 Population Synthesis

The process of population synthesis uses individual-level disaggregated data in combination with aggregated data on population characteristics for the scenarios to be simulated in order to generate a synthetic population of agents representing the population in the investigated region.

The aggregated data provide information on the marginal distributions of some variables (such as age, sex, household size), called control variables, to which the synthesized population should conform. The traditional approach to population synthesis considers two stages [6]:

  1. Fitting stage: Here the information on the relations between the control variables from the disaggregated data is combined with the marginals of the control variables from the aggregated data. As output of the fitting stage, a consolidated table is obtained that contains, for each zone in the simulation, the number of agents with specific characteristics corresponding to the control variables.

  2. Zoning stage: In this stage, for each zone and each cell of the table, the number of agents specified by the output of the fitting stage is drawn randomly.

The most commonly used instrument in the fitting stage is iterative proportional fitting (IPF), see for instance [3]. The properties of IPF are well understood [8, 11]: given a seed contingency table of the frequencies of combinations of categories of the control variables (assumed to be categorical) and given marginal distributions, the IPF algorithm finds the table with the specified marginals that is closest (in the sense of maximum entropy) to the seed table.

Moreover, the algorithm is straightforward to code (see [11, (2.8) on p. 1163]). It has the drawback that zero entries in the contingency table remain zero throughout. This is a disadvantage in particular for control variables with many categories or for relatively small disaggregated data sets.
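As an illustration of how short the procedure is, here is a minimal two-dimensional IPF in Python with NumPy (our own sketch, not the formulation in [11]). Rows and columns are rescaled alternately until the row sums also match their targets. The toy seed table contains a zero cell, which stays zero because cells are only ever multiplied.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Two-dimensional iterative proportional fitting of a seed table."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Scale each row to its target (rows summing to zero are left at zero).
        rows = table.sum(axis=1)
        table *= np.divide(row_targets, rows, out=np.zeros_like(rows), where=rows > 0)[:, None]
        # Scale each column to its target; columns now match exactly.
        cols = table.sum(axis=0)
        table *= np.divide(col_targets, cols, out=np.zeros_like(cols), where=cols > 0)[None, :]
        if np.allclose(table.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return table


seed = [[10.0, 5.0],
        [0.0, 5.0]]  # the zero cell can never become positive
fitted = ipf(seed, row_targets=[60.0, 40.0], col_targets=[50.0, 50.0])
# converges to [[50, 10], [0, 40]], the only table with these margins and this zero
```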

Additionally, it is not clear that maximum entropy solutions are desirable. In parts of the literature a different solution is favored, see e.g., [10, Sect. 9.5] on model transfer or [7, Chap. 8] on combining stated with revealed preference data.

For simplicity, consider as an example a multinomial logit (MNL) model explaining the household size y as a function of \(X = (a,s)\) (age and sex):

$$\begin{aligned} {\mathbb P}( y = j | X; \alpha ) = \frac{\exp ([\alpha ]_j + V_j(X)) }{\sum _{i=1}^J \exp ([\alpha ]_i+ V_i(X))} \end{aligned}$$

where \(V_i(X) = [\beta _a]_i\, a + [\beta _s]_i\, s\) denotes the systematic utility of alternative i and \([\alpha ]_i\) denotes the ith component of the vector \(\alpha \) of alternative-specific constants (ASCs). Since only utility differences are identified, we set \([\alpha ]_1=0\) (and likewise \([\beta _a]_1=[\beta _s]_1=0\)) to achieve identification. Thus, if \(p_{a,s}\) denotes the frequency of persons who are a years old and of sex s in the target population, the MNL model implies that

$$\begin{aligned} {\mathbb P}( y = j; \alpha ) = \sum _{a=1}^{100} \sum _{s=1}^2 {\mathbb P}( y = j | (a,s); \alpha )p_{a,s}, \quad j = 1, \ldots ,J. \end{aligned}$$

These predicted probabilities may differ from the marginal distribution \(p_j, j=1,\dots ,J\) of household sizes in the target population, whereas, averaged over the sample used to estimate the parameters \(\alpha , \beta _s, \beta _a\), they reproduce the observed frequencies exactly (a property of MNL models with a full set of ASCs at the maximum likelihood estimate). Reference [14] demonstrates that in this situation it may be preferable to adjust the ASCs \(\alpha \) in order to obtain the equality \({\mathbb P}( y = j; \hat{\alpha }) = p_j\). For the MNL model, direct adjustments exist, see e.g. [5, Eq. (2) on p. 429]. For other discrete choice models, such as mixed MNL or probit models, one may minimize the function

$$\begin{aligned} Q(\tilde{\alpha }) = \sum _{j=1}^J ({\mathbb P}( y = j; \tilde{\alpha })-p_j)^2. \end{aligned}$$

Provided the map \(\tilde{\alpha } \mapsto ({\mathbb P}( y = j; \tilde{\alpha }))_{j=1,\dots ,J}\) from \({\mathbb R}^{J-1}\) (as \([\alpha ]_1=0\)) to the interior of the simplex of probability distributions on the J alternatives is surjective, there exists a vector \(\hat{\alpha }\) with \(Q(\hat{\alpha })=0\) for all choice frequencies \(p_j>0, \sum _{j=1}^J p_j =1\). Surjectivity for the MNL and the probit model is shown in [9]. The proof of Theorem 1 there is easily extended to the mixed MNL model. Numerically, any gradient-based method can be used to obtain the minimum.
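For the MNL model, a commonly used direct adjustment is the fixed-point update \([\alpha ]_j \leftarrow [\alpha ]_j + \ln (p_j / {\mathbb P}(y=j;\alpha ))\), iterated until predicted and target shares agree; we assume, without having checked, that it corresponds to the adjustment cited from [5]. The Python sketch below applies it to the cell-weighted probabilities of the household-size example, with made-up coefficients, cell weights \(p_{a,s}\), and target shares \(p_j\). Only \(\alpha \) changes, so \(\beta _a\) and \(\beta _s\), and hence the relative preferences, are left untouched.

```python
import numpy as np


def mnl_probs(alpha, V):
    """P(y = j | cell) for every row (cell) of the utility matrix V (cells x J)."""
    u = alpha[None, :] + V
    u = u - u.max(axis=1, keepdims=True)  # guard against overflow in exp
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)


def calibrate_ascs(V, cell_weights, target, tol=1e-12, max_iter=10_000):
    """Adjust the ASCs so that sum_cells p_cell * P(y = j | cell) equals target_j."""
    target = np.asarray(target, dtype=float)
    alpha = np.zeros(V.shape[1])
    for _ in range(max_iter):
        predicted = cell_weights @ mnl_probs(alpha, V)
        if np.max(np.abs(predicted - target)) < tol:
            break
        alpha += np.log(target / predicted)
        alpha -= alpha[0]  # normalization [alpha]_1 = 0
    return alpha


# Hypothetical example: J = 3 household sizes, four (age, sex) cells.
ages = np.array([20.0, 20.0, 60.0, 60.0])
sexes = np.array([1.0, 2.0, 1.0, 2.0])
beta_a = np.array([0.0, 0.02, -0.01])  # [beta_a]_1 = 0
beta_s = np.array([0.0, 0.3, -0.2])    # [beta_s]_1 = 0
V = ages[:, None] * beta_a[None, :] + sexes[:, None] * beta_s[None, :]
cell_weights = np.array([0.3, 0.2, 0.25, 0.25])  # p_{a,s}, summing to one
target = np.array([0.5, 0.3, 0.2])                # p_j in the target population
alpha_hat = calibrate_ascs(V, cell_weights, target)
```

For a homogeneous population a single update already matches the shares exactly; heterogeneity across cells is what makes iteration necessary.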

Note that adjusting the ASCs leaves the preferences encoded in the systematic utility functions \(V_j(X)\) unchanged. Thus, for example, the adjusted model implies the same values of time. The adaptation only serves to adjust the predicted choice probabilities so that they equal ("match") the choice frequencies observed in the target population. This procedure combines knowledge from disaggregated and aggregated data sources: the MNL model is estimated using individual-level data, for example obtained from stated preference surveys or conjoint studies, and the observed market shares according to the aggregated data sets are subsequently matched by adapting the ASCs, see e.g., [14].

While this matching procedure could be applied in the fitting stage to ensure the correct marginal distributions of the control variables, this paper focuses on its use in the zoning stage. In this stage, agents with the characteristics \(\theta _c\) determined in the fitting stage are drawn randomly, and for each individual one specific mobility demand is drawn from a set of given demand vectors \(\theta _{act}\).

The representation of the mobility demand for the individuals for the zoning stage typically includes information on

  • activities (with accompanying utility contributions)

  • activity locations

  • number of trips between locations

  • mode choice for trips (if not determined endogenously).

Some of this information can be endogenized, that is, determined within the model, for example by letting agents decide autonomously on the travel mode.

Here, we suggest modeling these choices anchored on the chosen motif: using different choice models built from the disaggregated data, an agent's activity demand can be sampled according to the procedure outlined in pseudocode Algorithm 1. Each drawing step simulates one choice from the corresponding discrete choice model.

Algorithm 1
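The steps of Algorithm 1 can be illustrated by the following Python sketch, which follows the characterization of an activity plan given in Sect. 3: draw a motif, label its non-home nodes, choose a location for each labeled node, and draw the number of trips per edge, each by simulating one choice. The order of the steps and all four choice models are our assumptions, not the authors' algorithm; the models are hypothetical stand-ins returning a dictionary of choice probabilities and would in practice be fitted to the disaggregated data. We also assume that node 0 of each motif is the home node.

```python
import random


def draw(rng, probs):
    """Simulate one choice from a dict {alternative: probability}."""
    alternatives = list(probs)
    return rng.choices(alternatives, weights=[probs[a] for a in alternatives], k=1)[0]


def sample_activity_demand(person, motif_model, label_model, dest_model, trips_model, rng):
    """Motif-anchored draw of one agent's activity demand (hypothetical models)."""
    n_nodes, edges = draw(rng, motif_model(person))
    labels = {0: "home"}                 # assumption: node 0 is the home node
    locations = {0: person["home"]}
    for node in range(1, n_nodes):
        labels[node] = draw(rng, label_model(person))
        locations[node] = draw(rng, dest_model(person, labels[node]))
    trips = {edge: draw(rng, trips_model(person, edge)) for edge in edges}
    return {"motif": (n_nodes, edges), "labels": labels,
            "locations": locations, "trips": trips}


# Hypothetical choice models with small choice sets (probabilities are made up).
def motif_model(person):
    return {(1, ()): 0.2, (2, ((0, 1), (1, 0))): 0.5,
            (3, ((0, 1), (1, 2), (2, 0))): 0.3}


def label_model(person):
    # 'work' is only available to employed persons
    if person["employed"]:
        return {"work": 0.7, "shop": 0.2, "leisure": 0.1}
    return {"shop": 0.6, "leisure": 0.4}


def dest_model(person, label):
    return {"zone_A": 0.5, "zone_B": 0.3, "zone_C": 0.2}


def trips_model(person, edge):
    return {1: 0.9, 2: 0.1}


rng = random.Random(42)
demand = sample_activity_demand({"employed": False, "home": "zone_H"},
                                motif_model, label_model, dest_model, trips_model, rng)
```

Only the destination model has a potentially large choice set; all other draws are over a handful of alternatives.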

Except for destination choice, all these discrete choice models have rather small choice sets. Moreover, for some of the choices, detailed aggregate information exists in a broader context: for the number of trips, for example, detailed information is available in many different contexts. In this case, the choice models estimated using individual-level data can be calibrated using the aggregated information, so that the predicted frequencies of the number of trips equal the observed frequencies. This is outlined in pseudocode Algorithm 2, where \(\alpha \) denotes the vector of ASCs of the model for the number of trips; alternatively, the number of trips could be chosen endogenously within the simulation.

Algorithm 2
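One possible reading of Algorithm 2 is sketched below in Python (our assumption, not the authors' pseudocode): the ASCs \(\alpha \) of an MNL model for the number of trips are adjusted until the predicted trip-count frequencies, averaged over the synthetic agents, equal the observed frequencies, and one trip count is then drawn per agent. The utilities, the three trip-count alternatives, and the target frequencies are made up; the update is the standard MNL share-matching iteration.

```python
import numpy as np


def trip_count_probs(alpha, V):
    """MNL probabilities of each trip-count alternative for each agent (agents x K)."""
    u = alpha[None, :] + V
    u = u - u.max(axis=1, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)


def calibrate_and_sample(V, target, rng, tol=1e-12, max_iter=10_000):
    """Match the ASCs to observed trip-count frequencies, then draw one count per agent."""
    target = np.asarray(target, dtype=float)
    alpha = np.zeros(V.shape[1])
    for _ in range(max_iter):
        predicted = trip_count_probs(alpha, V).mean(axis=0)
        if np.max(np.abs(predicted - target)) < tol:
            break
        alpha += np.log(target / predicted)
        alpha -= alpha[0]  # normalization [alpha]_1 = 0
    P = trip_count_probs(alpha, V)
    # Inverse-CDF draw: index of the first cumulative probability above u.
    u = rng.random(len(V))[:, None]
    draws = np.minimum((u > P.cumsum(axis=1)).sum(axis=1), V.shape[1] - 1)
    return alpha, draws


rng = np.random.default_rng(0)
x = rng.normal(size=20_000)  # a made-up agent characteristic
V = np.column_stack([np.zeros_like(x), 0.5 * x, 1.0 * x])  # hypothetical: 1, 2, 3+ trips
alpha, draws = calibrate_and_sample(V, [0.2, 0.5, 0.3], rng)
```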

5 Determinants of Motif Choice

Motif choice is a central component of the synthesis described above. As shown in Sect. 2, the relative frequencies of the 11 most common motifs observed in the MOP hardly change over the course of 20 years. This persistence is even more remarkable given that some determinants that have been shown to affect motif choice in previous studies vary considerably over time (see [13] for the interaction of travel mode and motif choice). A prime example is the self-reported satisfaction with access to public transport, which was also asked about in the MOP household questionnaire. The majority of participants were dissatisfied with their access to public transport at the beginning of the study, but the share of dissatisfied respondents declines until 2000 and then stays stable at about 25%. Another example is the share of people working in the central business district of a large city, which increases steadily over the years from 9.5% to 18.2% of the respondents.

In isolation, these observations provide no insight into the relationship between motif choice and the explanatory variables. To overcome the arbitrariness of bivariate analysis, a discrete choice model is needed to assess whether the determinants of motif choice show the same persistence as the resulting choices.

As noted in Sect. 2, the MOP data set contains a wealth of explanatory variables at the personal and household level as well as weather data (see [17]). In this paper, the multinomial logit (MNL) model is chosen to map these variables to the observed motif choices (\(Y_{nt}\) for person n on day t). MNL models are among the simplest discrete choice models. Their main weakness is the strict independence assumption for the error terms across alternatives and across time. However, this assumption makes it easy to incorporate the previously chosen alternative as an explanatory variable (see [15, pp. 51f]).

The utility \(U_{nt}^j\) which decision maker n assigns to alternative j at choice occasion t is then modeled as

$$\begin{aligned}&U^{1}_{nt} = [\alpha ]_1 + V^{1}_{nt} + \varepsilon ^{1}_{nt} = [\alpha ]_1 + Y_{n(t-1)}\delta ^{1} + x_{n}\beta ^{1} + v_{nt}\gamma ^{1} + z_{t}\tau ^{1} + \varepsilon ^{1}_{nt},\\&\ldots \\&U^{J}_{nt} = [\alpha ]_J + V^{J}_{nt} + \varepsilon ^{J}_{nt} = [\alpha ]_J + Y_{n(t-1)}\delta ^{J} + x_{n}\beta ^{J} + v_{nt}\gamma ^{J} + z_{t}\tau ^{J} + \varepsilon ^{J}_{nt}, \end{aligned}$$

where V is called the representative utility and the error terms \(\varepsilon _{nt}^{j}\) are independently and identically Gumbel (extreme value type I) distributed. The utility function of the final model includes four groups of variables:

  1. The dummy-coded choice made on the previous day \(Y_{n(t-1)}\).

  2. The calendar effects \(z_t\), which do not depend on the alternative and are not influenced by the individual traveler: dummy variables for Friday and for the weekend.

  3. The weather effect \(v_{nt}\), which depends on the location of the decision maker and the date: the data set contains temperature (daily maximum) as well as precipitation (which is omitted due to many missing values).

  4. The individual characteristics of the decision maker and the properties of the related household \(x_{n}\), which remain unchanged during the observation week. The person-specific variables are occupation status, gender, and age. To aid interpretation, four age groups are considered: under 18, 18–35, 36–60, and 61 and older. The household is described by the number of persons living in it and the number of household members under the age of 10. Finally, there is a set of variables related to the location of the household and, where applicable, the workplace of the decision maker.

As usual, maximizing the random utility in this setting leads to explicit formulas for the choice probabilities and thus the log-likelihood:

$$\begin{aligned} ll(\theta _V | V)&= \sum _{n=1}^N \sum _{t=1}^T \sum _{i=1}^{J} \mathbb {I}(y_{nt} = i) \log \frac{\exp ([\alpha ]_i + V^{i}_{nt})}{\sum _{j=1}^{J} \exp ([\alpha ]_j + V^{j}_{nt})}, \end{aligned}$$

where \(y_{nt} \in \{1, \dots , J\}\) is the chosen alternative and \(\mathbb {I}(x)\) is the indicator function. We use the mlogit package in R (see https://cran.r-project.org/package=mlogit) to obtain the maximum likelihood estimates of the parameters \(\alpha \), \(\beta \), \(\delta \), \(\gamma \), and \(\tau \), collected in the vector \(\theta _V\) for the model indexed by V, separately for batches of data covering 3 consecutive years.
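For reference, the log-likelihood above can be evaluated directly. The NumPy sketch below assumes that all explanatory variables (the dummy-coded lagged choice, calendar dummies, temperature, and household variables) are stacked into a design matrix X with one row per person-day, and that each variable has its own coefficient for every alternative, with the column of the reference alternative fixed at zero. It only evaluates the function; it is not a substitute for the mlogit estimation used here, and the function name is ours.

```python
import numpy as np


def mnl_loglik(alpha, B, X, y):
    """Log-likelihood of an MNL with alternative-specific coefficients.

    alpha: (J,) ASCs; alpha[0] = 0 for the reference alternative
    B:     (K, J) coefficients; column 0 fixed at zero for identification
    X:     (N, K) explanatory variables, one row per person-day
    y:     (N,) chosen alternative coded 0, ..., J - 1
    """
    U = alpha[None, :] + X @ B                     # [alpha]_i + V^i_nt
    U_max = U.max(axis=1, keepdims=True)           # log-sum-exp for stability
    log_denominator = U_max[:, 0] + np.log(np.exp(U - U_max).sum(axis=1))
    return float(np.sum(U[np.arange(len(y)), y] - log_denominator))


# With all parameters zero, every alternative has probability 1/J.
ll = mnl_loglik(np.zeros(3), np.zeros((2, 3)), np.ones((4, 2)), np.array([0, 1, 2, 1]))
```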

We add the blocks of variables incrementally in order to highlight the contribution of each block. As a baseline, the smallest MNL model, which includes only alternative-specific constants (ASCs), is estimated for each batch. This model is only needed to compute McFadden's Pseudo \(R^2\) for the subsequent models.

Note that the coefficients of all variables are estimated separately for each alternative and that, apart from the ASCs, there are no alternative-specific variables. To ensure identifiability, utility differences with respect to the "stay-at-home" motif are taken. Because each motif, by definition, starts at the home node, normalizing with respect to the "stay-at-home" alternative allows the coefficients to be interpreted as the degree of utility or disutility the decision maker experiences when starting a trip.

For the analysis, we use the 11 most common motifs plus the category 'other', resulting in 12 choice alternatives (\(J = 12\)). The sample sizes of the first two years are lower than those of the subsequent years (1994: 517 participants; 1995: 744 participants), so these years are not included in the analysis. The discrete choice models are fitted separately to batches of 3 years. The largest data set (2011–2013) includes 33,308 motif choices, as seen in Table 1.

After each block of variables is added, McFadden's Pseudo \(R^2\) is computed as \(1 - ll( \hat{\theta }_V | V) / ll_0\), where \(\hat{\theta }_V\) denotes the maximum likelihood estimator for the model V and \(ll_0\) denotes the maximized log-likelihood of the 'null' model containing only the ASCs. This measure of fit ranges between 0 and 1, has an interpretation loosely analogous to that of the standard \(R^2\) in linear regression, and hence aids model selection. Furthermore, the number of coefficients significant at the 0.05 level is reported for each model.
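Because an MNL model with a full set of ASCs reproduces the sample shares at its maximum, \(ll_0\) has the closed form \(\sum _j N_j \ln (N_j/N)\), with \(N_j\) the number of observed choices of alternative j and N their total. A small Python sketch of the Pseudo \(R^2\) computation based on this observation (the function names are ours):

```python
import numpy as np


def null_loglik(y, J):
    """Maximized log-likelihood of the ASC-only MNL: sum_j N_j * log(N_j / N)."""
    counts = np.bincount(y, minlength=J).astype(float)
    observed = counts > 0  # alternatives never chosen contribute nothing
    return float(np.sum(counts[observed] * np.log(counts[observed] / counts.sum())))


def mcfadden_r2(ll_model, ll_null):
    """McFadden's Pseudo R^2 = 1 - ll(theta_hat_V | V) / ll_0."""
    return 1.0 - ll_model / ll_null


ll0 = null_loglik(np.array([0, 0, 1, 1, 1, 2]), J=12)
r2 = mcfadden_r2(0.9 * ll0, ll0)  # approximately 0.1 for a 10% smaller |ll|
```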

Table 1 Results of the MNL models from 1996 to 2013
Fig. 3 Coefficient estimates for the number of children in the years 1996–2013. A solid square indicates that the coefficient is significant at the 0.05 level

The main results of the analysis are shown in Table 1: even for the most sophisticated model, the Pseudo \(R^2\) values are rather low. Among the explanatory variables, the lagged choice has the largest explanatory power as measured by the Pseudo \(R^2\). The number of significant variables in the various models is very similar over the years.

Including calendar effects increases the Pseudo \(R^2\) by around 0.006. This increase is driven almost exclusively by the weekend dummy, whose 11 coefficients are negative in all years. This indicates that, on weekends, staying at home is on average relatively more attractive than any of the other motifs.

Adding temperature does not improve the Pseudo \(R^2\), so temperature has no discernible effect on motif choice. Some negative coefficients for the two-location round trip are significant, but not in every year. Therefore, temperature is not included in the subsequent models.

Adding the first block of personal variables increases the Pseudo \(R^2\) by between 0.005 and 0.007. The number of significant variables differs somewhat between batches, mainly because the gender and half-time occupation variables are only significant in models fitted to later years, but no clear pattern emerges. Adding variables describing the family situation yields only a small gain in \(R^2\).

However, Fig. 3 clearly shows that the impact of children changes over time. In the first batch, three motifs are less likely than staying at home (though not significantly so) if there are young children in the household, most prominently the two- and three-location round trips. This changes in later batches, where the presence of children makes every motif significantly more likely than staying at home. The colors of the lines, which correspond to those used in Fig. 2, show that the presence of children primarily increases the probability of more complex motifs.

In summary, these results show that over the 20 years considered here there are no strong determinants of motif choice. This, in turn, suggests that different zone compositions will have only a minor impact on the marginal distribution of motifs. Note, however, that the MNL model employed here is a simple choice model and that more complex models, which for example include interaction terms and account for the panel structure of the data, might yield more explanatory power.

6 Conclusions

The main message of this paper is that motif choice is a promising element in the synthesis of populations for ABMS models. On the one hand, the motif contains many of the features necessary for parameterizing the activities of a person during a day, and the remaining features can then be added using standard discrete choice models.

On the other hand, the empirical analysis in this paper shows that the frequencies of the most frequently chosen motifs are remarkably stable over time in the German MOP data set. Moreover, the MNL models explaining motif choice show some dependence on observable characteristics such as age and household type, but only modest explanatory power, suggesting that using overall motif choice in place of more detailed models leads to only minor losses in accuracy.

It remains to be investigated whether the proposed approach centered on motif choice leads to practically useful simulations that provide similar levels of predictive abilities as models built using conventional approaches.