Keywords

JEL Classifications

Introduction

Daniel L. McFadden, the E. Morris Cox Professor of Economics at the University of California at Berkeley, was the 2000 co-recipient of the Nobel Prize in Economics, awarded ‘for his development of theory and methods of analyzing discrete choice’. (The prize was split with James J. Heckman, awarded ‘for his development of theory and methods for analyzing selective samples’). McFadden was born in North Carolina, USA, in 1937 and received a BS in physics from the University of Minnesota (with highest honors) in 1956, and a Ph.D. in economics from Minnesota in 1962. His academic career began with a postdoctoral fellowship at the University of Pittsburgh. In 1963 he was appointed assistant professor of economics at the University of California at Berkeley, and he was tenured in 1966. He has also held tenured appointments at Yale University (as Irving Fisher Research Professor in 1977), and at the Massachusetts Institute of Technology (from 1978 to 1991). In 1990 he was awarded the E. Morris Cox Chair at the University of California at Berkeley, where he has also served as Department Chair and as Director of the Econometrics Laboratory.

Research Contributions

McFadden is best known for his fundamental contributions to the theory and econometric methods for analysing discrete choice. Building on a highly abstract, axiomatic literature on probabilistic choice theory due to Thurstone (1927), Block and Marschak (1960), and Luce (1959), McFadden developed the econometric methodology for estimating the utility functions underlying probabilistic choice theory. McFadden’s primary contribution was to provide the econometric tools that permitted widespread practical empirical application of discrete choice models, in economics and other disciplines. According to his autobiography (McFadden 2001),

In 1964, I was working with a graduate student, Phoebe Cottingham, who had data on freeway routing decisions of the California Department of Transportation, and was looking for a way to analyze these data to study institutional decision-making behavior. I worked out for her an econometric model based on an axiomatic theory of choice behavior developed by the psychologist Duncan Luce. Drawing upon the work of Thurstone and Marshak, I was able to show how this model linked to the economic theory of choice behavior. These developments, now called the multinomial logit model and the random utility model for choice behavior, have turned out to be widely useful in economics and other social sciences. They are used, for example, to study travel modes, choice of occupation, brand of automobile purchase, and decisions on marriage and number of children.

Thousands of papers applying his technique have been published since his path-breaking papers, ‘Conditional Logit Analysis of Qualitative Choice Behavior’ (1973) and ‘The Revealed Preferences of a Government Bureaucracy: Empirical Evidence’ (1976). In December 2005, a search of the term ‘discrete choice’ using the Google search engine yielded 10,200,000 entries, and a search on the Google Scholar search engine (which limits search to academic articles) returned 759,000 items.

Besides the discrete choice literature itself, McFadden’s work has spawned a number of related literatures in econometrics, theory, and industrial organization that are among the most active and productive parts of the economic literature in the present day. These include work in game theory and industrial organization, for example the work on discrete choice and product differentiation of Anderson et al. (1992), the estimation of discrete games of incomplete information (Bajari et al. 2005), and discrete choice modelling in the empirical industrial organization literature (Berry et al. 1995; Goldberg 1995); the econometric literature on semiparametric estimation of discrete choice models (Manski 1985; McFadden and Train 2000); the literature on discrete/continuous choice models and its connection to durable goods and energy demand modelling (Dagsvik 1994; Dubin and McFadden 1984; Hanemann 1984); the econometric literature on choice-based and stratified sampling (Cosslett 1981; Manski and McFadden 1981); the econometric literature on ‘simulation estimation’ (McFadden 1994; Hajivassiliou and Ruud 1994; Pakes and Pollard 1989); and the work on structural estimation of dynamic discrete choice models and extensions thereof (Dagsvik 1983; Eckstein and Wolpin 1989; Heckman 1981; Rust 1994).

McFadden has also made significant contributions to other fields, particularly to economic theory and production economics. Due to space limitations, I can only briefly mention several of his best-known contributions here. McFadden’s earliest published work was in pure theory, including seminal work on the duality theory of production functions that was subsequently published in his book Production Economics: A Dual Approach to Theory and Applications, edited with Melvyn Fuss in 1978. McFadden made important contributions to growth theory, including his 1967 Review of Economic Studies paper that showed how the overtaking criterion could be used to evaluate infinite horizon development programmes, resolving an outstanding paradox raised by Diamond and Koopmans. In a series of papers with Mitra and Majumdar (1976, 1980), McFadden extended the classical competitive equilibrium welfare theorems established by Debreu and others for finite economies (that is, competitive equilibria are Pareto efficient, and any Pareto efficient allocation can be sustained as a competitive equilibrium after a suitable reallocation of resources) to infinite horizon economies. This work was not a simple technical extension of previous work by Debreu: it resolved serious conceptual problems created by the fact that in an infinite horizon economy (which includes standard overlapping generations models) the commodity space is infinite-dimensional and the number of consumers is infinite. These papers provided sufficient conditions under which these fundamental welfare theorems hold, resolving paradoxes raised by Paul Samuelson, who showed special cases of infinite horizon overlapping generations economies in which competitive equilibria can be strikingly inefficient. Another landmark paper is McFadden’s (1974) paper on excess demand functions with Mantel, Mas-Colell and Richter. This paper provided one of the most general proofs of a classic conjecture by Hugo Sonnenschein that the only necessary and sufficient properties of a system of aggregate excess demand functions are homogeneity, continuity, and Walras’s Law. McFadden has made numerous other contributions to economic theory that I do not have space to cover here.

Instead, I now return to a more in-depth review of McFadden’s contributions to the discrete choice literature, the primary contributions cited in his Nobel Prize award.

Contributions to Discrete Choice

McFadden’s contributions built on prior work in the literature on mathematical psychology (see logit models of individual choice for further details). McFadden’s contribution to this literature was to recognize how to operationalize the random utility interpretation in an empirically tractable way. In particular, he provided the first random utility interpretation of the multinomial logit (MNL) model. His other fundamental contribution was to solve an analogue of the revealed preference problem: that is, using data on the actual choices and states of a sample of agents \( \{(d_i,x_i)\}_{i=1}^N \), he showed how it is possible to ‘reconstruct’ their underlying random utility functions via the method of maximum likelihood, where the likelihood is a product of the individuals’ conditional choice probabilities. Given the simplicity of the MNL choice probabilities, this work helped to spawn a huge empirical literature that applied discrete choice models to a wide variety of phenomena. Further, McFadden introduced a new class of multivariate distributions, the generalized extreme value (GEV) family, derived tractable formulas for the implied choice probabilities, including the nested multinomial logit model, and showed that these models relax some of the empirically implausible restrictions implied by the multinomial logit model, particularly the independence from irrelevant alternatives (IIA) property.
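
As a minimal illustration of this maximum likelihood approach, the sketch below (in Python, using simulated data and an assumed linear-in-parameters utility \( u(x,d,\theta)=x_d^{\prime}\theta \), neither of which comes from the original article) recovers the utility weights from a sample of observed choices by maximizing the product of MNL conditional choice probabilities.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, J, K = 5000, 4, 3                       # agents, alternatives, attributes (all assumed)
X = rng.normal(size=(N, J, K))             # observed attributes x_d of each alternative
theta_true = np.array([1.0, -0.5, 0.25])   # "true" weights used only to simulate the data

# Simulate choices from the random utility model d_i = argmax_d x_id' theta + eps_id,
# with eps_id i.i.d. type I extreme value (Gumbel), which implies MNL choice probabilities.
eps = rng.gumbel(size=(N, J))
d = np.argmax(X @ theta_true + eps, axis=1)

def neg_log_likelihood(theta):
    v = X @ theta                                # systematic utilities, shape (N, J)
    v = v - v.max(axis=1, keepdims=True)         # guard against numerical overflow
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -log_p[np.arange(N), d].sum()         # minus the sum of log choice probabilities

theta_hat = minimize(neg_log_likelihood, np.zeros(K), method="BFGS").x
print("estimated utility weights:", theta_hat.round(2))   # close to theta_true
```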

Multivariate Extreme Value Distributions and the Multinomial Logit Model

McFadden assumed that an individual’s utility function has the following additively separable representation

$$ U\left(x,z,d,\theta \right)=u\left(x,d,\theta \right)+v\left(z,d\right). $$
(1)

Define ε(d) ≡ v(z, d). It follows that an assumption on the distribution of the random vector z implies a distribution for the random vector ε ≡ {ε(d) | d ∈ D(x)}. McFadden’s approach was to make assumptions directly about the distribution of ε, rather than making assumptions about the distribution of z and deriving the implied distribution of ε. A standard assumption for the distribution of ε is the multivariate normal, which yields the multinomial probit variant of the discrete choice model. Unfortunately, beyond the case of only two alternatives (the case that Thurstone studied), the multinomial probit model becomes intractable in higher-dimensional problems. The reason is that, in order to derive the conditional choice probabilities, one must perform numerical integrations of dimension equal to |D(x)|, the number of elements in the choice set. In general this multivariate integration is computationally infeasible using standard quadrature methods on modern computers when |D(x)| is larger than 5 or 6.

McFadden introduced an alternative assumption for the distribution of ε, namely the multivariate extreme value distribution given by

$$ {\displaystyle \begin{array}{ll}F\left(z|x\right)& =\mathit{\Pr}\left\{{\varepsilon}_d\le {z}_d\mid d\in D(x)\right\}\hfill \\ {}& =\prod \limits_{d\in D(x)}\exp \left\{-\exp \left\{-\left({z}_d-{\mu}_d\right)/\sigma \right\}\right\},\hfill \end{array}} $$
(2)

and showed that (when the location parameters μd are normalized to zero) the corresponding random utility model produces choice probabilities given by the multinomial logit formula

$$ P\left(d|x,\theta \right)=\frac{\exp \left\{u\left(x,d,\theta \right)/\sigma \right\}}{\sum_{d^{\prime}\in D(x)}\exp \left\{u\left(x,{d}^{\prime },\theta \right)/\sigma \right\}}. $$

This is McFadden’s key result: the MNL choice probability is implied by a random utility model in which the random utilities have independent extreme value distributions. It leads to the insight that the independence from irrelevant alternatives (IIA) property of the MNL model is a consequence of the statistical independence of the unobserved components of the random utilities. In particular, even if the observed attributes of two alternatives d and d′ are identical (which implies u(x, d, θ) = u(x, d′, θ)), the statistical independence of the unobservable components ε(d) and ε(d′) implies that alternatives d and d′ are not perfect substitutes. In many cases this is not problematic: individuals may have different idiosyncratic perceptions of and preferences for two different items that have the same observed attributes. However, as in the ‘red bus/blue bus’ example or the concert ticket example discussed by Debreu (1960), there are cases where it is plausible to believe that the observed attributes provide a sufficiently good description of an agent’s perception of the desirability of two alternatives. In such cases, the hypothesis that choices are also affected by additive, independent unobservables ε(d) provides a poor representation of an agent’s decisions. What is required in such cases is a random utility model in which the degree of correlation in the unobserved components of utility ε(d) and ε(d′) for two alternatives d, d′ ∈ D(x) is a function of the degree of closeness in their observed attributes. This type of dependence can be captured by a random coefficients probit model. This is a random utility model of the form \( U(x,z,d,\theta)={x}_d^{\prime}\left(\theta +z\right) \), where xd is a k × 1 vector of observed attributes of alternative d, θ is a k × 1 vector of utility weights representing the mean weights that individuals in the population assign to the various attributes in xd, and z ~ N(0, Ω) is a k × 1 normally distributed random vector representing agent-specific deviations in the weighting of the attributes relative to the population average values θ. Under the random coefficients probit specification of the random utility model, when \( {x}_d={x}_{d^{\prime }} \), alternatives d and d′ are in fact perfect substitutes for each other, and this model provides the intuitively plausible prediction of the effect of introducing an irrelevant alternative – the red bus – in the red bus/blue bus problem (see, for example, Hausman and Wise 1978).
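
The short simulation below (with assumed utility values, purely for illustration) checks McFadden’s key result numerically: choosing the alternative with the highest utility u(x, d, θ) + ε(d), with independent extreme value (Gumbel) shocks, reproduces the closed-form MNL probabilities, and two alternatives with identical observed utilities each receive their own MNL share rather than behaving as perfect substitutes, which is exactly the IIA feature discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
u = np.array([1.0, 0.5, 0.5])            # assumed systematic utilities; alternatives 2 and 3
                                         # have identical observed utility (blue bus / red bus)
R = 200_000                              # number of simulated decision makers

eps = rng.gumbel(size=(R, u.size))       # independent extreme value shocks => IIA
choices = np.argmax(u + eps, axis=1)
freq = np.bincount(choices, minlength=u.size) / R

mnl = np.exp(u) / np.exp(u).sum()        # closed-form MNL probabilities (sigma = 1)
print("simulated frequencies:", freq.round(3))
print("MNL formula:          ", mnl.round(3))   # agree up to simulation error; the two
                                                # identical alternatives split probability
                                                # instead of acting as a single alternative
```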

Generalized Extreme Value Distributions and Nested Logit Models

McFadden (1981) introduced the generalized extreme value (GEV) family of distributions. This family relaxes the independence assumption of the extreme value specification while still yielding tractable expressions for choice probabilities. The GEV distribution is given by

$$ {\displaystyle \begin{array}{l}F\left(z|x\right)=\mathit{\Pr}\left\{{\varepsilon}_d\le {z}_d|d\in D(x)\right\}\\ {}=\exp \left\{-H\left(\exp \left\{-{z}_1\right\},\dots, \exp \left\{-{z}_{\mid D(x)\mid}\right\},x,D(x)\right)\right\},\hfill \end{array}} $$

for any function H(z, x, D(x)) satisfying certain consistency properties. McFadden showed that choice probabilities for the GEV distribution are given by

$$ P\left(d|x,\theta \right)=\frac{\exp \left\{u\left(x,d,\theta \right)\right\}{H}_d\left(\exp \left\{u\left(x,1,\theta \right)\right\},\dots, \exp \left\{u\left(x,|D(x)|,\theta \right)\right\},x,D(x)\right)}{H\left(\exp \left\{u\left(x,1,\theta \right)\right\},\dots, \exp \left\{u\left(x,|D(x)|,\theta \right)\right\},x,D(x)\right)}, $$

where \( {H}_d\left(z,x,D(x)\right)=\partial H\left(z,x,D(x)\right)/\partial {z}_d \). A prominent subclass of GEV distributions is given by H functions of the form

$$ H\left(z,x,D(x)\right)=\sum_{i=1}^n{\left[\sum_{d\in {D}_i(x)}{z}_d^{1/{\sigma}_i}\right]}^{\sigma_i}, $$

where {D1(x), … ,Dn(x)} is a partition of the full choice set D(x). This subclass of GEV distributions yields the nested multinomial logit (NMNL) choice probabilities (see logit models of individual choice for further details).
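
As an illustration, the sketch below (with hypothetical utilities, nests, and dissimilarity parameters σi) computes NMNL choice probabilities directly from the GEV formula above, using P(d|x, θ) = zd Hd(z, x, D(x))/H(z, x, D(x)) with zd = exp{u(x, d, θ)}.

```python
import numpy as np

u = np.array([1.0, 0.8, 0.2, 0.0])       # hypothetical systematic utilities u(x, d, theta)
nests = [[0, 1], [2, 3]]                 # assumed partition {D_1(x), D_2(x)} of the choice set
sigma = [0.5, 0.8]                       # assumed dissimilarity parameters sigma_i

z = np.exp(u)                            # z_d = exp{u(x, d, theta)}
H = sum((z[nest] ** (1 / s)).sum() ** s for nest, s in zip(nests, sigma))

P = np.empty_like(u)
for nest, s in zip(nests, sigma):
    inclusive = (z[nest] ** (1 / s)).sum()            # the nest's "inclusive value"
    # H_d(z) = inclusive^(sigma_i - 1) * z_d^(1/sigma_i - 1) for d in nest i
    P[nest] = z[nest] * inclusive ** (s - 1) * z[nest] ** (1 / s - 1) / H

print(P.round(3), P.sum())               # NMNL choice probabilities; they sum to one
```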

The NMNL model has been applied in numerous empirical studies, especially to study demand when there is an extremely large number of alternatives, such as modelling consumer choice of automobiles (for example, Berkovec 1985; Goldberg 1995). In many of these consumer choice problems there is a natural partitioning of the choice set in terms of product classes (for example, luxury, compact, intermediate, sport-utility, and other classes in the case of autos). The nesting avoids the problems with the IIA property and results in more reasonable implied estimates of demand elasticities than those obtained using the MNL model. In fact, Dagsvik (1995) has shown that the class of random utility models with GEV distributed utilities is ‘dense’ in the class of all random utility models, in the sense that the choice probabilities implied by any random utility model can be approximated arbitrarily closely by a random utility model in the GEV class. However, a limitation of nested logit models is that they imply a highly structured pattern of correlation in the unobservables, induced by the econometrician’s specification of how the overall choice set D(x) is to be partitioned and of the number of levels in the nested logit ‘tree’. Even though the NMNL model can be nested to arbitrarily many levels to achieve additional flexibility, it is desirable to have a method where patterns of correlation in unobservables can be estimated from the data rather than being imposed by the analyst. Further, even though McFadden and Train (2000) recognize Dagsvik’s (1995) finding as a ‘powerful theoretical result’, they conclude that ‘its practical econometric application is limited by the difficulty of specifying, estimating and testing the consistency of relatively abstract generalized Extreme Value RUM’ (McFadden and Train 2000, p. 452).

Method of Simulated Moments and Simulation Based Inference for Discrete Choice

As noted above, the random coefficients probit model has many attractive features: it allows a flexibly specified covariance matrix representing correlation between the unobservable components of utilities, which avoids many of the undesirable features implied by the IIA property of the MNL model in a somewhat more direct and intuitive fashion than is possible via the GEV family. However, as also noted above, the multinomial probit model is intractable for applications with more than four or five alternatives due to the ‘curse of dimensionality’ of the numerical integrations required, at least using deterministic numerical integration methods such as Gaussian quadrature. One of McFadden’s most important contributions was his 1989 Econometrica paper that introduced the method of simulated moments (MSM). This was a major breakthrough that made it feasible to estimate the parameters of multinomial probit models with arbitrarily large numbers of alternatives.

The basic idea underlying McFadden’s contribution is to use Monte Carlo integration to approximate the probit choice probabilities. While this idea had been proposed previously by others, it was never developed into a practical, widely used estimation method because ‘it requires an impractical number of Monte Carlo draws to estimate small choice probabilities and their derivatives with acceptable precision’ (McFadden 1989, p. 997). McFadden’s insight was that it is not necessary to have extremely accurate (and thus very computationally time-intensive) Monte Carlo estimates of the choice probabilities in order to obtain an estimator of the parameters of a multinomial probit model that is consistent, asymptotically normal, and performs well in finite samples: the noise from the Monte Carlo simulations can be treated in the same way as random sampling error and thus ‘averages out’ in large samples. In particular, his MSM estimator has good asymptotic properties even when only a single Monte Carlo draw is used to estimate each agent’s choice probability. See simulation-based estimation for further details on the MSM estimator.
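
The stylized simulation below (synthetic data; a sketch of the idea, not McFadden’s full estimator, which also involves instruments and weighting) illustrates this insight: a crude frequency simulator based on a single multinomial probit draw per observation is a very noisy estimate of any individual’s choice probability, yet the simulation error averages out across a large sample, so moment conditions comparing observed and simulated choice frequencies are approximately zero at the true parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, K = 20_000, 6, 2                          # six alternatives: too many for quadrature
X = rng.normal(size=(N, J, K))                  # observed attributes (synthetic)
theta = np.array([1.0, -1.0])                   # true utility weights
A = rng.normal(size=(J, J)) / J
chol = np.linalg.cholesky(A @ A.T + np.eye(J))  # Cholesky factor of an assumed error covariance

def one_draw_choices(theta, seed):
    """One multinomial probit draw per agent: d_i = argmax_d x_id' theta + eps_id,
    with eps_i ~ N(0, Omega).  This is the crude frequency simulator."""
    eps = np.random.default_rng(seed).normal(size=(N, J)) @ chol.T
    return np.argmax(X @ theta + eps, axis=1)

observed = one_draw_choices(theta, seed=10)     # plays the role of the observed data
simulated = one_draw_choices(theta, seed=99)    # a single fresh simulation draw per agent

obs_freq = np.bincount(observed, minlength=J) / N
sim_freq = np.bincount(simulated, minlength=J) / N
print("moment residuals:", (obs_freq - sim_freq).round(3))   # approximately zero at true theta
```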

The idea behind the MSM estimator is quite general and can be applied in many other settings besides the multinomial probit model. McFadden’s work helped to spawn a large literature on ‘simulation estimation’ that developed rapidly during the 1990s and resulted in computationally feasible estimators for a large new class of econometric models that had previously been considered computationally infeasible. However, there are even better simulation estimators for the multinomial probit model, which generally outperform the MSM estimator in terms of having lower asymptotic variance and better finite sample performance, and which are easier to compute. One problem with the simple Monte Carlo estimator \( \widehat{P}\left({x}_i,\theta \right) \) underlying the MSM estimator is that it is a discontinuous and ‘locally flat’ function of the parameters θ, and thus the MSM criterion function is difficult to optimize. Hajivassiliou and McFadden (1998) introduced the method of simulated scores (MSS), which is based on Monte Carlo methods for simulating the scores of the likelihood function for a multinomial probit model and a wide class of other limited dependent variable models, such as Tobit and other types of censored regression models. (In the case of a discrete choice model, the score for the ith observation is ∂ log P(di|xi, θ)/∂θ.) Because it simulates the score of the likelihood rather than using a method of moments criterion, the MSS estimator is more efficient than the MSM estimator. Also, the MSS is based on a smooth simulator (that is, a method of simulation that results in an estimation criterion that is a continuously differentiable function of the parameters θ), so the MSS estimator is much easier to compute than the MSM estimator. Based on numerous Monte Carlo studies and empirical applications, MSS (and a closely related simulated maximum likelihood estimator based on the Geweke–Hajivassiliou–Keane, GHK, smooth simulator) are now regarded as the estimation methods of choice for a wide class of limited dependent variable models commonly encountered in empirical applications (see simulation-based estimation for further details).

Mixed Logit Models

McFadden and Train (2000) introduced the mixed multinomial logit (mixed MNL, or ‘mixed logit’) model and showed that the choice probabilities of virtually any random utility model can be approximated arbitrarily closely by those of a mixed MNL model. A mixed MNL model has choice probabilities of the form

$$ P\left(d|x,\theta \right)=\int \frac{\exp \left\{u\left(x,d,\alpha \right)\right\}}{\sum_{d^{\prime}\in D(x)}\exp \left\{u\left(x,{d}^{\prime },\alpha \right)\right\}}\,G\left( d\alpha |\theta \right). $$
(3)

There are several possible random utility interpretations of the mixed logit model. One interpretation is that the α vector represents ‘unobserved heterogeneity’ in the preference parameters in the population, so the relevant choice probability is obtained by marginalizing over the population distribution of the α parameters, G(α|θ). The other interpretation is that α is similar to the vector ε: it represents information that agents observe and which affects their choices but which is unobserved by the econometrician, except that the components ε(d) of ε enter the utility function additively separably, whereas the variables α are allowed to enter in a non-additively separable fashion, and the random vectors α and ε are statistically independent. It is easy to see that, under either interpretation, the mixed logit model will not satisfy the IIA property, and thus is not subject to its undesirable implications. McFadden and Train (2000) proposed several alternative ways to estimate mixed logit models, including maximum simulated likelihood and MSM. In each case, Monte Carlo integration is used to approximate the integral in Eq. 3 with respect to G(α|θ). Both of these estimators are smooth functions of the parameters θ, and both benefit from the computational tractability of the MNL model while at the same time having the flexibility to approximate virtually any type of random utility model. The intuition behind McFadden and Train’s approximation theorem is that a mixed logit model can be regarded as a certain type of neural network using the MNL model as the underlying ‘squashing function’. Neural networks are known to have the ability to approximate arbitrary types of functions and enjoy certain optimality properties; that is, the number of parameters (the dimension of the α vector) needed to approximate arbitrary choice probabilities grows only linearly in the number of included covariates x. (Other approximation methods, such as series estimators formed as tensor products of bases that are univariate functions of each of the components of x, require a much larger number of coefficients to provide a comparable approximation, and the number of such coefficients grows exponentially fast with the dimension of the x vector.)
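
The sketch below (with an assumed normal mixing distribution G(α|θ) and illustrative attribute data, neither of which is from the article) shows the simulation step shared by both estimators: the integral in Eq. 3 is approximated by averaging closed-form MNL probabilities over Monte Carlo draws of α, yielding simulated choice probabilities that are smooth functions of θ.

```python
import numpy as np

rng = np.random.default_rng(3)
J, K, R = 4, 3, 1_000                        # alternatives, attributes, simulation draws
x = rng.normal(size=(J, K))                  # illustrative attributes for a single agent
theta_mean = np.array([1.0, -0.5, 0.25])     # assumed mean of the random coefficients alpha
theta_chol = np.diag([0.5, 0.5, 0.5])        # assumed Cholesky factor of their covariance

alpha = theta_mean + rng.normal(size=(R, K)) @ theta_chol.T   # draws from G(alpha | theta)
v = alpha @ x.T                                               # R x J systematic utilities
v = v - v.max(axis=1, keepdims=True)                          # numerical safeguard
p_mnl = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)      # MNL probabilities, draw by draw

p_mixed = p_mnl.mean(axis=0)                 # simulated mixed logit choice probabilities
print(p_mixed.round(3), p_mixed.sum())       # smooth in theta, hence usable in MSL or MSM
```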

Conclusion

This brief survey of McFadden’s contributions to the discrete choice literature has revealed the immense practical benefits of his ability to link theory and econometrics, innovations that led to a vast empirical literature and widespread applications of discrete choice models. Beginning with his initial discovery, namely his demonstration that multinomial logit choice probabilities result from a random utility model with multivariate extreme value distributed unobservables, McFadden has made a series of fundamental contributions that have enabled researchers to circumvent the problematic implications of the IIA property of the MNL model, providing computationally tractable methods for estimating ever wider and more flexible classes of random utility and limited dependent variable models in econometrics.

See Also

Selected Works

  • 1967. The evaluation of development programmes. Review of Economic Studies 34: 25–50.

  • 1973. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, ed. P. Zarembka. New York: Academic Press.

  • 1974. The measurement of urban travel demand. Journal of Public Economics 3: 303–328.

  • 1974. (With R. Mantel, A. Mas-Colell, and M.K. Richter.) A characterization of community excess demand functions. Journal of Economic Theory 9: 361–374.

  • 1976. The revealed preferences of a government bureaucracy: Empirical evidence. Bell Journal of Economics and Management Science 7: 55–72.

  • 1976. (With M. Majumdar and T. Mitra.) On efficiency and Pareto optimality of competitive programs in closed multisector models. Journal of Economic Theory 13: 26–46.

  • 1978. Cost, revenue, and profit functions. In Production economics: A dual approach to theory and applications, vol. 1, ed. M. Fuss and D. McFadden. Amsterdam: North-Holland.

  • 1980. (With T. Mitra and M. Majumdar.) Pareto optimality and competitive equilibrium in infinite horizon economies. Journal of Mathematical Economics 7: 1–26.

  • 1981. Econometric models of probabilistic choice. In Structural analysis of discrete data with econometric applications, ed. C.F. Manski and D. McFadden. Cambridge, MA: MIT Press.

  • 1981. (With C.F. Manski, ed.) Structural analysis of discrete data with econometric applications. Cambridge, MA: MIT Press.

  • 1984. Econometric analysis of qualitative response models. In Handbook of Econometrics, vol. 2, ed. Z. Griliches and M. Intriligator. Amsterdam: North-Holland.

  • 1989. A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57: 995–1026.

  • 1994. (With P. Ruud.) Estimation by simulation. Review of Economics and Statistics 76: 591–608.

  • 2000. (With K. Train.) Mixed MNL models of discrete response. Journal of Applied Econometrics 15: 447–470.

  • 2001. Autobiography. In Les Prix Nobel. The Nobel Prizes 2000, ed. T. Frängsmyr. Stockholm: Nobel Foundation.