Logit
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Choice Probabilities

By far the easiest and most widely used discrete choice model is logit. Its popularity is due to the fact that the formula for the choice probabilities takes a closed form and is readily interpretable. Originally, the logit formula was derived by Luce (1959) from assumptions about the characteristics of choice probabilities, namely the independence from irrelevant alternatives (IIA) property discussed in Section 3.3.2. Marschak (1960) showed that these axioms implied that the model is consistent with utility maximization. The relation of the logit formula to the distribution of unobserved utility (as opposed to the characteristics of choice probabilities) was developed by Marley, as cited by Luce and Suppes (1965), who showed that the extreme value distribution leads to the logit formula. McFadden (1974) completed the analysis by showing the converse: that the logit formula for the choice probabilities necessarily implies that unobserved utility is distributed extreme value. In his Nobel lecture, McFadden (2001) provides a fascinating history of the development of this path-breaking model.

To derive the logit model, we use the general notation from Chapter 2 and add a specific distribution for unobserved utility. A decision maker, labeled n, faces J alternatives. The utility that the decision maker obtains from alternative j is decomposed into (1) a part labeled Vnj that is known by the researcher up to some parameters, and (2) an unknown part εnj that is treated by the researcher as random: Unj = Vnj + εnj ∀j.
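The closed-form probability the excerpt refers to can be sketched in a few lines. This is a minimal NumPy illustration; the function name and example utilities are mine, not from the text:

```python
import numpy as np

def logit_probs(V):
    """Closed-form logit choice probabilities from representative utilities.

    With Unj = Vnj + eps_nj and eps_nj i.i.d. extreme value, the choice
    probabilities are P_nj = exp(V_nj) / sum_k exp(V_nk).
    """
    V = np.asarray(V, dtype=float)
    e = np.exp(V - V.max())  # shift by the max for numerical stability
    return e / e.sum()

# Three alternatives: probabilities follow the ordering of the V's and sum to 1.
p = logit_probs([1.0, 0.5, 0.0])
```

The closed form is what makes logit "readily interpretable": probabilities respond monotonically to representative utility and always sum to one.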
Probit
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Choice Probabilities

The logit model is limited in three important ways. It cannot represent random taste variation. It exhibits restrictive substitution patterns due to the IIA property. And it cannot be used with panel data when unobserved factors are correlated over time for each decision maker. GEV models relax the second of these restrictions, but not the other two. Probit models deal with all three. They can handle random taste variation, they allow any pattern of substitution, and they are applicable to panel data with temporally correlated errors.

The only limitation of probit models is that they require normal distributions for all unobserved components of utility. In many, perhaps most situations, normal distributions provide an adequate representation of the random components. However, in some situations, normal distributions are inappropriate and can lead to perverse forecasts. A prominent example relates to price coefficients. For a probit model with random taste variation, the coefficient of price is assumed to be normally distributed in the population. Since the normal distribution has density on both sides of zero, the model necessarily implies that some people have a positive price coefficient. The use of a distribution that has density only on one side of zero, such as the lognormal, is more appropriate and yet cannot be accommodated within probit. Other than this restriction, the probit model is quite general.
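The point about price coefficients can be checked numerically. The parameter values below are purely illustrative, not estimates from any model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population: price coefficient ~ N(-1, 0.5^2).
# A normal distribution always places some mass above zero, so the model
# implies a nonzero share of people who "prefer" higher prices.
normal_coefs = rng.normal(-1.0, 0.5, n)
share_positive = (normal_coefs > 0).mean()

# The negative of a lognormal keeps all density strictly below zero,
# which is the kind of one-sided distribution probit cannot accommodate.
lognormal_coefs = -rng.lognormal(mean=0.0, sigma=0.5, size=n)
```

With these illustrative numbers, the normal specification assigns roughly the upper-2σ tail (about 2% of the population) a positive price coefficient, while the negated lognormal assigns none.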
GEV
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Introduction

The standard logit model exhibits independence from irrelevant alternatives (IIA), which implies proportional substitution across alternatives. As we discussed in Chapter 3, this property can be seen either as a restriction imposed by the model or as the natural outcome of a well-specified model that captures all sources of correlation over alternatives into representative utility, so that only white noise remains. Often the researcher is unable to capture all sources of correlation explicitly, so that the unobserved portions of utility are correlated and IIA does not hold. In these cases, a more general model than standard logit is needed.

Generalized extreme value (GEV) models constitute a large class of models that exhibit a variety of substitution patterns. The unifying attribute of these models is that the unobserved portions of utility for all alternatives are jointly distributed as a generalized extreme value. This distribution allows for correlations over alternatives and, as its name implies, is a generalization of the univariate extreme value distribution that is used for standard logit models. When all correlations are zero, the GEV distribution becomes the product of independent extreme value distributions and the GEV model becomes standard logit. The class therefore includes logit but also includes a variety of other models. Hypothesis tests on the correlations within a GEV model can be used to examine whether the correlations are zero, which is equivalent to testing whether standard logit provides an accurate representation of the substitution patterns.
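The best-known member of the GEV class is the nested logit model. The sketch below (function name and example values are mine) also shows the collapse to standard logit when every nest parameter equals one, i.e., when the within-nest correlations are zero:

```python
import numpy as np

def nested_logit_probs(V, nests, lam):
    """Nested logit choice probabilities, one member of the GEV class.

    V:     representative utilities, one per alternative
    nests: partition of alternative indices into nests
    lam:   one lambda per nest; lam = 1 in every nest removes the
           within-nest correlation and recovers standard logit.
    """
    V = np.asarray(V, dtype=float)
    P = np.zeros_like(V)
    sums = [np.exp(V[nest] / l).sum() for nest, l in zip(nests, lam)]
    denom = sum(s ** l for s, l in zip(sums, lam))
    for nest, l, s in zip(nests, lam, sums):
        for j in nest:
            P[j] = np.exp(V[j] / l) * s ** (l - 1) / denom
    return P

V = [1.0, 0.5, 0.0, -0.5]
nests = [[0, 1], [2, 3]]
p_logit = nested_logit_probs(V, nests, [1.0, 1.0])   # collapses to standard logit
p_nested = nested_logit_probs(V, nests, [0.5, 0.5])  # correlated within nests
```

Testing whether the lambdas equal one is exactly the hypothesis test on correlations mentioned in the excerpt: if they do, standard logit already captures the substitution patterns.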
Bayesian Procedures
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Introduction

A powerful set of procedures for estimating discrete choice models has been developed within the Bayesian tradition. The breakthrough concepts were introduced by Albert and Chib (1993) and McCulloch and Rossi (1994) in the context of probit, and by Allenby and Lenk (1994) and Allenby (1997) for mixed logits with normally distributed coefficients. These authors showed how the parameters of the model can be estimated without needing to calculate the choice probabilities. Their procedures provide an alternative to the classical estimation methods described in Chapter 10. Rossi et al. (1996), Allenby (1997), and Allenby and Rossi (1999) showed how the procedures can also be used to obtain information on individual-level parameters within a model with random taste variation. By this means, they provide a Bayesian analog to the classical procedures that we describe in Chapter 11. Variations of these procedures to accommodate other aspects of behavior have been numerous. For example, Arora et al. (1998) generalized the mixed logit procedure to take account of the quantity of purchases as well as brand choice in each purchase occasion. Bradlow and Fader (2001) showed how similar methods can be used to examine rankings data at an aggregate level rather than choice data at the individual level. Chib and Greenberg (1998) and Wang et al. (2002) developed methods for interrelated discrete responses. Chiang et al. (1999) examined situations where the choice set that the decision maker considers is unknown to the researcher.
Variations on a Theme
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Introduction

Simulation gives the researcher the freedom to specify models that appropriately represent the choice situations under consideration, without being unduly hampered by purely mathematical concerns. This perspective has been the overarching theme of our book. The discrete choice models that we have discussed – namely, logit, nested logit, probit, and mixed logit – are used in the vast majority of applied work. However, readers should not feel themselves constrained to use these models. In the current chapter, we describe several models that are derived under somewhat different behavioral concepts. These models are variations on the ones already discussed, directed toward specific issues and data. The point is not simply to describe additional models. Rather, the discussion illustrates how the researcher might examine a choice situation and develop a model and estimation procedure that seem appropriate for that particular situation, drawing from, and yet adapting, the standard set of models and tools.

Each section of this chapter is motivated by a type of data, representing the outcome of a particular choice process. The arena in which such data might arise is described, and the limitations of the primary models for these data are identified. In each case, a new model is described that better represents the choice situation. Often this new model is only a slight change from one of the primary models. However, the slight change will often make the standard software unusable, so that the researcher will need to develop her own software, perhaps by modifying the codes that are available for standard models.
Motivation

So far we have examined how to simulate choice probabilities but have not investigated the properties of the parameter estimators that are based on these simulated probabilities. In the applications we have presented, we simply inserted the simulated probabilities into the log-likelihood function and maximized this function, the same as if the probabilities were exact. This procedure seems intuitively reasonable. However, we have not actually shown, at least so far, that the resulting estimator has any desirable properties, such as consistency, asymptotic normality, or efficiency. We have also not explored the possibility that other forms of estimation might perhaps be preferable when simulation is used rather than exact probabilities.

The purpose of this chapter is to examine various methods of estimation in the context of simulation. We derive the properties of these estimators and show the conditions under which each estimator is consistent and asymptotically equivalent to the estimator that would arise with exact values rather than simulation. These conditions provide guidance to the researcher on how the simulation needs to be performed to obtain desirable properties of the resultant estimator. The analysis also illuminates the advantages and limitations of each form of estimation, thereby facilitating the researcher's choice among methods.
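One reason these properties need care: the simulated probability enters the log-likelihood through a logarithm, and because ln is concave, the log of an unbiased simulated probability is biased downward (Jensen's inequality). A toy numerical check, with a purely illustrative frequency-style simulator standing in for a real choice-probability simulator:

```python
import numpy as np

rng = np.random.default_rng(0)

# An unbiased simulator of a true probability P = 0.3, built from R
# accept/reject draws (illustrative only).
true_P, R, reps = 0.3, 10, 50_000
P_hat = rng.binomial(R, true_P, size=reps) / R  # E[P_hat] = true_P exactly
P_hat = np.clip(P_hat, 1e-6, 1.0)               # guard against ln(0)

# ln is concave, so E[ln P_hat] < ln E[P_hat] = ln P: the simulated
# log-likelihood is biased downward for any fixed number of draws R.
bias = np.log(P_hat).mean() - np.log(true_P)
```

The bias shrinks as R grows, which is the kind of condition on how the simulation must be performed that this chapter derives.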
Mixed Logit
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Choice Probabilities

Mixed logit is a highly flexible model that can approximate any random utility model (McFadden & Train, 2000). It obviates the three limitations of standard logit by allowing for random taste variation, unrestricted substitution patterns, and correlation in unobserved factors over time. Unlike probit, it is not restricted to normal distributions. Its derivation is straightforward, and simulation of its choice probabilities is computationally simple.

Like probit, the mixed logit model has been known for many years but has only become fully applicable since the advent of simulation. The first application of mixed logit was apparently the automobile demand models created jointly by Boyd & Mellman (1980) and Cardell & Dunbar (1980). In these studies, the explanatory variables did not vary over decision makers, and the observed dependent variable was market shares rather than individual customers' choices. As a result, the computationally intensive integration that is inherent in mixed logit (as explained later) needed to be performed only once for the market as a whole, rather than for each decision maker in a sample. Early applications on customer-level data, such as Train et al. (1987a) and Ben-Akiva et al. (1993), included only one or two dimensions of integration, which could be calculated by quadrature. Improvements in computer speed and in our understanding of simulation methods have allowed the full power of mixed logits to be utilized. Among the studies to evidence this power are those by Bhat (1998a) and Brownstone & Train (1999) on cross-sectional data, and Erdem (1996), Revelt & Train (1998), and Bhat (2000) on panel data.
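The "computationally simple" simulation amounts to averaging the logit formula over draws from the mixing distribution. A minimal sketch with a single normally distributed coefficient (function name and example values are mine):

```python
import numpy as np

def mixed_logit_prob(x, chosen, b_mean, b_sd, R=500, seed=0):
    """Simulated mixed logit probability of the chosen alternative.

    The mixed logit probability is an integral of the logit formula over
    the mixing distribution of the coefficient; here it is simulated by
    averaging the logit formula over R draws of beta ~ N(b_mean, b_sd^2).
    One random coefficient only, to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    betas = rng.normal(b_mean, b_sd, R)           # draws from the mixing density
    V = betas[:, None] * np.asarray(x)[None, :]   # utilities: R draws x J alts
    e = np.exp(V - V.max(axis=1, keepdims=True))
    L = e / e.sum(axis=1, keepdims=True)          # logit formula per draw
    return L[:, chosen].mean()                    # average over draws

x = np.array([1.0, 0.5, 0.0])  # one attribute per alternative (illustrative)
p_sim = mixed_logit_prob(x, chosen=0, b_mean=1.0, b_sd=0.8)
```

When the mixing variance is set to zero, every draw gives the same logit formula and the simulated probability reduces exactly to standard logit, which is a convenient sanity check on any implementation.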
EM Algorithms
Train, Kenneth E.
Discrete Choice Methods with Simulation, 01/2001
Book Chapter
Introduction

In Chapter 8, we discussed methods for maximizing the log-likelihood (LL) function. As models become more complex, maximization by these methods becomes more difficult. Several issues contribute to the difficulty. First, greater flexibility and realism in a model are usually attained by increasing the number of parameters. However, the procedures in Chapter 8 require that the gradient be calculated with respect to each parameter, which becomes increasingly time consuming as the number of parameters rises. The Hessian, or approximate Hessian, must be calculated and inverted; with a large number of parameters, the inversion can be numerically difficult. Also, as the number of parameters grows, the search for the maximizing values is over a larger-dimensioned space, such that locating the maximum requires more iterations. In short, each iteration takes longer and more iterations are required.

Second, the LL function for simple models is often approximately quadratic, such that the procedures in Chapter 8 operate effectively. As the model becomes more complex, however, the LL function usually becomes less like a quadratic, at least in some regions of the parameter space. This issue can manifest itself in two ways. The iterative procedure can get “stuck” in the nonquadratic areas of the LL function, taking tiny steps without much improvement in the LL. Or the procedure can repeatedly “bounce over” the maximum, taking large steps in each iteration but without being able to locate the maximum.
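The per-iteration costs described above can be seen in a generic Newton-Raphson loop. This is a bare-bones sketch, not the book's code; the toy quadratic below stands in for an LL function, and an exactly quadratic LL converges in a single step:

```python
import numpy as np

def newton_maximize(grad, hess, beta0, tol=1e-8, max_iter=100):
    """Generic Newton-Raphson ascent of the kind Chapter 8 discusses.

    Each iteration evaluates the K-vector gradient and the K x K Hessian
    and solves a linear system in them, which is why the cost per iteration
    grows quickly with the number of parameters K.
    """
    beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        g = grad(beta)
        if np.abs(g).max() < tol:
            break
        H = hess(beta)
        beta = beta - np.linalg.solve(H, g)  # solve, rather than invert H
    return beta

# Toy exactly quadratic "LL": LL(b) = -0.5 * (b - t)' A (b - t), so
# grad = -A (b - t) and hess = -A, and the maximizer is t.
t = np.array([1.0, 2.0])
A = np.array([[2.0, 0.3], [0.3, 1.0]])
beta_hat = newton_maximize(lambda b: -A @ (b - t), lambda b: -A, [0.0, 0.0])
```

On a nonquadratic LL the same loop can take the tiny steps or overshoots described above, which is what motivates the EM alternative developed in this chapter.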