LML Estimation using the Faster-MM Algorithm

2.2 Iterative Optimization Methods to Estimate the Logit Mixed Logit

2.2.4 LML Estimation using the Faster-MM Algorithm

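$$\sum_{r \in S} h_{ir}(\beta_{ir} \mid \psi^m) = 1 \implies B^m_\alpha = B_\alpha = -\frac{1}{2} \sum_{i=1}^{N} \sum_{t=1}^{T} \left[ \sum_{j=1}^{J} x_{itj} x_{itj}^T - \frac{1}{J} \left( \sum_{j=1}^{J} x_{itj} \right) \left( \sum_{j=1}^{J} x_{itj} \right)^T \right]. \tag{2.21}$$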
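As a rough illustration of Eq. 2.21, the NumPy sketch below builds the constant curvature matrix $B_\alpha$ from a covariate array. The array layout `x[i, t, j, :]`, the function name `curvature_bound_alpha`, and the random test data are assumptions made for this example only; they do not come from the original text.

```python
import numpy as np

def curvature_bound_alpha(x):
    """Constant matrix B_alpha of Eq. 2.21.

    x : array of shape (N, T, J, K) holding the covariates x_itj
        (hypothetical layout chosen for this sketch).
    """
    J = x.shape[2]
    # sum over i, t, j of x_itj x_itj^T -> (K, K)
    outer_sum = np.einsum("itjk,itjl->kl", x, x)
    # sum over i, t of (sum_j x_itj)(sum_j x_itj)^T -> (K, K)
    s = x.sum(axis=2)                              # shape (N, T, K)
    sum_outer = np.einsum("itk,itl->kl", s, s)
    return -0.5 * (outer_sum - sum_outer / J)

# Made-up data: N = 5 agents, T = 3 choice situations, J = 4 alternatives, K = 2
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3, 4, 2))
B_alpha = curvature_bound_alpha(x)
```

Because the bracketed term in Eq. 2.21 is a sum of centered outer products, the resulting matrix is symmetric and negative semi-definite, so it only needs to be computed and inverted once, as in the initialization of Algorithm 2.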

All the steps of the MM algorithm to estimate the LML model are given below:


Algorithm 2: MM for the LML Model

Initialization:
    For each $i$, draw $\beta_{ir}$, $r = 1, \dots, R$ (e.g., $R = 2000$), from the support set $S$;
    Compute $y(\beta_{ir})$ using sieve functions such as splines;
    Compute $B_\varphi^{-1}$ using Eq. 2.17;
    Compute $B_\alpha^{-1}$ using Eq. 2.21;
    Initialize parameters at $m = 0$: $\psi^0 = \{\alpha^0, \varphi^0\}$;
while $\lVert \psi^{m+1} - \psi^m \rVert_\infty \ge \text{Tol}$ do
    Step 1: Calculation of the weights $h_{ir}(\beta_{ir} \mid \psi^m)$
        Calculate $P_{itj}(\alpha^m, \beta_{ir})$ for each $\beta_{ir}$ using Eq. 2.2;
        Calculate $L_i(\alpha^m, \beta_{ir})$ for each $i$ and for each $\beta_{ir}$ using Eq. 2.3;
        Calculate $w_i(\beta_{ir} \mid \varphi^m)$ for each $i$ and for each $\beta_{ir}$ using Eq. 2.4;
        Calculate $h_{ir}(\beta_{ir} \mid \psi^m)$ for each $\beta_{ir}$ using Eq. 2.7;
    Step 2: Update parameters
        Compute $g^m_\varphi$ using Eq. 2.14 and update $\varphi^{m+1}$ using Eq. 2.13;
        Compute $g^m_\alpha$ using Eq. 2.19 and update $\alpha^{m+1}$ using Eq. 2.18;
end

It is important to note that the number of 'alternatives' in equation 2.11 (updating $\varphi$), which stems from a logit link, is equal to the number of draws $R$ (the size of the estimation subset) for each agent from the support set $S$. For good coverage of the parameter space, $R$ should be large (for example, $R = 2000$). Böhning and Lindsay (1988) highlight the fact that if the number of alternatives is large, the approximation of the Hessian in equation 2.17 becomes

$$B^m_\varphi = B_\varphi = -\frac{1}{2} \sum_{i=1}^{N} \sum \cdots$$

which is a very crude approximation. This observation is illustrated in Appendix B using a sketch of the proof for the lower bound of the Hessian, and also in a Monte Carlo study. Specifically, Böhning and Lindsay (1988) mention the problem of the curvature of the loglikelihood varying sharply as a function of initial values and direction. With a modified step size, as suggested by those authors for a broad family of statistical models, we propose the following simple algorithmic improvement to update $\varphi$:

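Step 1: Compute the step size for MM: $\mu^m_\varphi = -[B^m_\varphi]^{-1} g^m_\varphi$.

Step 2: Modify the step size: $\zeta^m_\varphi = \eta^m_\varphi \mu^m_\varphi$.

Step 3: Update $\varphi$: $\varphi^{m+1} = \varphi^m + \zeta^m_\varphi$.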
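The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `faster_mm_update`, the small matrices, and the value `eta=1.2` are made up for the example, and the multiplier is passed in as a given number rather than computed.

```python
import numpy as np

def faster_mm_update(phi, g, B, eta):
    """One faster-MM update of phi (Steps 1 to 3).

    phi : current parameter vector phi^m, shape (K,)
    g   : gradient g_phi^m, shape (K,)
    B   : negative definite curvature matrix B_phi^m, shape (K, K)
    eta : positive multiplier eta_phi^m (eta = 1 gives the plain MM update)
    """
    mu = -np.linalg.solve(B, g)   # Step 1: mu = -B^{-1} g
    zeta = eta * mu               # Step 2: zeta = eta * mu
    return phi + zeta             # Step 3: phi^{m+1} = phi^m + zeta

# Made-up inputs for illustration only
phi = np.zeros(2)
g = np.array([1.0, -2.0])
B = -np.array([[2.0, 0.0], [0.0, 4.0]])

phi_mm = faster_mm_update(phi, g, B, eta=1.0)     # plain MM step
phi_fast = faster_mm_update(phi, g, B, eta=1.2)   # same direction, longer step
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; when $B_\varphi$ is constant across iterations, a precomputed inverse or factorization would serve the same purpose.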

Intuitively, in this faster-MM algorithm the original step size $\mu^m_\varphi$ is scaled by a positive multiplier $\eta^m_\varphi$, and the modified step size $\zeta^m_\varphi$ is then used to update $\varphi$. Using $\zeta^m_\varphi$ instead of $\mu^m_\varphi$ not only maintains monotonic improvements in the loglikelihood, but also ensures fast convergence of the MM algorithm for LML (see the simulation results for LML in section 2.5.1). This faster-MM method can be extended to improve the convergence rate of MM estimation of logit-type models with large choice sets in general:¹⁰ the MM algorithm for mixed logit as originally implemented by James (2017) is extremely slow if the number of alternatives is large, and our proposed faster-MM algorithm can provide significant improvements (see section 2.5.2 for the simulation results). We also derive the faster-MM algorithm for MON-MNL (section 2.3.4).

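Computation of $\eta^m_\varphi$

Böhning and Lindsay (1988) suggest writing a standard Taylor series expansion and then solving for the scalar multiplier. We derive the expression for $\eta^m_\varphi$ following exactly those steps:

$$Q(\varphi^{m+1}) - Q(\varphi^m) = Q(\varphi^m + \eta^m_\varphi \mu^m_\varphi) - Q(\varphi^m) \ge \eta^m_\varphi (\mu^m_\varphi)^T g^m_\varphi + \frac{(\eta^m_\varphi)^2 \, LB}{2}. \tag{2.22}$$

Solving for $\eta^m_\varphi$:

$$\eta^m_\varphi = -\frac{(\mu^m_\varphi)^T g^m_\varphi}{LB}, \tag{2.23}$$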

¹⁰ For example, in the case of 4 alternatives, the multiplier $\eta^m_\varphi$ can be of order 1.2, which is in-

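where $LB$ is a lower bound on the quadratic form of the Hessian that can be calculated as

$$LB = -\sum_{i=1}^{N} \sum_{r \in S} h_{ir}(\beta_{ir} \mid \psi^m) \, LB^i_\varphi,$$

where $LB^i_\varphi$ is a lower bound on the quadratic form of $H^m_{\varphi,ir}$ and is calculated as follows:¹¹

$$LB^i_\varphi = 0.5 \left( m_i^2 + M_i^2 - 0.5 \, (m_i + M_i)^2 \right), \quad \text{where } m_i = \min_{1 \le v \le R} \left( (\mu^m_\varphi)^T y(\beta_{iv}) \right) \text{ and } M_i = \max_{1 \le v \le R} \left( (\mu^m_\varphi)^T y(\beta_{iv}) \right). \tag{2.24}$$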
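Expanding the square in Eq. 2.24 gives a more compact form that makes the sign of $LB^i_\varphi$ explicit. The following is a direct algebraic rearrangement of the expression above, not an additional result from the source:

```latex
\begin{aligned}
LB^i_\varphi
  &= 0.5 \left( m_i^2 + M_i^2 - 0.5\,(m_i^2 + 2 m_i M_i + M_i^2) \right) \\
  &= 0.5 \left( 0.5\, m_i^2 - m_i M_i + 0.5\, M_i^2 \right) \\
  &= \tfrac{1}{4} \left( M_i - m_i \right)^2 \;\ge\; 0 .
\end{aligned}
```

In other words, $LB^i_\varphi$ is a quarter of the squared range of the projected values $(\mu^m_\varphi)^T y(\beta_{iv})$ across the $R$ draws of agent $i$, and it equals zero only when all projections coincide.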

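Recalling that the Hessian is

$$H^m_\varphi = \left. \frac{\partial^2 Q(\varphi \mid \psi^m)}{\partial \varphi^2} \right|_{\varphi = \varphi^m} = -\sum_{i=1}^{N} \sum_{r \in S} h_{ir}(\beta_{ir} \mid \psi^m) \, H^m_{\varphi,ir}, \tag{2.25}$$

and recognizing that $\sum_{r \in S} h_{ir}(\beta_{ir} \mid \psi^m) = 1$, it is possible to implement the lower bound as

$$LB = -\sum_{i=1}^{N} LB^i_\varphi. \tag{2.26}$$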
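Equations 2.23, 2.24 and 2.26 together give a simple recipe for the multiplier, sketched below in NumPy. The function name `step_multiplier`, the array layout `y[i, r, :]` for the sieve terms $y(\beta_{ir})$, and the numerical inputs are assumptions made for this illustration.

```python
import numpy as np

def step_multiplier(mu, g, y):
    """Multiplier eta_phi^m from Eqs. 2.23, 2.24 and 2.26.

    mu : MM step mu_phi^m, shape (K,)
    g  : gradient g_phi^m, shape (K,)
    y  : sieve terms y(beta_ir), shape (N, R, K) (hypothetical layout)
    """
    proj = y @ mu                                   # (mu)^T y(beta_iv), shape (N, R)
    m_i = proj.min(axis=1)                          # minimum over the R draws
    M_i = proj.max(axis=1)                          # maximum over the R draws
    lb_i = 0.5 * (m_i**2 + M_i**2 - 0.5 * (m_i + M_i) ** 2)   # Eq. 2.24
    LB = -lb_i.sum()                                # Eq. 2.26 (note the minus sign)
    # LB is zero only if every projection is identical within each agent;
    # this sketch does not guard against that degenerate case.
    return -(mu @ g) / LB                           # Eq. 2.23

# Made-up inputs for illustration only
rng = np.random.default_rng(0)
y = rng.normal(size=(3, 50, 2))
B = -np.array([[4.0, 1.0], [1.0, 3.0]])             # made-up negative definite matrix
g = np.array([0.5, -1.0])
mu = -np.linalg.solve(B, g)
eta = step_multiplier(mu, g, y)
```

With a negative definite curvature matrix, $(\mu^m_\varphi)^T g^m_\varphi$ is positive and $LB$ is negative, so the returned multiplier is positive.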

Note that Böhning and Lindsay (1988) suggested using $LB = \sum_{i=1}^{N} LB^i_\varphi$ in the original paper for specifications such as the Cox proportional hazards model. When we first implemented the bound without the negative sign, the MM algorithm lost monotonicity and did not converge. In fact, the loglikelihood fluctuated randomly instead of increasing at each iteration. We soon realized that monotonicity would be ensured if $LB = -\sum_{i=1}^{N} LB^i_\varphi$. With the corrected sign of $LB$, as shown in equation 2.26, the MM algorithm not only converged, but convergence was also achieved much faster, as we expected. To make sure that our intuition about the sign of the bound was correct, we provide a formal proof below.

¹¹ See equation 2.15 and equations 5.5-5.7 of Böhning and Lindsay (1988) for further information.

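Checking the sign of LB. As mentioned above, irrespective of the chosen sample, the sign of $\eta^m_\varphi$ must be positive to ensure monotonicity of the algorithm. We now show that $LB = -\sum_{i=1}^{N} LB^i_\varphi$ (as opposed to the opposite sign, $LB = \sum_{i=1}^{N} LB^i_\varphi$) fulfills this requirement. In effect,

$$\eta^m_\varphi = -\frac{(\mu^m_\varphi)^T g^m_\varphi}{LB} = -\frac{\left( -[B^m_\varphi]^{-1} g^m_\varphi \right)^T g^m_\varphi}{LB} = \frac{(g^m_\varphi)^T [B^m_\varphi]^{-1} g^m_\varphi}{LB}. \tag{2.27}$$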

Consider first the numerator, $(g^m_\varphi)^T [B^m_\varphi]^{-1} g^m_\varphi \le 0$: since the objective function is concave, the Hessian is negative semi-definite and thus $[B^m_\varphi]^{-1}$ is negative semi-definite.

Consider now the denominator $LB$. Since the objective function is concave, $\frac{\partial^2 Q(\varphi \mid \psi^m)}{\partial \varphi^2}$ (see Equation 2.25) is negative semi-definite. Additionally, $h_{ir}(\beta_{ir} \mid \psi^m)$ is a positive weight and thus $H^m_{\varphi,ir}$ is a positive semi-definite matrix. Since $LB^i_\varphi$ is a lower bound on the quadratic form of $H^m_{\varphi,ir}$, it has to be non-negative. If $LB = \sum_i LB^i_\varphi$, then $LB \ge 0 \implies \eta^m_\varphi \le 0$, but if $LB = -\sum_i LB^i_\varphi$, then $LB \le 0 \implies \eta^m_\varphi \ge 0$, as needed. (Q.E.D.)
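The sign argument can also be checked numerically. The sketch below uses a randomly generated negative definite matrix as a stand-in for $B^m_\varphi$, an arbitrary gradient, and an arbitrary positive number in place of $\sum_i LB^i_\varphi$; all of these values are assumptions for illustration and carry no empirical meaning.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
A = rng.normal(size=(K, K))
B = -(A @ A.T + np.eye(K))               # negative definite stand-in for B_phi^m
g = rng.normal(size=K)                   # arbitrary gradient g_phi^m

numerator = g @ np.linalg.solve(B, g)    # g^T B^{-1} g, the numerator of Eq. 2.27
sum_lb = 2.5                             # any positive stand-in for sum_i LB^i_phi

eta_plus = numerator / sum_lb            # LB = +sum_i LB^i_phi
eta_minus = numerator / (-sum_lb)        # LB = -sum_i LB^i_phi (Eq. 2.26)
```

Here the numerator is negative, so only the choice with the minus sign yields a positive multiplier, in line with the proof above.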

In document Improved Estimation of Flexible Logit Models and an Extension to a Model with a t-distributed Error Kernel (Page 53-58)