StructuralEstimationNov2015.pdf


Doctoral seminar in resource economics

Structural estimation and applications to natural resources

Bruno Lanz

Graduate Institute of International and Development Studies


Outline for the class

Structural vs. atheoretical approaches to estimation:

Example from Keane (J Metrics, 2010): Lottery draft and lifetime earnings

Structural estimation: Discrete choice

Theory: Estimation of random utility models

Application: Spatial sorting (Bayer et al., JEEM 2009)

Estimation of dynamic discrete choice models

Basic framework: Classic paper by Rust (Ecta 1987)

Policy-orientated applications (next week):

Land use choices (Scott, 2014)


What is structural estimation?

Start from a standard model: Optimizing agents with fixed preferences / technology

Objective: Estimate the structural parameters (primitives) of the model

Rationalize the data as the model’s outcome

“As if” the data had been generated by the model

A structural error term allows real-world observations to deviate from optimal behavior determined by the model’s solution


Why do we care?

Some behavioral parameters may be of interest in their own right (e.g. an elasticity of substitution of a CES production function)

But what makes the approach attractive is the ability to exploit the estimated model

Once the parameters of the model are estimated, the model can be solved to make behavioral predictions and study counterfactual equilibria

Either out of sample (no change in the structure of the problem)

Or introduce a new component to the problem (e.g. a change in relative prices) and see how the equilibrium differs from observed outcomes

In some settings it can be used to assess the welfare impact of policy interventions


Experimental view of estimation

With the rise of controlled (laboratory) experiments in the late 80s, there has been a shift towards an “experimentalist” view of empirical work

As in the experimental literature, the main focus is on identifying a causal or “treatment” effect

We want to measure the effect of a variable X on an outcome Y; for example, the effect of an additional year of education on earnings

In the lab we can use random assignment and observe behavioral responses

In the real world random assignment is unlikely

Education is not randomly assigned: X varies with unobserved characteristics U that also affect earnings (like innate ability)


Random assignment and natural experiments

Identification of a causal effect requires finding a source of exogenous variation

We need an instrumental variable Z correlated with X but uncorrelated with the unobservables that also affect earnings

The ideal instrument generates random assignment: those with Z = 1 tend to choose a higher X (all other things equal) than those with Z = 0

Natural experiment: An exogenous event affects a random subset of the population

The natural experiment induces at least some members of the “treatment group” to choose (or be assigned) a higher level of X than otherwise


Identification

In fact there exist few truly natural experiments (“natural natural” experiments), for example:

Weather events

Realizations of child gender

Twin births

For any other “natural” experiment, we effectively need an argument to motivate random assignment of the treatment

Typically this involves 2-3 paragraphs of text

And a lot of the credibility of the paper depends on how well this part is drafted


Role of theory

When one finds an exogenous source of variation (or can successfully argue that it is exogenous), there is no need to explain the results

Even if the results are unintuitive, it is very difficult for readers/referees/audiences to argue against your findings

(see Dell, Ecta 2010, for a nice exception)

And in fact it is claimed to be an advantage of the experimentalist approach

Results do not (a priori) rely on economic theory

No (explicit) assumptions about how economic agents choose education X, or how unobserved heterogeneity U is generated

However in most cases it is not possible to learn anything of interest from data without a theory


Classic example: Lottery draft

Uses the Vietnam war draft lottery as a randomization tool to study the effect of military service on lifetime earnings

Draft numbers (1 to 365) were randomly assigned and affected the probability of treatment (military service)

All men below an exogenously determined threshold were enrolled in the army (after various tests)


Lottery draft: Implicit assumptions

Results: military service causes a decrease of about 15% in annual earnings

Veterans did not simply have lower earnings because they tended to have lower values of the error term U to begin with

What does it mean? What is the mechanism?

Suppose wages depend on education, private sector work experience, and military work experience

We need to assume:

Completed schooling is uncorrelated with draft lottery number (which seems implausible as the draft interrupts schooling)

Private sector experience is determined mechanically as age minus years of military service minus years of school; otherwise the instrument is correlated with experience


Summary

Key question any empiricist has to answer: What is the exogenous source of variation / identifying assumption?

Lottery number uncorrelated with individual characteristics

Impacts labor market outcomes only through probability of veteran status

Validity requires an interpretation (a model) of the real world situation


Structural vs. atheoretical approaches

Exogeneity assumptions are always a priori: Need an economic model

If economic mechanism is left implicit, interpretation of the results is difficult: may capture different countervailing forces

From a policy perspective, interpretation should be more important than identification

In a structural approach, parameters have a direct economic interpretation within a behavioral model


Random utility model: Motivation

Objective: Estimate the parameters of a utility function representing the preferences of a representative consumer

Can be used to simulate choices under alternative conditions (e.g. evaluate the change in demand when a policy is introduced)

Evaluate welfare impact associated with providing new goods (e.g. see Petrin, JPE 2002)

Recall: the standard utility maximization problem implies that the (inverse) demand for good $x$ is the marginal rate of substitution between $x$ and income $y$

Loosely speaking: How much extra income makes consumers indifferent between having the good or not


Random utility model: Setup

Consider the demand for differentiated products (cars, cereal brands, light bulbs, ...)

Consumers $i = 1, \dots, I$ buy one of $j = 1, \dots, J$ alternatives

Outside good ($j = 0$), bought in quantity $z_i$ (price normalized to 1)

Note that the choice may be repeated (“choice occasions”)

Each product is described by an $L$-dimensional vector of observed characteristics $x_j$ (e.g. for cars: the make, body type, engine size, MPG, ...) and its price $p_j$

Consumer’s problem is given by

$$\max_{j,\, z_i}\; U_i(x_j, z_i) \quad \text{s.t.} \quad p_j + z_i = y_i$$


Random utility model: Optimization

Conditional (on buying alternative $j$) indirect utility function is given by:

$$U_{ij}(x_j, p_j, y_i) = U_i(x_j, y_i - p_j)$$

If the outside good is bought: $U_{i0}(y_i) = U_i(0, y_i)$

Utility-maximizing behavior implies that consumer $i$ selects option $j$ if $U_{ij}(x_j, p_j, y_i) \geq U_{ik}(x_k, p_k, y_i)$ for all $k \neq j$

Inverse demand for each characteristic can be evaluated individually:


Random utility model

In practice we do not observe all the determinants of choices, so we decompose the utility of product $j$ as:

$$U_{ij}(x_j, p_j, y_i) = V_{ij}(x_j, p_j, y_i) + \varepsilon_{ij}$$

$V_{ij}$ is the deterministic component of utility, known by the researcher and the decision-maker

$\varepsilon_{ij}$ is a stochastic component (error term), which is known only by the decision-maker

From the researcher’s point of view choices are random (they depend on the error term), although individuals know their preferences with certainty

This structure is known as the random utility model (in 2000 McFadden won the Nobel Prize for this)

The error term has a structural interpretation: it is part of the utility function and known to the decision-maker


Random utility model: Choice probabilities

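The probability that consumer $i$ chooses alternative $j$ is:

$$
\begin{aligned}
\text{Prob}_{ij} &= \text{Prob}\left[U_{ij}(x_j,p_j,y_i) \geq U_{ik}(x_k,p_k,y_i),\ \forall k \neq j\right] \\
&= \text{Prob}\left[V_{ij}(x_j,p_j,y_i) + \varepsilon_{ij} \geq V_{ik}(x_k,p_k,y_i) + \varepsilon_{ik},\ \forall k \neq j\right] \\
&= \text{Prob}\left[V_{ij}(x_j,p_j,y_i) - V_{ik}(x_k,p_k,y_i) \geq \varepsilon_{ik} - \varepsilon_{ij},\ \forall k \neq j\right] \\
&= \text{Prob}\left[\varepsilon_{ik-j} \leq V_{ij}(x_j,p_j,y_i) - V_{ik}(x_k,p_k,y_i),\ \forall k \neq j\right] \\
&= \int_{-\infty}^{V_{ij}-V_{i1}} \int_{-\infty}^{V_{ij}-V_{i2}} \cdots \int_{-\infty}^{V_{ij}-V_{iJ}} f(\varepsilon_{ik-j})\, d\varepsilon_{ik-j}
\end{aligned}
$$

where $\varepsilon_{ik-j} \equiv \varepsilon_{ik} - \varepsilon_{ij}$ and the integrals run over $k \neq j$

Conditional on the observed part of utility $V_{ij}$ and the distribution of the $(J-1)$ vector of error differences $\varepsilon_{ik-j}$, we can evaluate choice probabilities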


Random utility model: Basic implementation

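In its simplest (and most restrictive) specification, $V_{ij}$ is given by:

$$V_{ij} = \beta_1 x_{1j} + \dots + \beta_L x_{Lj} + \gamma p_j$$

The $\beta$’s and $\gamma$ are parameters to be estimated

Only differences in utility matter, so characteristics that do not vary across alternatives (e.g. income) drop out

Note that the marginal willingness to pay for characteristic $l$ (the price increase that exactly offsets a one-unit increase in $x_{lj}$, holding $V_{ij}$ fixed) is $MWTP_l = -\beta_l/\gamma$, which is positive for a valued characteristic when $\gamma < 0$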

It is also conveniently assumed that the $\varepsilon_{ij}$ are iid and follow a Gumbel distribution

Close to the normal distribution but with nice closed-form expressions

Implies that error differences $\varepsilon_{ik-j}$ have a logistic distribution, with cumulative distribution function $F(x) = \frac{e^x}{1 + e^x}$

Gives rise to the simple multinomial logit model; alternative treatments of unobserved heterogeneity generate other (often less restrictive) models
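To see the Gumbel assumption at work, the Python sketch below (standard library only) compares the closed-form logit shares for a linear $V_{ij}$ with the shares obtained by directly simulating iid Gumbel errors and recording which alternative maximizes utility. It draws standard Gumbel errors as $-\ln E$ with $E$ exponential. The parameter values and the helper names (`logit_probs`, `gumbel`, `simulated_shares`) are hypothetical illustrations, not part of the original slides.

```python
import math
import random


def logit_probs(v):
    """Closed-form logit choice probabilities for deterministic utilities v[0..J]."""
    m = max(v)  # subtracting the max avoids overflow; only differences matter
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]


def gumbel(rng):
    """Standard Gumbel draw: if E ~ Exponential(1), then -ln(E) is standard Gumbel."""
    return -math.log(rng.expovariate(1.0))


def simulated_shares(v, n_draws=50_000, seed=0):
    """Share of draws in which alternative j maximizes V_j + eps_j, eps_j iid Gumbel."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        u = [vj + gumbel(rng) for vj in v]
        counts[u.index(max(u))] += 1
    return [c / n_draws for c in counts]


# Hypothetical example: outside good (j = 0) plus two products,
# V_ij = beta * x_j + gamma * p_j with a single characteristic.
beta, gamma = 1.0, -0.5
x = [0.0, 1.0, 2.0]
p = [0.0, 1.0, 2.5]
v = [beta * xj + gamma * pj for xj, pj in zip(x, p)]

print(logit_probs(v))
print(simulated_shares(v))
```

Because only utility differences enter, adding the same constant to every element of `v` leaves `logit_probs(v)` unchanged.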


Gumbel distribution (aka type I extreme value distribution)


Random utility model: Estimation

Given the above distributional assumption choice probabilities take a closed form:

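$$
\text{Prob}_{ij} = \frac{\exp[V_{ij}(x_j,p_j,y_i)]}{\sum_{k=0}^{J} \exp[V_{ik}(x_k,p_k,y_i)]}
= \frac{\exp[\beta_1 x_{1j} + \dots + \beta_L x_{Lj} + \gamma p_j]}{\sum_{k=0}^{J} \exp[\beta_1 x_{1k} + \dots + \beta_L x_{Lk} + \gamma p_k]}
$$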

The probability (likelihood) of observing the choices of all consumers $i$ is:

$$\ell(\beta_1, \dots, \beta_L, \gamma) = \prod_{i} \prod_{j} \text{Prob}_{ij}(\beta_1, \dots, \beta_L, \gamma)^{d_{ij}}$$

where $d_{ij}$ are indicator variables equal to one if $i$ chooses option $j$, zero otherwise

The parameters of the utility function are estimated by maximizing the log of the expression above
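A minimal sketch of this maximum likelihood step in Python, assuming a single characteristic $x$ and price $p$, three alternatives per consumer with no outside good, and simulated data with hypothetical true values $\beta = 1$, $\gamma = -0.8$. It maximizes $\sum_i \sum_j d_{ij} \ln \text{Prob}_{ij}$ by Newton's method, using the standard logit gradient $\sum_i (z_{i,\text{chosen}} - \bar z_i)$ and Hessian $-\sum_i \sum_j \text{Prob}_{ij} (z_{ij} - \bar z_i)(z_{ij} - \bar z_i)'$, where $z_{ij} = (x_{ij}, p_{ij})$ and $\bar z_i$ is its probability-weighted mean. Function names are illustrative only.

```python
import math
import random


def choice_probs(beta, gamma, x, p):
    """Logit probabilities with V_j = beta * x_j + gamma * p_j."""
    v = [beta * xj + gamma * pj for xj, pj in zip(x, p)]
    m = max(v)
    e = [math.exp(vj - m) for vj in v]
    s = sum(e)
    return [ej / s for ej in e]


def simulate(n, beta, gamma, n_alt=3, seed=1):
    """Draw choice sets and choices from the random utility model with Gumbel errors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(0.0, 2.0) for _ in range(n_alt)]
        p = [rng.uniform(0.0, 3.0) for _ in range(n_alt)]
        u = [beta * xj + gamma * pj - math.log(rng.expovariate(1.0))
             for xj, pj in zip(x, p)]
        data.append((x, p, u.index(max(u))))  # (characteristics, prices, chosen j)
    return data


def log_likelihood(beta, gamma, data):
    """Sum over consumers of ln Prob_ij for the chosen alternative (d_ij = 1)."""
    return sum(math.log(choice_probs(beta, gamma, x, p)[j]) for x, p, j in data)


def fit_logit(data, n_iter=25):
    """Newton's method on the (globally concave) logit log-likelihood."""
    b, g = 0.0, 0.0
    for _ in range(n_iter):
        gb = gg = hbb = hbg = hgg = 0.0
        for x, p, j in data:
            pr = choice_probs(b, g, x, p)
            xbar = sum(pk * xk for pk, xk in zip(pr, x))
            pbar = sum(pk * qk for pk, qk in zip(pr, p))
            gb += x[j] - xbar
            gg += p[j] - pbar
            for pk, xk, qk in zip(pr, x, p):
                dx, dp = xk - xbar, qk - pbar
                hbb -= pk * dx * dx
                hbg -= pk * dx * dp
                hgg -= pk * dp * dp
        det = hbb * hgg - hbg * hbg
        # Newton update: (b, g) <- (b, g) - H^{-1} * gradient, with H 2x2
        b -= (hgg * gb - hbg * gg) / det
        g -= (hbb * gg - hbg * gb) / det
    return b, g


data = simulate(3000, beta=1.0, gamma=-0.8)
beta_hat, gamma_hat = fit_logit(data)
print(beta_hat, gamma_hat)
```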


Application: Residential location choices

Bayer, Keohane, Timmins (JEEM, 2009) “Migration and hedonic valuation: The case of air quality”

Hedonic approach to the valuation of local amenities: Different locations are bundles of different characteristics

Households “vote with their feet” (Tiebout, 1956)


A model of location choice

Individuals simultaneously choose their location along with consumption $C_i$ of the numeraire good and of a non-traded good $H_i$ (“housing”)

Locations $j$ are characterized by a location-specific amenity $X_j$ (“air quality”)

There is a moving cost $M_j$ associated with settling in $j$, and income is location-specific and denoted by $Y_j$

$$\max_{C, H, X_j}\; U(C, H; X_j, M_j) \quad \text{s.t.} \quad Y_j = C + \rho_j H$$

In equilibrium, market prices adjust and individuals are indifferent among locations (otherwise they move)


Indirect utility function

Utility of individual $i$ for location $j$:

$$U_{ij} = C_i^{\beta_C} H_i^{\beta_H} X_j^{\beta_X} e^{M_{ij} + \xi_j + \varepsilon_{ij}}$$

where $\xi_j$ represents unobserved attributes of location $j$ and $\varepsilon_{ij}$ is iid Gumbel

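Substitute the demand for housing $H_{ij}^* = \frac{\beta_H}{\beta_H + \beta_C} \frac{I_{ij}}{\rho_j}$ and the budget constraint to get the indirect utility function

$$V_{ij} = I_{ij}^{\beta_I} e^{M_{ij} - \beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j + \varepsilon_{ij}}$$

with $\beta_I = \beta_H + \beta_C$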
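The substitution step can be written out explicitly. A short derivation, assuming (as on the slide) that $C$ and $H$ enter utility in Cobb-Douglas form and writing location-specific income as $I_{ij}$ (denoted $Y_j$ in the budget constraint of the location choice problem). The multiplicative constant in the second line is common to all locations and therefore does not affect which location is chosen:

```latex
\begin{aligned}
&\max_{C,H}\; C^{\beta_C} H^{\beta_H} \quad \text{s.t.} \quad I_{ij} = C + \rho_j H
\;\;\Longrightarrow\;\;
C^*_{ij} = \frac{\beta_C}{\beta_I} I_{ij}, \qquad
H^*_{ij} = \frac{\beta_H}{\beta_I} \frac{I_{ij}}{\rho_j} \\[4pt]
&V_{ij} = \left(\frac{\beta_C}{\beta_I}\right)^{\beta_C} \left(\frac{\beta_H}{\beta_I}\right)^{\beta_H}
  I_{ij}^{\beta_C + \beta_H}\, \rho_j^{-\beta_H}\, X_j^{\beta_X}\, e^{M_{ij} + \xi_j + \varepsilon_{ij}}
  \;\propto\; I_{ij}^{\beta_I}\, e^{M_{ij} - \beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j + \varepsilon_{ij}} \\[4pt]
&\frac{\rho_j H^*_{ij}}{I_{ij}} = \frac{\beta_H}{\beta_I}
\;\;\Longrightarrow\;\;
\beta_H = \beta_I \, \frac{\rho_j H^*_{ij}}{I_{ij}}
\end{aligned}
```

The last line is the Cobb-Douglas property that the housing expenditure share equals $\beta_H/\beta_I$, which is why $\beta_H$ can be recovered from $\beta_I$ and the observed budget share of housing.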


Specification

Rewrite indirect utility as:

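$$\ln V_{ij} = \beta_I \ln I_{ij} + M_{ij} + \theta_j + \varepsilon_{ij}$$

Moving costs: $M_{ij} = \mu_S d_{ij}^S + \mu_{R1} d_{ij}^{R1} + \mu_{R2} d_{ij}^{R2}$

Note that the income of $i$ in locations other than the one actually chosen is not observed, so it has to be estimated separately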

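$$\text{Prob}_{ij} = \frac{\exp[\beta_I \ln I_{ij} + \mu_S d_{ij}^S + \mu_{R1} d_{ij}^{R1} + \mu_{R2} d_{ij}^{R2} + \theta_j]}{\sum_{k=0}^{J} \exp[\beta_I \ln I_{ik} + \mu_S d_{ik}^S + \mu_{R1} d_{ik}^{R1} + \mu_{R2} d_{ik}^{R2} + \theta_k]}$$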
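A minimal Python sketch of how the moving-cost terms enter these location choice probabilities. It assumes (the slides do not define them) that $d^S_{ij}$, $d^{R1}_{ij}$, $d^{R2}_{ij}$ are 0/1 indicators that location $j$ lies outside increasingly broad areas around $i$'s birthplace, and it uses made-up values for $\beta_I$, the $\mu$'s and $\theta_j$; the function name `location_probs` is hypothetical.

```python
import math


def location_probs(ln_income, d_s, d_r1, d_r2, theta, beta_i, mu_s, mu_r1, mu_r2):
    """Logit location probabilities with V_ij = beta_I ln I_ij + M_ij + theta_j.

    Each list argument has one entry per location j for a given individual i.
    """
    v = [beta_i * li + mu_s * s + mu_r1 * r1 + mu_r2 * r2 + th
         for li, s, r1, r2, th in zip(ln_income, d_s, d_r1, d_r2, theta)]
    m = max(v)
    e = [math.exp(vj - m) for vj in v]
    total = sum(e)
    return [ej / total for ej in e]


# Two hypothetical locations identical in income and quality of life;
# location 0 is the birth state, location 1 lies outside it (same broader area).
probs = location_probs(
    ln_income=[10.5, 10.5], d_s=[0, 1], d_r1=[0, 0], d_r2=[0, 0],
    theta=[0.0, 0.0], beta_i=0.6, mu_s=-1.0, mu_r1=-0.5, mu_r2=-0.5,
)
print(probs)
```

With $\mu_S < 0$ the birth-state location is chosen more often even though the two locations are otherwise identical, which is how moving costs help rationalize staying put.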

Estimated location fixed effects: $\theta_j = -\beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j$

They represent the indirect utility (“quality of life”) of each location

Recover $\beta_X$ through linear regression of the fixed effects on $X_j$ and


Identification

Two issues in estimating $\theta_j = -\beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j$:

1. $\rho_j$ (price of housing) is correlated with $\xi_j$ (the error term)

Instead use $\theta_j + \beta_H \ln(\rho_j)$ as the dependent variable (with $\beta_H = \beta_I (\rho_j H_i^* / I_{ij}) = 0.2$)

Call this “housing price adjusted quality of life”

2. $X_j$ (air pollution) is correlated with $\xi_j$

Use first differences (1990-2000) to remove the long-run association

Instrument $X_j$ with pollution emitted from sources at least 80km from


Data

Census data (1990-2000) for household heads under 35 who reside in one of 242 metropolitan statistical areas

Migration: Birth vs. current residence

Local economic activities and amenities aggregated up from county-level


Results: First step

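$$\ell(\mu_S, \mu_{R1}, \mu_{R2}, \beta_I, \theta) = \prod_{t} \prod_{i} \prod_{j} \left( \frac{\exp[\beta_I \ln I_{ijt} + \mu_S d_{ijt}^S + \mu_{R1} d_{ijt}^{R1} + \mu_{R2} d_{ijt}^{R2} + \theta_{jt}]}{\sum_{k=0}^{J} \exp[\beta_I \ln I_{ikt} + \mu_S d_{ikt}^S + \mu_{R1} d_{ikt}^{R1} + \mu_{R2} d_{ikt}^{R2} + \theta_{kt}]} \right)^{d_{ijt}}$$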


Results: Second step

IV estimation: $\Delta\theta_j + 0.2\, \Delta \ln \rho_j = \beta_{PM}\, \Delta \ln PM_j + \beta_Z\, \Delta Z_j + \Delta\xi_j$
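The logic of this second step can be illustrated with simulated data: when the change in local pollution is correlated with the unobserved change in location quality, OLS is biased, while an instrument that shifts pollution but is unrelated to that unobserved change (in the spirit of pollution from distant sources) recovers the coefficient. This is a minimal Python sketch assuming a single endogenous regressor, a single instrument, no other controls $\Delta Z_j$, and made-up parameter values; it uses the just-identified IV (ratio) estimator $\widehat{\text{cov}}(z, y) / \widehat{\text{cov}}(z, x)$ rather than the authors' exact specification.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(y, x):
    return cov(x, y) / cov(x, x)


def iv_slope(y, x, z):
    """Just-identified IV estimator: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def simulate(n=5000, beta_pm=-0.5, seed=2):
    """Hypothetical data: y = beta_pm * dx + dxi, with dx correlated with dxi."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        dxi = rng.gauss(0.0, 1.0)      # unobserved change in location quality
        dz = rng.gauss(0.0, 1.0)       # instrument, e.g. change in distant-source pollution
        dx = 0.8 * dz + 0.6 * dxi + rng.gauss(0.0, 0.5)  # endogenous pollution change
        y.append(beta_pm * dx + dxi)   # adjusted quality-of-life change
        x.append(dx)
        z.append(dz)
    return y, x, z


y, x, z = simulate()
print("OLS:", ols_slope(y, x))  # biased upward here because cov(dx, dxi) > 0
print("IV: ", iv_slope(y, x, z))
```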


Motivation

We have considered a simple one-shot optimization decision

Choices are independent from each other: Utility only depends on attributes of that particular choice

Now consider choices that have an influence on options available in the future

A decision-maker may take these future effects into consideration

If decision-maker is forward looking the objective is to maximize the stream of instantaneous utility associated with choices, given:

1. Information currently available (usually summarized in a vector of stock variables)

2. Knowledge that he will act optimally when information is revealed in the future

As before the objective function is only partly known to the analyst, so the model rationalizes observed choices only up to an unobserved (structural) error term


Concrete example: Bus-engine replacement

Harold Zurcher (HZ) bus engine problem (Rust, Ecta 1987)

For a given bus, we observe a (finite) sequence of choices $\{d_t\}_{t=0}^{T}$ and a sequence of mileage $\{x_t\}_{t=0}^{T}$

Each bus is treated as an independent observation, so we will just study the choice problem for one bus

Standard and general framework for the solution to dynamic discrete choice models

Simple conceptual problem, estimation framework widely applied in many different settings

Observe a sequence of (discrete) choices and a sequence of state variables describing the information available to the decision-maker

Objective: infer the structural parameters of the objective function and the stochastic process whose associated optimal strategy coincides with the data


Setup

Every month $t$, HZ decides whether to replace the bus engine ($d_t = 1$) or not ($d_t = 0$)

If $d_t = 0$, HZ incurs a maintenance cost $c(x_t; \theta_1)$, increasing in $x_t$ (observed state variable); $\theta_1$ is a cost parameter to be estimated

$$v(d_t = 0, x_t; \theta_1, RC) = -c(x_t; \theta_1)$$

If $d_t = 1$, HZ incurs the engine replacement cost $RC$, to be estimated; this implies $x_{t+1} = 0$

$$v(d_t = 1, x_t; \theta_1, RC) = -RC$$

Per period “utility”: $u(d_t, x_t, \varepsilon_t; \theta_1, RC) = v(d_t, x_t; \theta_1, RC) + \varepsilon_t$

where $\varepsilon_t = (\varepsilon_{0t}, \varepsilon_{1t})$ is a structural error term (observed by the decision-maker but unobserved by the econometrician)

After a decision is made, the state variables $(x_t, \varepsilon_t)$ evolve stochastically


Optimization problem

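The sequence of decisions $\{d_t\}_{t=0}^{\infty}$ maximizes:

$$W = E\left[\sum_{t=0}^{\infty} \delta^t u(d_t, x_t, \varepsilon_t; \theta_1, RC)\right]$$

Stationary infinite-horizon problem with $\delta \in (0, 1)$

Bellman equation:

$$V(x_t, \varepsilon_t) = \max_{d_t \in \{0,1\}} \left\{ v(d_t, x_t; \theta_1, RC) + \varepsilon_t + \delta EV(d_t, x_t, \varepsilon_t) \right\}$$

where we defined

$$
\begin{aligned}
EV(d_t, x_t, \varepsilon_t) &\equiv E_{x_{t+1}, \varepsilon_{t+1}}\left[V(x_{t+1}, \varepsilon_{t+1}) \mid d_t, x_t, \varepsilon_t\right] \\
&= \int_{x_{t+1}} \int_{\varepsilon_{t+1}} V(x_{t+1}, \varepsilon_{t+1})\, p(dx_{t+1}, d\varepsilon_{t+1} \mid x_t, \varepsilon_t, d_t)
\end{aligned}
$$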


Simplifying assumptions

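A1. Conditional independence:

$$p(x_{t+1}, \varepsilon_{t+1} \mid x_t, \varepsilon_t, d_t) = p(x_{t+1} \mid x_t, d_t) \cdot p(\varepsilon_{t+1} \mid x_{t+1})$$

1. Given $x_t$ and $d_t$, $x_{t+1}$ is independent of $\varepsilon_t$

2. Given $x_{t+1}$, $\varepsilon_{t+1}$ is independent of $\varepsilon_t$

A2. The components of $\varepsilon_t$ are iid and follow a Gumbel distribution

Implication: $EV(\cdot)$ simplifies to

$$
\begin{aligned}
EV(d_t, x_t) &= \int_{x_{t+1}} \int_{\varepsilon_{t+1}} V(x_{t+1}, \varepsilon_{t+1})\, p(dx_{t+1} \mid x_t, d_t)\, p(d\varepsilon_{t+1} \mid x_{t+1}) \\
&= \int_{x_{t+1}} \int_{\varepsilon_{t+1}} \max_{d_{t+1} \in \{0,1\}} \left\{ v(d_{t+1}, x_{t+1}) + \varepsilon_{t+1} + \delta EV(d_{t+1}, x_{t+1}) \right\} p(d\varepsilon_{t+1} \mid x_{t+1})\, p(dx_{t+1} \mid x_t, d_t) \\
&= \int_{x_{t+1}} \ln\left\{ \sum_{d_{t+1} \in \{0,1\}} \exp\left[ v(d_{t+1}, x_{t+1}) + \delta EV(d_{t+1}, x_{t+1}) \right] \right\} p(dx_{t+1} \mid x_t, d_t)
\end{aligned}
$$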
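The last step uses a standard property of the Gumbel distribution: if $\varepsilon_0, \varepsilon_1$ are iid standard Gumbel, then $E[\max_d \{w_d + \varepsilon_d\}] = \ln \sum_d \exp(w_d) + \gamma_E$, where $\gamma_E \approx 0.5772$ is Euler's constant. The constant is omitted in the expression above; doing so shifts $EV$ by the same amount for every $(d, x)$, so choice probabilities are unaffected. A quick Monte Carlo check in Python, where the values $w_d$ are arbitrary illustrations and the helper names are hypothetical:

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def logsum(w):
    """ln(sum_d exp(w_d)), computed stably."""
    m = max(w)
    return m + math.log(sum(math.exp(wd - m) for wd in w))


def simulated_emax(w, n_draws=100_000, seed=3):
    """Monte Carlo estimate of E[max_d (w_d + eps_d)] with eps_d iid standard Gumbel."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # -ln(Exponential(1)) is a standard Gumbel draw
        total += max(wd - math.log(rng.expovariate(1.0)) for wd in w)
    return total / n_draws


w = [-1.0, 0.5]  # e.g. v(d, x') + delta * EV(d, x') for d = 0, 1
print(logsum(w) + EULER_GAMMA, simulated_emax(w))
```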

Choice probabilities

The probability of observing an engine replacement is:

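$$
\begin{aligned}
\text{Prob}(d_t = 1 \mid x_t) &= \text{Prob}\left[u(d_t = 1, x_t, \varepsilon_t; \theta_1, RC) + \delta EV(d_t = 1, x_t) \geq u(d_t = 0, x_t, \varepsilon_t; \theta_1, RC) + \delta EV(d_t = 0, x_t)\right] \\
&= \text{Prob}\left[-RC + \varepsilon_{1t} + \delta EV(d_t = 1, x_t) \geq -c(x_t; \theta_1) + \varepsilon_{0t} + \delta EV(d_t = 0, x_t)\right] \\
&= \frac{\exp[-RC + \delta EV(d_t = 1, x_t)]}{\exp[-RC + \delta EV(d_t = 1, x_t)] + \exp[-c(x_t; \theta_1) + \delta EV(d_t = 0, x_t)]}
\end{aligned}
$$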
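The last line is a binary logit in the difference between the two choice-specific values. A minimal Python sketch, assuming a linear maintenance cost $c(x_t; \theta_1) = \theta_1 x_t$ (an illustrative choice, not necessarily Rust's specification) and taking the two $EV$ values as inputs; with $\delta = 0$ the decision is myopic and the $EV$ terms drop out. The function name and parameter values are hypothetical.

```python
import math


def replace_prob(x, theta1, rc, delta=0.0, ev0=0.0, ev1=0.0):
    """Prob(d_t = 1 | x_t) from the choice-specific values.

    ev0 and ev1 stand for EV(d_t = 0, x_t) and EV(d_t = 1, x_t);
    the maintenance cost is assumed linear: c(x; theta1) = theta1 * x.
    """
    w1 = -rc + delta * ev1           # value of replacing the engine
    w0 = -theta1 * x + delta * ev0   # value of keeping it
    # exp(w1) / (exp(w1) + exp(w0)) rewritten as a logistic function of w1 - w0
    return 1.0 / (1.0 + math.exp(w0 - w1))


# Myopic illustration: replacement becomes more likely as mileage rises,
# and equals 1/2 where the maintenance cost equals the replacement cost.
for x in (0, 5, 10, 15, 20):
    print(x, round(replace_prob(x, theta1=0.5, rc=5.0), 3))
```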


Likelihood and estimation

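Given the conditional independence assumption, the likelihood of observing a sequence $\{(d_t, x_t)\}_{t=0}^{T}$ as a function of the vector of parameters to be estimated ($\theta$) is:

$$
\begin{aligned}
\ell(\theta) &= \prod_{t=0}^{T} \text{Prob}[d_t, x_t \mid d_{t-1}, x_{t-1}; \theta] \\
&= \prod_{t=0}^{T} \text{Prob}[d_t \mid x_t; \theta_1, RC] \cdot \text{Prob}[x_t \mid d_{t-1}, x_{t-1}; \theta_2]
\end{aligned}
$$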
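Taking logs turns the product into two sums: one over choice probabilities (depending on $\theta_1, RC$) and one over mileage transitions (depending only on $\theta_2$), which is what makes a two-step procedure possible. A minimal Python sketch on a discretized mileage grid, assuming that mileage moves up by 0, 1 or 2 bins with probabilities $\theta_2$ when the engine is kept and is reset to bin 0 after a replacement (following $x_{t+1} = 0$ on the setup slide). The replacement probabilities are taken as given inputs, and all names and numbers are hypothetical.

```python
import math


def log_likelihood(states, decisions, p_replace, theta2):
    """Return (choice part, transition part) of ln l(theta).

    states[t]     mileage bin x_t (integer index)
    decisions[t]  d_t in {0, 1}
    p_replace[x]  Prob(d_t = 1 | x_t = x; theta1, RC), e.g. from the inner loop
    theta2        probabilities of moving up 0, 1 or 2 bins when d_{t-1} = 0
    """
    ll_choice, ll_trans = 0.0, 0.0
    for t, (x, d) in enumerate(zip(states, decisions)):
        pr1 = p_replace[x]
        ll_choice += math.log(pr1 if d == 1 else 1.0 - pr1)
        if t == 0:
            continue  # no transition term for the initial state
        x_prev, d_prev = states[t - 1], decisions[t - 1]
        if d_prev == 1:
            if x != 0:
                raise ValueError("after a replacement mileage must restart at bin 0")
            continue  # deterministic reset contributes ln(1) = 0
        jump = x - x_prev
        if jump not in (0, 1, 2):
            raise ValueError(f"impossible mileage transition at t={t}")
        ll_trans += math.log(theta2[jump])
    return ll_choice, ll_trans


ll_choice, ll_trans = log_likelihood(
    states=[0, 1, 3, 0], decisions=[0, 0, 1, 0],
    p_replace=[0.1, 0.2, 0.5, 0.9], theta2=(0.3, 0.5, 0.2),
)
print(ll_choice, ll_trans)
```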

There are different procedures to estimate the parameters (see papers by Aguirregabiria and Mira; and in particular Su and Judd, Ecta 2012)

Here we summarize the original procedure in Rust (1987)

Estimation proceeds in two steps:


Estimation procedure

First step: discretize the state space, so that the mileage increment each month falls in one of three ranges:

Between 0 and 5k with probability $\theta_{21}$

Between 5k and 10k with probability $\theta_{22}$

More than 10k with probability $1 - \theta_{21} - \theta_{22}$

Estimating $\theta_{21}$ and $\theta_{22}$ from the observed mileage increments is a simple parametric maximum likelihood problem
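Because the three ranges define a multinomial distribution, the maximum likelihood estimates of $\theta_{21}$ and $\theta_{22}$ are simply the sample shares of monthly increments falling in each range. A minimal Python sketch, assuming the ranges are $[0, 5000)$, $[5000, 10000)$ and $[10000, \infty)$ miles (the slide does not say how boundary values are assigned) and using made-up increments; `estimate_theta2` is a hypothetical helper name.

```python
from collections import Counter


def estimate_theta2(increments, cut1=5000.0, cut2=10000.0):
    """Multinomial MLE of (theta21, theta22) from observed monthly mileage increments."""
    ranges = Counter(0 if m < cut1 else 1 if m < cut2 else 2 for m in increments)
    n = len(increments)
    return ranges[0] / n, ranges[1] / n  # theta23 = 1 - theta21 - theta22


increments = [1200, 4800, 5200, 7000, 9999, 12000, 3000, 15000]
theta21_hat, theta22_hat = estimate_theta2(increments)
print(theta21_hat, theta22_hat)  # 0.375 0.375
```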

The second step employs a “nested fixed point algorithm” to estimate θ1 and RC:

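“Inner loop”: computes $EV(\cdot)$ by solving the forward-looking problem, taking the current trial values $\hat{\theta}_1, \widehat{RC}$ as given

“Outer loop”: given $EV(\cdot)$, searches for the values of $\theta_1, RC$ that maximize the likelihood of the observed choices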
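Putting the two loops together gives the nested fixed point structure. The Python sketch below is a stripped-down illustration, not Rust's implementation: it assumes a grid of mileage bins, a linear cost $c(x;\theta_1) = \theta_1 x$ with $\theta_1$ treated as known, $\theta_2$ already estimated in the first step, a reset to bin 0 after replacement, and a plain grid search over $RC$ in the outer loop instead of a gradient-based optimizer. The data are simulated from hypothetical "true" values, and all names are illustrative.

```python
import math
import random

N_BINS, DELTA = 40, 0.9
THETA2 = (0.35, 0.5, 0.15)  # first-step estimates: move up 0, 1, 2 bins


def logsum2(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def inner_loop(theta1, rc, tol=1e-8):
    """Solve for EV(d, x) by successive approximation; return Prob(d=1 | x)."""
    ev0, ev1 = [0.0] * N_BINS, [0.0] * N_BINS
    while True:
        w = [logsum2(-theta1 * x + DELTA * ev0[x], -rc + DELTA * ev1[x])
             for x in range(N_BINS)]
        new0 = [sum(p * w[min(x + k, N_BINS - 1)] for k, p in enumerate(THETA2))
                for x in range(N_BINS)]
        new1 = [w[0]] * N_BINS  # replacement resets mileage to bin 0
        dist = max(abs(a - b) for a, b in zip(new0 + new1, ev0 + ev1))
        ev0, ev1 = new0, new1
        if dist < tol:
            break
    return [1.0 / (1.0 + math.exp((-theta1 * x + DELTA * ev0[x])
                                  - (-rc + DELTA * ev1[x])))
            for x in range(N_BINS)]


def simulate(theta1, rc, T=10_000, seed=4):
    """Simulate one bus's (x_t, d_t) history from the model's optimal policy."""
    rng = random.Random(seed)
    p1 = inner_loop(theta1, rc)
    x, data = 0, []
    for _ in range(T):
        d = 1 if rng.random() < p1[x] else 0
        data.append((x, d))
        if d == 1:
            x = 0
        else:
            u = rng.random()
            k = 0 if u < THETA2[0] else 1 if u < THETA2[0] + THETA2[1] else 2
            x = min(x + k, N_BINS - 1)
    return data


def outer_loop(data, theta1, rc_grid):
    """Grid search over RC; each candidate calls the inner loop once."""
    best_rc, best_ll = None, -math.inf
    for rc in rc_grid:
        p1 = inner_loop(theta1, rc)
        ll = sum(math.log(p1[x] if d == 1 else 1.0 - p1[x]) for x, d in data)
        if ll > best_ll:
            best_rc, best_ll = rc, ll
    return best_rc, best_ll


data = simulate(theta1=0.3, rc=8.0)
rc_hat, ll_hat = outer_loop(data, theta1=0.3, rc_grid=[4.0 + 0.5 * i for i in range(25)])
print(rc_hat, ll_hat)
```

Only the choice part of the likelihood enters the outer loop here, since with $\theta_2$ fixed from the first step the transition part does not depend on $RC$.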


Inner loop: Computation

The inner loop requires solving the following fixed-point equation by successive iteration (indexed by τ):

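$$
\begin{aligned}
EV^{\tau+1}(d_t, x_t) &= \int_{x_{t+1}} \ln\left\{ \sum_{d_{t+1} \in \{0,1\}} \exp\left[ v(d_{t+1}, x_{t+1}; \hat{\theta}_1, \widehat{RC}) + \delta EV^{\tau}(d_{t+1}, x_{t+1}) \right] \right\} p(dx_{t+1} \mid x_t, d_t) \\
&= \hat{\theta}_{21} \int_{x_t}^{x_t + 5000} \ln\left\{ \sum_{d_{t+1} \in \{0,1\}} \exp\left[ v(d_{t+1}, x_{t+1}; \hat{\theta}_1, \widehat{RC}) + \delta EV^{\tau}(d_{t+1}, x_{t+1}) \right] \right\} dx_{t+1} \\
&\quad + \hat{\theta}_{22} \int_{x_t + 5000}^{x_t + 10000} \ln\left\{ \sum_{d_{t+1} \in \{0,1\}} \exp\left[ v(d_{t+1}, x_{t+1}; \hat{\theta}_1, \widehat{RC}) + \delta EV^{\tau}(d_{t+1}, x_{t+1}) \right] \right\} dx_{t+1} \\
&\quad + (1 - \hat{\theta}_{21} - \hat{\theta}_{22}) \int_{x_t + 10000}^{\infty} \ln\left\{ \sum_{d_{t+1} \in \{0,1\}} \exp\left[ v(d_{t+1}, x_{t+1}; \hat{\theta}_1, \widehat{RC}) + \delta EV^{\tau}(d_{t+1}, x_{t+1}) \right] \right\} dx_{t+1}
\end{aligned}
$$

Iteration stops when some measure of the distance between $EV^{\tau}$ and $EV^{\tau+1}$ falls below a chosen tolerance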
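On a discretized grid the three integrals become a three-term weighted sum. The Python sketch below assumes bins of 5,000 miles, so that the three ranges map (as an approximation) to staying in the same bin or moving up one or two bins, with the top bin absorbing; it also assumes a linear cost $c(x;\theta_1) = \theta_1 x$ and a reset to bin 0 after replacement. It records the sup-norm distance between successive iterates, which shrinks at least by the factor $\delta$ at each step because the update is a contraction; this is what guarantees that the stopping rule is eventually met. The parameter values and the name `solve_ev` are hypothetical.

```python
import math


def logsum2(a, b):
    """ln(exp(a) + exp(b)), computed stably."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def solve_ev(theta1, rc, theta21, theta22, n_bins=90, delta=0.95, tol=1e-10):
    """Successive approximation of EV(d, x) on a mileage grid.

    Returns EV(0, .), EV(1, .) and the sup-norm distance at each iteration.
    """
    probs = (theta21, theta22, 1.0 - theta21 - theta22)  # move up 0, 1, 2 bins
    ev0, ev1 = [0.0] * n_bins, [0.0] * n_bins
    distances = []
    while True:
        # logsum over d_{t+1} of v(d_{t+1}, x_{t+1}) + delta * EV^tau(d_{t+1}, x_{t+1})
        w = [logsum2(-theta1 * x + delta * ev0[x], -rc + delta * ev1[x])
             for x in range(n_bins)]
        # keep the engine: weighted sum over the three mileage ranges
        new0 = [sum(p * w[min(x + k, n_bins - 1)] for k, p in enumerate(probs))
                for x in range(n_bins)]
        new1 = [w[0]] * n_bins  # replace the engine: mileage restarts at bin 0
        dist = max(abs(a - b) for a, b in zip(new0 + new1, ev0 + ev1))
        distances.append(dist)
        ev0, ev1 = new0, new1
        if dist < tol:
            return ev0, ev1, distances


ev0, ev1, distances = solve_ev(theta1=0.2, rc=10.0, theta21=0.35, theta22=0.5)
print(len(distances), distances[:3], distances[-1])
```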


Last slide

Simple setup provides a very rich framework to analyze discrete decisions

For a recent application see Muehlenbachs (2015) “A dynamic model of cleanup: Estimating sunk costs in oil and gas production”, International Economic Review

In principle identification should be discussed as usual: How does variation in the data pin down the parameters of interest?

Very clear discussion in Bayer et al. (2009)

With dynamic models this can be complicated