Doctoral seminar in resource economics
Structural estimation and applications
to natural resources
Bruno Lanz
Graduate Institute of International and Development Studies
Outline for the class
Structural vs. atheoretical approaches to estimation:
Example from Keane (J Metrics, 2010): Lottery draft and lifetime earnings
Structural estimation: Discrete choice
Theory: Estimation of random utility models
Application: Spatial sorting (Bayer et al., JEEM 2009)
Estimation of dynamic discrete choice models
Basic framework: Classic paper by Rust (Ecta 1987)
Policy-orientated applications (next week):
Land use choices (Scott, 2014)
What is structural estimation?
Start from a standard model: Optimizing agents with fixed preferences / technology
Objective: Estimate the structural parameters (primitives) of the model
Rationalize the data as the model’s outcome
“As if” the data had been generated by the model
A structural error term allows real-world observations to deviate from optimal behavior determined by the model’s solution
Why do we care?
Some behavioral parameters may be of interest in their own right (e.g. an elasticity of substitution of a CES production function)
But what makes the approach attractive is the ability to exploit the estimated model
Once the parameters of the model are estimated the model can be solved to make behavioral predictions and study counterfactual equilibria
Either out of sample (no change in the structure of the problem)
Or introduce a new component to the problem (e.g. a change in relative prices) and see how the equilibrium differs from observed outcomes
In some settings it can be used to assess the welfare impact of policy interventions
Experimental view of estimation
With the rise of controlled (laboratory) experiments in the late 80s, there has been a shift towards an “experimentalist” view of empirical work
As in the experimental literature, the main focus is on identifying a causal or “treatment” effect
We want to measure the effect of a variable X on an outcome Y; for example, the effect of an additional year of education on earnings
In the lab we can use random assignment and observe behavioral responses
In the real world random assignment is unlikely
Education is not randomly assigned: X varies with unobserved characteristics U that also affect earnings (like innate ability)
Random assignment and natural experiments
Identification of causal effect requires finding a source of exogenous variation
We need an instrumental variable Z correlated with X but uncorrelated with the unobservables that also affect earnings
The ideal instrument generates random assignment: Those with Z = 1 tend to choose higher X (all other things equal) than those with Z = 0
Natural experiment: An exogenous event affects a random subset of the population
The natural experiment induces at least some members of the “treatment group” to choose (or be assigned) a higher level of X than otherwise
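The logic of the instrument can be illustrated on simulated data. The sketch below is not taken from any of the cited papers: the data-generating process, the parameter values and the function names (`simulate_sample`, `ols_slope`, `wald_iv`) are all assumptions made for illustration. Ability U raises both schooling X and earnings Y, so the naive regression slope is biased, while a randomly assigned binary Z recovers the effect through the ratio of mean differences (the Wald estimator).

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate_sample(n=20000, true_effect=0.10, seed=0):
    """Draw (Z, X, Y) triples from a hypothetical data-generating process."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                 # unobserved ability U
        z = 1 if rng.random() < 0.5 else 0      # instrument Z, randomly assigned
        x = 12.0 + z + u + rng.gauss(0.0, 1.0)  # schooling X depends on Z and U
        y = true_effect * x + 0.5 * u + rng.gauss(0.0, 1.0)  # log earnings Y
        sample.append((z, x, y))
    return sample

def ols_slope(sample):
    """Slope of a regression of Y on X, ignoring the unobserved U."""
    xs = [x for _, x, _ in sample]
    ys = [y for _, _, y in sample]
    mx, my = mean(xs), mean(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov_xy / var_x

def wald_iv(sample):
    """Difference in mean Y across Z groups over difference in mean X."""
    y1 = [y for z, _, y in sample if z == 1]
    y0 = [y for z, _, y in sample if z == 0]
    x1 = [x for z, x, _ in sample if z == 1]
    x0 = [x for z, x, _ in sample if z == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))

sample = simulate_sample()
print("OLS slope:", round(ols_slope(sample), 3))
print("IV (Wald) estimate:", round(wald_iv(sample), 3))
```

In this simulated setting the OLS slope overstates the assumed effect of 0.10 because X and U move together, whereas the Wald estimate should be close to it.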
Identification
In fact there exist few truly natural experiments (“natural natural” experiments), for example:
Weather events
Realizations of child gender
Twin births
For any other “natural” experiment, we effectively need an argument to motivate random assignment of the treatment
Typically this involves 2-3 paragraphs of text
And a lot of the credibility of the paper depends on how well this part is drafted
Role of theory
When one finds an exogenous source of information (or can successfully argue that it is), there is no need to explain the results
Even if unintuitive it is very difficult for readers/referees/audiences to argue against your findings
(see Dell, Ecta 2010, for a nice exception)
And in fact it is claimed to be an advantage of the experimentalist approach
Results do not (a priori) rely on economic theory
No (explicit) assumptions about how economic agents choose education X, or how unobserved heterogeneity U is generated
However in most cases it is not possible to learn anything of interest from data without a theory
Classic example: Lottery draft
Uses the Vietnam war draft lottery as a randomization tool to study the effect of military service on lifetime earnings
Draft numbers (1 to 365) were randomly assigned and affected the probability of treatment (military service)
All men below an exogenously determined threshold were enrolled in the army (after various tests)
Lottery draft: Implicit assumptions
Results: military service causes a decrease of about 15% in annual earnings
Veterans did not simply have lower earnings because they tended to have lower values of the error term U to begin with
What does it mean? What is the mechanism?
Suppose wages depend on education, private sector work experience, and military work experience
We need to assume:
Completed schooling is uncorrelated with draft lottery number (which seems implausible as the draft interrupts schooling)
Private sector experience is determined mechanically as age minus years of military service minus years of school; otherwise the instrument is correlated with experience
Summary
Key question any empiricist has to answer: What is the exogenous source of variation / identifying assumption?
Lottery number uncorrelated with individual characteristics
Impacts labor market outcomes only through probability of veteran status
Validity requires an interpretation (a model) of the real world situation
Structural vs. atheoretical approaches
Exogeneity assumptions are always a priori: Need an economic model
If the economic mechanism is left implicit, interpretation of the results is difficult: estimates may capture different countervailing forces
From a policy perspective, interpretation should be more important than identification
In a structural approach, parameters have a direct economic interpretation within a behavioral model
Random utility model: Motivation
Objective: Estimate the parameters of a utility function representing the preferences of a representative consumer
Can be used to simulate choices under alternative conditions (e.g. evaluate the change in demand when a policy is introduced)
Evaluate welfare impact associated with providing new goods (e.g. see Petrin, JPE 2002)
Recall: the standard utility maximization problem implies that (inverse) demand for good x is the marginal rate of substitution between x and income y
Loosely speaking: How much extra income makes consumers indifferent between having the good or not
Random utility model: Setup
Consider the demand for differentiated products (cars, cereal brands, light bulbs, ...)
Consumers $i = 1, \dots, I$ buy one of $j = 1, \dots, J$ alternatives
Outside good ($j = 0$), bought in quantity $z_i$ (price normalized to 1)
Note choice may be repeated (“choice-occasion”)
Each product is described by an $L$-dimensional vector of observed characteristics $x_j$ (e.g. for cars: the make, body type, engine size, MPG, ...) and its price $p_j$
Consumer’s problem is given by
$$\max_{j,\, z_i} \; U_i(x_j, z_i) \quad \text{s.t.} \quad p_j + z_i = y_i$$
Random utility model: Optimization
Conditional (on buying alternative $j$) indirect utility function is given by:
$$U_{ij}(x_j, p_j, y_i) = U_i(x_j, y_i - p_j)$$
If outside good is bought: $U_{i0}(x_j, p_j, y_i) = U_i(0, y_i)$
Utility maximizing behavior implies that consumer $i$ selects option $j$ if $U_{ij}(x_j, p_j, y_i) \ge U_{ik}(x_k, p_k, y_i)$ for all $k \ne j$
Inverse demand for each characteristic can be evaluated individually, as the marginal rate of substitution between that characteristic and income
Random utility model
In practice we do not observe all the determinants of choices, so decompose utility of product $j$ as:
$$U_{ij}(x_j, p_j, y_i) = V_{ij}(x_j, p_j, y_i) + \varepsilon_{ij}$$
$V_{ij}$ is the deterministic component of utility, known by the researcher and the decision-maker
$\varepsilon_{ij}$ is a stochastic component (error term), which is known only by the decision-maker
From the researcher’s point of view choices are random (they depend on the error term)
Although individuals know their preferences with certainty
This structure is known as the random utility model (in 2000 McFadden won the Nobel Prize for this)
The error term has a structural interpretation, as it is part of the utility function and known to the decision-maker
Random utility model: Choice probabilities
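The probability that consumer $i$ chooses alternative $j$ is:
$$
\begin{aligned}
\mathrm{Prob}_{ij} &= \mathrm{Prob}\left[U_{ij}(x_j,p_j,y_i) \ge U_{ik}(x_k,p_k,y_i),\ \forall k \ne j\right] \\
&= \mathrm{Prob}\left[V_{ij}(x_j,p_j,y_i) + \varepsilon_{ij} \ge V_{ik}(x_k,p_k,y_i) + \varepsilon_{ik},\ \forall k \ne j\right] \\
&= \mathrm{Prob}\left[V_{ij}(x_j,p_j,y_i) - V_{ik}(x_k,p_k,y_i) \ge \varepsilon_{ik} - \varepsilon_{ij},\ \forall k \ne j\right] \\
&= \mathrm{Prob}\left[\varepsilon_{ik-j} \le V_{ij}(x_j,p_j,y_i) - V_{ik}(x_k,p_k,y_i),\ \forall k \ne j\right] \\
&= \int_{-\infty}^{V_{ij}-V_{i1}} \int_{-\infty}^{V_{ij}-V_{i2}} \cdots \int_{-\infty}^{V_{ij}-V_{iJ}} f(\varepsilon_{ik-j})\, d\varepsilon_{ik-j}
\end{aligned}
$$
where $\varepsilon_{ik-j} \equiv \varepsilon_{ik} - \varepsilon_{ij}$ denotes an error difference
Conditional on the observed part of utility $V_{ij}$ and the distribution of the $J-1$ vector of error differences $\varepsilon_{ik-j}$, we can evaluate choice probabilities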
Random utility model: Basic implementation
In its simplest (and most restrictive) specification, $V_{ij}$ is given by:
$$V_{ij} = \beta_1 x_{1j} + \dots + \beta_L x_{Lj} + \gamma p_j$$
The $\beta$’s and $\gamma$ are parameters to be estimated
Only differences in utility matter, so characteristics that do not vary across alternatives (e.g. income) drop out
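Note that: $\mathrm{MWTP}_l = -\beta_l/\gamma$ (the price coefficient $\gamma$ is negative, so $-\gamma$ is the marginal utility of income; holding $V_{ij}$ constant, $dp_j/dx_{lj} = -\beta_l/\gamma$)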
It is also conveniently assumed that the $\varepsilon_{ij}$ are iid and follow a Gumbel distribution
Close to the normal distribution but with nice closed form expressions
Implies that error differences $\varepsilon_{ik-j}$ have a logistic distribution, with cumulative distribution function $F(x) = \frac{e^x}{1+e^x}$
Gives rise to the simple multinomial logit model; alternative treatments of unobserved heterogeneity generate other (often less restrictive) models
Gumbel distribution (aka type I extreme value distribution)
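The link between Gumbel errors and logit probabilities can be checked by simulation. The following is a minimal sketch, assuming three alternatives with made-up deterministic utilities; the function names and the values in `v_example` are hypothetical. It draws standard Gumbel errors by inverting the CDF $F(\varepsilon) = \exp(-\exp(-\varepsilon))$, picks the utility-maximizing alternative on each draw, and compares the simulated shares with the logit formula.

```python
import math
import random

def logit_probs(v):
    """Closed-form logit choice probabilities for deterministic utilities v."""
    m = max(v)                              # subtract the max for numerical stability
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def gumbel_draw(rng):
    """Standard Gumbel draw via the inverse CDF F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:                         # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_shares(v, n_draws=100000, seed=1):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utils = [vj + gumbel_draw(rng) for vj in v]
        counts[utils.index(max(utils))] += 1
    return [c / n_draws for c in counts]

v_example = [0.0, 1.0, -0.5]                # hypothetical values of V_ij
print("closed form:", [round(p, 3) for p in logit_probs(v_example)])
print("simulated:  ", [round(s, 3) for s in simulated_shares(v_example)])
```

The two sets of numbers should agree up to simulation noise, which is what makes the Gumbel assumption convenient: no numerical integration over the error differences is needed.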
Random utility model: Estimation
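Given the above distributional assumption, choice probabilities take a closed form:
$$
\mathrm{Prob}_{ij} = \frac{\exp[V_{ij}(x_j,p_j,y_i)]}{\sum_{k=0}^{J} \exp[V_{ik}(x_k,p_k,y_i)]}
= \frac{\exp[\beta_1 x_{1j} + \dots + \beta_L x_{Lj} + \gamma p_j]}{\sum_{k=0}^{J} \exp[\beta_1 x_{1k} + \dots + \beta_L x_{Lk} + \gamma p_k]}
$$
The probability (likelihood) of observing the set of choices made by consumers $i$:
$$\ell(\beta_1, \dots, \beta_L, \gamma) = \prod_i \prod_j \mathrm{Prob}_{ij}(\beta_1, \dots, \beta_L, \gamma)^{d_{ij}}$$
where $d_{ij}$ are indicator variables equal to one if $i$ chooses option $j$, zero otherwise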
Parameters of the utility function are the solution to maximizing the log of the expression above
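The estimation step can be sketched on simulated data. This is an illustration under stated assumptions, not the procedure of any cited paper: there is one observed characteristic and a price, three alternatives per choice occasion, true values $\beta = 1$ and $\gamma = -1.5$, and the log-likelihood is maximized with a hand-written Newton-Raphson iteration. All function names are hypothetical.

```python
import math
import random

def choice_probs(beta, gamma, alts):
    """Logit probabilities; alts is a list of (x, p) pairs."""
    v = [beta * x + gamma * p for x, p in alts]
    m = max(v)
    expv = [math.exp(t - m) for t in v]
    total = sum(expv)
    return [e / total for e in expv]

def gumbel_draw(rng):
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_choices(beta, gamma, n=3000, n_alts=3, seed=2):
    """Each choice occasion: random (x, p) per alternative, utility-maximizing choice."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        alts = [(rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0)) for _ in range(n_alts)]
        utils = [beta * x + gamma * p + gumbel_draw(rng) for x, p in alts]
        data.append((alts, utils.index(max(utils))))
    return data

def fit_logit(data, n_iter=25):
    """Maximize the logit log-likelihood over (beta, gamma) by Newton-Raphson."""
    b, g = 0.0, 0.0
    for _ in range(n_iter):
        gb = gg = hbb = hbg = hgg = 0.0
        for alts, chosen in data:
            pr = choice_probs(b, g, alts)
            xbar = sum(q * x for q, (x, _) in zip(pr, alts))
            pbar = sum(q * p for q, (_, p) in zip(pr, alts))
            gb += alts[chosen][0] - xbar        # score with respect to beta
            gg += alts[chosen][1] - pbar        # score with respect to gamma
            for q, (x, p) in zip(pr, alts):     # Hessian (negative definite)
                hbb -= q * (x - xbar) ** 2
                hbg -= q * (x - xbar) * (p - pbar)
                hgg -= q * (p - pbar) ** 2
        det = hbb * hgg - hbg * hbg
        step_b = (hgg * gb - hbg * gg) / det    # first element of H^{-1} * score
        step_g = (hbb * gg - hbg * gb) / det    # second element of H^{-1} * score
        b, g = b - step_b, g - step_g
        if abs(step_b) + abs(step_g) < 1e-10:
            break
    return b, g

data = simulate_choices(beta=1.0, gamma=-1.5)
beta_hat, gamma_hat = fit_logit(data)
print("beta_hat:", round(beta_hat, 3), "gamma_hat:", round(gamma_hat, 3))
```

Because the logit log-likelihood is globally concave in the parameters, Newton steps from a zero starting value are usually sufficient; in applied work a packaged optimizer would normally be used instead.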
Application: Residential location choices
Bayer, Keohane, Timmins (JEEM, 2009) “Migration and hedonic valuation: The case of air quality”
Hedonic approach to the valuation of local amenities: Different locations are bundles of different characteristics
Households “vote with their feet” (Tiebout, 1956)
A model of location choice
Individuals simultaneously choose their location along with consumption $C_i$ of the numeraire good and non-traded good $H_i$ (“housing”)
Locations $j$ are characterized by a location-specific amenity $X_j$ (“air quality”)
There is a moving cost $M_j$ associated with settling in $j$, and income is location-specific and denoted by $Y_j$
$$\max_{C,H,X_j} \; U(C,H;X_j,M_j) \quad \text{s.t.} \quad Y_j = C + \rho_j H$$
In equilibrium, market prices adjust and individuals are indifferent among locations (otherwise they move)
Indirect utility function
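Utility of individual $i$ for location $j$:
$$U_{ij} = C_i^{\beta_C} H_i^{\beta_H} X_j^{\beta_X} e^{M_{ij} + \xi_j + \varepsilon_{ij}}$$
where $\xi_j$ represents unobserved attributes of location $j$ and $\varepsilon_{ij}$ is iid Gumbel
Substitute the demand for housing $H_{ij}^* = \frac{\beta_H}{\beta_H + \beta_C} \frac{I_{ij}}{\rho_j}$ and the budget constraint to get the indirect utility function
$$V_{ij} = I_{ij}^{\beta_I} e^{M_{ij} - \beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j + \varepsilon_{ij}}$$
with $\beta_I = \beta_H + \beta_C$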
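The substitution step can be spelled out as follows. This is a sketch assuming that income $I_{ij}$ is spent entirely on $C$ and $H$ at housing price $\rho_j$; the constant $\kappa$ is notation introduced here and does not appear in the slides.

$$
\begin{aligned}
C_{ij}^* &= \frac{\beta_C}{\beta_I} I_{ij}, \qquad H_{ij}^* = \frac{\beta_H}{\beta_I} \frac{I_{ij}}{\rho_j}, \qquad \beta_I = \beta_H + \beta_C \\
V_{ij} &= (C_{ij}^*)^{\beta_C} (H_{ij}^*)^{\beta_H} X_j^{\beta_X} e^{M_{ij} + \xi_j + \varepsilon_{ij}} \\
&= \kappa \, I_{ij}^{\beta_I} \rho_j^{-\beta_H} X_j^{\beta_X} e^{M_{ij} + \xi_j + \varepsilon_{ij}} \\
&= \kappa \, I_{ij}^{\beta_I} e^{M_{ij} - \beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j + \varepsilon_{ij}},
\qquad \kappa = \left(\frac{\beta_C}{\beta_I}\right)^{\beta_C} \left(\frac{\beta_H}{\beta_I}\right)^{\beta_H}
\end{aligned}
$$

Since $\kappa$ is the same for every location, it does not affect the ranking of locations and can be dropped.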
Specification
Rewrite indirect utility as:
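$$\ln V_{ij} = \beta_I \ln I_{ij} + M_{ij} + \theta_j + \varepsilon_{ij}$$
Moving costs: $M_{ij} = \mu_S d_{ij}^{S} + \mu_{R1} d_{ij}^{R1} + \mu_{R2} d_{ij}^{R2}$
Note that the income of $i$ in different locations has to be estimated separately (it is not observed)
$$
\mathrm{Prob}_{ij} = \frac{\exp[\beta_I \ln I_{ij} + \mu_S d_{ij}^{S} + \mu_{R1} d_{ij}^{R1} + \mu_{R2} d_{ij}^{R2} + \theta_j]}{\sum_{k=0}^{J} \exp[\beta_I \ln I_{ik} + \mu_S d_{ik}^{S} + \mu_{R1} d_{ik}^{R1} + \mu_{R2} d_{ik}^{R2} + \theta_k]}
$$
Estimated location fixed effects: $\theta_j = -\beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j$
Represent indirect utility (“quality of life”) of each location
Recover $\beta_X$ through linear regression of the fixed effects on $X_j$ and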
Identification
Two issues in estimating $\theta_j = -\beta_H \ln(\rho_j) + \beta_X \ln(X_j) + \xi_j$
1. $\rho_j$ (price of housing) correlated with $\xi_j$ (the error term)
Instead use $\theta_j + \beta_H \ln(\rho_j)$ as dependent variable (with $\beta_H = \beta_I (\rho_j H_i^* / I_{ij}) = 0.2$)
Call this “housing price adjusted quality of life”
2. $X_j$ (air pollution) correlated with $\xi_j$
Use first difference (1990-2000) to remove long-run association
Instrument $X_j$ with pollution emitted from sources at least 80km from
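The second-stage logic can be illustrated with synthetic data. The sketch below does not use the data or the estimates of Bayer et al.; the number of locations, the parameter values and every variable name are assumptions. It builds the housing-price-adjusted dependent variable with $\beta_H = 0.2$, then compares a naive regression slope with a simple IV estimator (ratio of covariances) that uses the change in distant-source pollution as the instrument.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate_locations(n=5000, beta_x=-0.3, beta_h=0.2, seed=4):
    """Synthetic first differences for n hypothetical locations."""
    rng = random.Random(seed)
    d_theta, d_ln_rho, d_ln_x, z = [], [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 0.3)                  # change in unobserved attributes
        zj = rng.gauss(0.0, 1.0)                  # change in distant-source pollution
        dx = 0.8 * zj + 1.5 * xi + rng.gauss(0.0, 0.3)   # local pollution, endogenous
        drho = 0.5 * xi + rng.gauss(0.0, 0.2)     # housing prices capitalize xi
        d_theta.append(-beta_h * drho + beta_x * dx + xi)
        d_ln_rho.append(drho)
        d_ln_x.append(dx)
        z.append(zj)
    return d_theta, d_ln_rho, d_ln_x, z

d_theta, d_ln_rho, d_ln_x, z = simulate_locations()

# Housing-price-adjusted quality of life: add back 0.2 times the price change
adjusted = [t + 0.2 * r for t, r in zip(d_theta, d_ln_rho)]

ols = cov(d_ln_x, adjusted) / cov(d_ln_x, d_ln_x)   # biased: pollution moves with xi
iv = cov(z, adjusted) / cov(z, d_ln_x)              # uses only instrument-driven variation
print("OLS:", round(ols, 3), "IV:", round(iv, 3))
```

In this simulated setting the naive slope is pulled toward zero because pollution rises together with unobserved local attributes, while the IV estimate should be close to the assumed value of -0.3.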
Data
Census data (1990-2000) for household heads who are under 35 and reside in one of 242 metropolitan statistical areas
Migration: Birth vs. current residence
Local economic activities and amenities aggregated up from the county level
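Results: First step
$$
\ell(\mu_S, \mu_{R1}, \mu_{R2}, \beta_I, \theta) = \prod_t \prod_i \prod_j \frac{\exp[\beta_I \ln I_{ijt} + \mu_S d_{ijt}^{S} + \mu_{R1} d_{ijt}^{R1} + \mu_{R2} d_{ijt}^{R2} + \theta_{jt}]}{\sum_{k=0}^{J} \exp[\beta_I \ln I_{ikt} + \mu_S d_{ikt}^{S} + \mu_{R1} d_{ikt}^{R1} + \mu_{R2} d_{ikt}^{R2} + \theta_{kt}]}
$$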
Results: Second step
IV estimation: $\Delta\theta_j + 0.2\,\Delta \ln \rho_j = \beta_{PM}\,\Delta \ln PM_j + \beta_Z\,\Delta Z_j + \xi_j$
Motivation
We have considered a simple one-shot optimization decision
Choices are independent from each other: Utility only depends on attributes of that particular choice
Now consider choices that have an influence on options available in the future
A decision-maker may take these future effects into consideration
If decision-maker is forward looking the objective is to maximize the stream of instantaneous utility associated with choices, given:
1. Information currently available (usually summarized in a vector of stock variables)
2. Knowledge that he will act optimally when information is revealed in the future
As before the objective function is only partly known to the analyst, so the model rationalizes observed choices only up to some
Concrete example: Bus-engine replacement
Harold Zurcher (HZ) bus engine problem (Rust, Ecta 1987)
For a given bus, we observe a (finite) sequence of choices $\{d_t\}_{t=0}^{T}$ and a sequence of mileage readings $\{x_t\}_{t=0}^{T}$
Each bus is treated as an independent observation, so we will just study the choice problem for one bus
Standard and general framework for the solution to dynamic discrete choice models
Simple conceptual problem, estimation framework widely applied in many different settings
Observe a sequence of (discrete) choices and a sequence of state variables describing the information available to the decision-maker
Objective: infer structural parameters of the objective function and stochastic process whose associated optimal strategy coincides with the data
Setup
Every month t, HZ decides whether to replace the bus engine (dt = 1) or not (dt = 0)
If $d_t = 0$, incurs maintenance cost $c(x_t;\theta_1)$, increasing in $x_t$ (observed state variable); $\theta_1$ is a cost parameter to be estimated
$$v(d_t = 0, x_t; \theta_1, RC) = -c(x_t;\theta_1)$$
If $d_t = 1$, incurs engine replacement cost $RC$, to be estimated; implies $x_{t+1} = 0$
$$v(d_t = 1, x_t; \theta_1, RC) = -RC$$
Per period “utility”: $u(d_t, x_t, \varepsilon_t; \theta_1, RC) = v(d_t, x_t; \theta_1, RC) + \varepsilon_t$
where $\varepsilon_t = (\varepsilon_{0t}, \varepsilon_{1t})$ is a structural error term (observed by the decision-maker but unobserved by the econometrician)
After a decision is made, the state variables $(x_t, \varepsilon_t)$ evolve stochastically
Optimization problem
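The sequence of decisions $\{d_t\}_{t=0}^{\infty}$ maximizes:
$$W = E\left[\sum_{t=0}^{\infty} \delta^t u(d_t, x_t, \varepsilon_t; \theta_1, RC)\right]$$
Stationary infinite horizon problem with $\delta \in (0,1)$
Bellman equation:
$$V(x_t, \varepsilon_t) = \max_{d_t = 0,1} \left\{ v(d_t, x_t; \theta_1, RC) + \varepsilon_t + \delta EV(d_t, x_t, \varepsilon_t) \right\}$$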
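where we defined
$$
\begin{aligned}
EV(d_t, x_t, \varepsilon_t) &\equiv E_{x_{t+1},\varepsilon_{t+1}}\left[V(x_{t+1}, \varepsilon_{t+1}) \mid d_t, x_t, \varepsilon_t\right] \\
&= \int_{x_{t+1}} \int_{\varepsilon_{t+1}} V(x_{t+1}, \varepsilon_{t+1})\, p(dx_{t+1}, d\varepsilon_{t+1} \mid x_t, \varepsilon_t, d_t)
\end{aligned}
$$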
Simplifying assumptions
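A1. Conditional independence:
$$p(x_{t+1}, \varepsilon_{t+1} \mid x_t, \varepsilon_t, d_t) = p(x_{t+1} \mid x_t, d_t) \cdot p(\varepsilon_{t+1} \mid x_{t+1})$$
1. Given $x_t$ and $d_t$, $x_{t+1}$ is independent of $\varepsilon_t$
2. Given $x_t$, $\varepsilon_{t+1}$ is independent of $\varepsilon_t$
A2. The components of $\varepsilon_t$ follow a Gumbel distribution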
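Implication: $EV(\cdot)$ simplifies to
$$
\begin{aligned}
EV(d_t, x_t) &= \int_{x_{t+1}} \int_{\varepsilon_{t+1}} V(x_{t+1}, \varepsilon_{t+1})\, p(dx_{t+1} \mid x_t, d_t)\, p(d\varepsilon_{t+1} \mid x_{t+1}) \\
&= \int_{x_{t+1}} \int_{\varepsilon_{t+1}} \max_{d_{t+1} = 0,1} \left\{ v(d_{t+1}, x_{t+1}) + \varepsilon_{t+1} + \delta EV(d_{t+1}, x_{t+1}) \right\} p(d\varepsilon_{t+1} \mid x_{t+1})\, p(dx_{t+1} \mid x_t, d_t) \\
&= \int_{x_{t+1}} \ln \sum_{d_{t+1} = 0,1} \exp\left[ v(d_{t+1}, x_{t+1}) + \delta EV(d_{t+1}, x_{t+1}) \right] p(dx_{t+1} \mid x_t, d_t)
\end{aligned}
$$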
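The last step replaces the integral over the errors with a log-sum-exp expression. A small simulation can make this concrete. The sketch below assumes standard Gumbel errors (location 0, scale 1), for which the expected maximum equals the log-sum-exp term plus Euler's constant; the slides' formula corresponds to dropping that constant (equivalently, normalizing the errors to mean zero), which shifts all values equally and leaves choices unchanged. The values in `v_example` are made up.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329   # mean of a standard Gumbel draw

def logsumexp(v):
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))

def gumbel_draw(rng):
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_emax(v, n_draws=200000, seed=3):
    """Monte Carlo estimate of E[max_d (v_d + eps_d)] with iid standard Gumbel eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += max(vd + gumbel_draw(rng) for vd in v)
    return total / n_draws

v_example = [-1.0, 0.5]            # hypothetical values of v + delta * EV for d = 0, 1
print("simulated:  ", round(simulated_emax(v_example), 3))
print("closed form:", round(EULER_GAMMA + logsumexp(v_example), 3))
```

The closed form is what makes the inner fixed-point computation cheap: only the integral over next-period mileage remains.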
Choice probabilities
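The probability of observing an engine replacement is:
$$
\begin{aligned}
\mathrm{Prob}(d_t = 1 \mid x_t) &= \mathrm{Prob}\left[u(d_t = 1, x_t, \varepsilon_t; \theta_1, RC) \ge u(d_t = 0, x_t, \varepsilon_t; \theta_1, RC)\right] \\
&= \mathrm{Prob}\left[-RC + \varepsilon_{1t} + \delta EV(d_t = 1, x_t) \ge -c(x_t;\theta_1) + \varepsilon_{0t} + \delta EV(d_t = 0, x_t)\right] \\
&= \frac{\exp[-RC + \delta EV(d_t = 1, x_t)]}{\exp[-RC + \delta EV(d_t = 1, x_t)] + \exp[-c(x_t;\theta_1) + \delta EV(d_t = 0, x_t)]}
\end{aligned}
$$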
Likelihood and estimation
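Given the conditional independence assumption, the likelihood of observing a sequence $\{(d_t, x_t)\}_{t=0}^{T}$ as a function of the vector of parameters to be estimated ($\theta$) is:
$$
\begin{aligned}
\ell(\theta) &= \prod_{t=0}^{T} \mathrm{Prob}[d_t, x_t \mid d_{t-1}, x_{t-1}; \theta] \\
&= \prod_{t=0}^{T} \mathrm{Prob}[d_t \mid x_t; \theta_1, RC] \cdot \mathrm{Prob}[x_t \mid d_{t-1}, x_{t-1}; \theta_2]
\end{aligned}
$$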
There are different procedures to estimate the parameters (see papers by Aguirregabiria and Mira; and in particular Su and Judd, Ecta 2012)
Here we summarize the original procedure in Rust (1987)
Estimation proceeds in two steps:
Estimation procedure
We first discretize the state space, so that incremental mileage each month can fall in three ranges:
Between 0 and 5k with probability $\theta_{21}$
Between 5k and 10k with probability $\theta_{22}$
More than 10k with probability $1 - \theta_{21} - \theta_{22}$
This is a simple parametric maximum likelihood problem
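For this first step, the maximum likelihood estimates of the range probabilities are simply the sample shares of monthly increments falling in each range. A minimal sketch, using a made-up list of increments (the numbers and the function name are hypothetical, not Rust's data):

```python
def estimate_increment_probs(increments):
    """Return (theta_21, theta_22): sample shares of the first two mileage ranges."""
    n_low = sum(1 for m in increments if m < 5000)
    n_mid = sum(1 for m in increments if 5000 <= m < 10000)
    n = len(increments)
    return n_low / n, n_mid / n

# Hypothetical monthly mileage increments for one bus
monthly_increments = [1200, 4800, 5200, 7000, 11000, 3000, 9999, 2500]
theta_21, theta_22 = estimate_increment_probs(monthly_increments)
print(theta_21, theta_22, 1 - theta_21 - theta_22)   # prints 0.5 0.375 0.125
```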
The second step employs a “nested fixed point algorithm” to estimate $\theta_1$ and $RC$:
“Inner loop”: Computes $EV(\cdot)$ by solving the forward-looking problem, taking estimates $\hat\theta_1, \widehat{RC}$ as given
“Outer loop”: Given $EV(\cdot)$, searches for the values of $\theta_1, RC$ that maximize the likelihood
Inner loop: Computation
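The inner loop requires solving the following fixed-point equation by successive iteration (indexed by $\tau$):
$$
\begin{aligned}
EV_{\tau+1}(d_t, x_t) &= \int_{x_{t+1}} \ln \sum_{d_{t+1} = 0,1} \exp\left[v(d_{t+1}, x_{t+1}) + \delta EV_{\tau}(d_{t+1}, x_{t+1})\right] p(dx_{t+1} \mid x_t, d_t) \\
&= \hat\theta_{21} \int_{x_t}^{x_t + 5000} \ln \sum_{d_{t+1} = 0,1} \exp\left[v(d_{t+1}, x_{t+1}; \hat\theta_1, \widehat{RC}) + \delta EV_{\tau}(d_{t+1}, x_{t+1})\right] dx_{t+1} \\
&\quad + \hat\theta_{22} \int_{x_t + 5000}^{x_t + 10000} \ln \sum_{d_{t+1} = 0,1} \exp\left[v(d_{t+1}, x_{t+1}; \hat\theta_1, \widehat{RC}) + \delta EV_{\tau}(d_{t+1}, x_{t+1})\right] dx_{t+1} \\
&\quad + (1 - \hat\theta_{21} - \hat\theta_{22}) \int_{x_t + 10000}^{\infty} \ln \sum_{d_{t+1} = 0,1} \exp\left[v(d_{t+1}, x_{t+1}; \hat\theta_1, \widehat{RC}) + \delta EV_{\tau}(d_{t+1}, x_{t+1})\right] dx_{t+1}
\end{aligned}
$$
Iteration stops when some measure of the distance between $EV_{\tau}$ and $EV_{\tau+1}$ is sufficiently small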
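The successive-iteration idea can be sketched on a discretized state space. This is an illustration under several assumptions that are not in the slides: mileage is grouped into 5,000-mile bins so that each month the bus moves up 0, 1 or 2 bins, the top bin is absorbing, the maintenance cost is linear in the bin index, and the parameter values are made up. Following the setup above, replacement sets next-period mileage to zero. The function names are hypothetical.

```python
import math

def solve_ev(theta1, rc, theta21, theta22, delta=0.95,
             n_states=60, tol=1e-10, max_iter=10000):
    """Iterate EV_{tau+1} = T(EV_tau) until the sup-norm change is below tol."""
    probs = (theta21, theta22, 1.0 - theta21 - theta22)  # move up 0, 1 or 2 bins
    ev_keep = [0.0] * n_states   # EV(d=0, x) for each mileage bin
    ev_repl = 0.0                # EV(d=1, x): same for all x since next mileage is 0
    for _ in range(max_iter):
        # ln sum_d exp[v(d, x') + delta * EV_tau(d, x')] at each next-period bin x'
        lse = []
        for s in range(n_states):
            keep = -theta1 * s + delta * ev_keep[s]
            repl = -rc + delta * ev_repl
            m = max(keep, repl)
            lse.append(m + math.log(math.exp(keep - m) + math.exp(repl - m)))
        new_keep = [sum(p * lse[min(s + k, n_states - 1)] for k, p in enumerate(probs))
                    for s in range(n_states)]
        new_repl = lse[0]
        dist = max(max(abs(a - b) for a, b in zip(new_keep, ev_keep)),
                   abs(new_repl - ev_repl))
        ev_keep, ev_repl = new_keep, new_repl
        if dist < tol:           # stopping rule: EV_tau and EV_{tau+1} are close
            break
    return ev_keep, ev_repl

def replace_prob(s, theta1, rc, ev_keep, ev_repl, delta=0.95):
    """Logit probability of replacement in mileage bin s, given the solved EV."""
    keep = -theta1 * s + delta * ev_keep[s]
    repl = -rc + delta * ev_repl
    return 1.0 / (1.0 + math.exp(keep - repl))

# Hypothetical parameter values, chosen only for illustration
ev_keep, ev_repl = solve_ev(theta1=0.1, rc=8.0, theta21=0.35, theta22=0.6)
for s in (0, 30, 59):
    print("bin", s, "P(replace) =", round(replace_prob(s, 0.1, 8.0, ev_keep, ev_repl), 4))
```

Because the mapping is a contraction with modulus $\delta$, the iteration converges from any starting value; the outer loop would call a routine like `solve_ev` once for every candidate value of the cost parameters.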
Last slide
Simple setup provides a very rich framework to analyze discrete decisions
For a recent application see Muehlenbachs (2015) “A dynamic model of cleanup: Estimating sunk costs in oil and gas production”, International Economic Review
In principle identification should be discussed as usual: How does variation in the data pin down the parameters of interest?
Very clear discussion in Bayer et al. (2009)
With dynamic models this can be complicated