
As in Lee and Fong (2013), I consider only one market for notational simplicity.⁴

The first part of the algorithm below describes the generation of value functions via forward simulation.

I assume that $\varepsilon$ is distributed Type I Extreme Value. Following Hotz and Miller (1993), if the probability of player $i$ choosing each action in each state, $P_i(a_i|g)$ (the conditional choice probabilities, or CCPs), can be estimated from the data, then differences in the choice-specific value functions, $v_i^{\sigma}(a_i, g)$ in equation (3.8) of the previous chapter, can be recovered as:

$$v_i(a, g) - v_i(a', g) = \ln P_i(a|g) - \ln P_i(a'|g) \tag{4.1}$$

for any two actions $a$ and $a'$. Since, by (4.1), $v_i(a, g)$ and $\ln P_i(a|g)$ differ only by a term that does not depend on $a$, the estimated policy function for agent $i$ is given by:

$$\hat\sigma_i(g, \varepsilon_i) = \arg\max_{a \in A_i}\{v_i(a, g) + \varepsilon_{a,i}\} = \arg\max_{a \in A_i}\{\ln P_i(a|g) + \varepsilon_{a,i}\} \tag{4.2}$$
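As a numerical check of (4.1) and (4.2), the following Python sketch takes hypothetical choice-specific values for one player in one state, forms the logit CCPs implied by Type I Extreme Value shocks, recovers value differences from log-CCP differences, and simulates the policy in (4.2) with standard Gumbel draws. The values in `v`, the helper names and the number of draws are illustrative assumptions, not part of the model.

```python
import numpy as np

def logit_ccps(v):
    """Logit CCPs implied by Type I Extreme Value shocks: P(a) = exp(v_a) / sum_b exp(v_b)."""
    z = np.exp(v - v.max())  # subtract the max for numerical stability
    return z / z.sum()

def simulate_policy(log_ccp, n_draws, rng):
    """Draw choices from (4.2): argmax_a { ln P(a|g) + eps_a } with standard Gumbel eps."""
    eps = rng.gumbel(size=(n_draws, log_ccp.size))
    return np.argmax(log_ccp + eps, axis=1)

rng = np.random.default_rng(0)
v = np.array([0.2, 1.0, -0.5])  # hypothetical v_i(a, g) for three actions in one state
P = logit_ccps(v)

# (4.1): log-CCP differences against the reference action a' = 0 recover v(a) - v(0).
recovered = np.log(P) - np.log(P[0])

# (4.2): simulated choice frequencies should match P up to Monte Carlo error.
choices = simulate_policy(np.log(P), 200_000, rng)
freq = np.bincount(choices, minlength=P.size) / choices.size

print(recovered, v - v[0])
print(freq, P)
```

Only differences in values are identified this way: adding a constant to every entry of `v` leaves `P`, and hence the simulated policy, unchanged.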

For any set of policy functions $\{\sigma_i, \sigma_{-i}\}$, consistent estimates of the value functions of agent $i$, when $i$ and all rivals play those strategies, can be obtained from:

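$$\hat V_i(g, \sigma, \theta) = E\left[\sum_{t=1}^{\infty} \beta_i^t \left(\pi_i(g_t, t) - c(g_t | g_{t-1}) + \varepsilon^t_{\sigma_i(g_{t-1}, \varepsilon^t_i),\, i}\right) \,\middle|\, g_0 = g,\; g_t = O(\tilde g(\sigma(g_{t-1}, \varepsilon^t))),\; \theta\right] \tag{4.3}$$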

⁴ Note, however, that for the estimators to have desirable properties, all the states in a given recurrent class need to be visited infinitely often. In the context of our panel, where a number of markets are observed over time, this would require (for example) assuming that the initial state in each market is drawn from the ergodic steady-state distribution of states.

Expectations are taken over present and future $\varepsilon$'s; the rules $O$ and the prices $t$ are consistent with $\hat V$, following the Nash bargaining procedure.
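In practice the infinite sum in (4.3) is truncated after $T$ periods of simulated play. Assuming, for illustration, that the expected stage payoff is bounded in absolute value by some constant $B$, the omitted tail satisfies $\sum_{t>T} \beta_i^t B \le \beta_i^T B/(1-\beta_i)$, which gives a simple, conservative rule for choosing $T$. A minimal Python sketch of that rule; the helper name and the example numbers are hypothetical:

```python
def horizon_for_tolerance(beta, payoff_bound, tol):
    """Smallest T such that beta**T * payoff_bound / (1 - beta) <= tol.

    payoff_bound is an assumed bound B on the absolute expected stage payoff.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    T = 0
    tail = payoff_bound / (1.0 - beta)  # bound on the omitted tail when T = 0
    while tail > tol:
        T += 1
        tail *= beta  # each extra simulated period shrinks the bound by beta
    return T

T = horizon_for_tolerance(beta=0.9, payoff_bound=1.0, tol=0.01)
print(T)
```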

Let $\theta$ be the vector of parameters. These value functions are approximated iteratively using the forward simulation in Bajari et al. (2007), following the steps below:⁵

1. Fix $\theta$.

2. In the first ‘round’ of the process, set $t$ and $O(\cdot)$ to an initial guess.

3. For each $g$, iterate to generate $V_i(g; \sigma; t^k, O^k, \theta)$, where in each iteration $\tau$:

(a) Set $g^{\tau} = g$, the network fixed above.

(b) For each agent $i$, draw error shocks $\varepsilon^{\tau}_i$, one for each action that can be taken.

(c) For each agent $i$, calculate the profit-maximising action $a^{\tau}_i = \hat\sigma_i(g^{\tau}, \varepsilon^{\tau}_i)$.

(d) Using the $a^{\tau}_i$'s of all players, obtain the negotiation network $\tilde g(a^{\tau})$.

(e) Using $O(\cdot)$, obtain the stable network that arises from the predicted negotiation network, $g' = O(\tilde g(a^{\tau}))$.

(f) For each $i$, compute the stage profits $\pi_i(g', t) - c_i(\tilde g(a^{\tau}) | g') + \varepsilon^{\tau}_{a_i, i}$.

(g) Update the network to $g^{\tau+1} = g'$ and repeat steps 3.b to 3.g up to $T$ times. This constitutes one path of play.

(h) Generate multiple paths of play following the steps above, each starting from network $g$ in the first iteration.

(i) Average each $i$'s discounted stream of payoffs over the simulated paths of play to obtain an estimate of $\hat V_i(g; \sigma; t^k, O^k, \theta)$.

4. Repeat step 3 for all possible states of the world.⁶

5. Update $t^{k+1}$ and $O^{k+1}$ by solving the bargaining problem, using the $\hat V_i$ estimated for each $i$ and $g$.

6. Use $t^{k+1}$ and $O^{k+1}$ to restart the process at step 2.

7. Repeat steps 2 to 5 until $|\hat V_i(g; \sigma; t^k, O^k, \theta) - \hat V_i(g; \sigma; t^{k-1}, O^{k-1}, \theta)| < \omega$ for every $i$ and $g$, where $\omega$ is a pre-specified cutoff.
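Steps 3.a to 3.i can be sketched in Python for a toy problem. Everything model-specific below is a hypothetical placeholder: states are plain integer indices, `negotiation_network` encodes the action profile as a state, `stable_network` treats every negotiation network as stable, stage profits are a fixed array with prices held at their current value, and the cost term is a simple switching cost in the spirit of $c(g_t|g_{t-1})$ in (4.3). Actions are drawn from the CCPs as in (4.2). This illustrates the forward-simulation logic only; it is not an implementation of the bargaining model.

```python
import numpy as np

def negotiation_network(actions, n_actions):
    """Hypothetical stand-in for g~(a): encode the action profile as a state index."""
    return int(np.ravel_multi_index(tuple(actions), (n_actions,) * len(actions)))

def stable_network(g_tilde):
    """Hypothetical stand-in for O(.): every negotiation network is treated as stable."""
    return g_tilde

def forward_simulate_values(g0, log_ccp, profit, switch_cost, beta, T, n_paths, rng):
    """Monte Carlo estimate of V_i(g0) for all players, following steps 3.a-3.i.

    log_ccp[i, g, a] : ln P_i(a|g), used to draw actions as in (4.2)
    profit[i, g]     : hypothetical stage profits pi_i(g, t), prices t held fixed
    switch_cost      : hypothetical cost charged to every player when the network changes
    """
    n_players, _, n_actions = log_ccp.shape
    totals = np.zeros(n_players)
    for _ in range(n_paths):                                      # step h: many paths
        g = g0                                                    # step a
        discount = beta                                           # period t = 1 is discounted by beta
        for _ in range(T):
            eps = rng.gumbel(size=(n_players, n_actions))         # step b: Type I EV shocks
            actions = np.argmax(log_ccp[:, g, :] + eps, axis=1)   # step c: policy (4.2)
            g_next = stable_network(negotiation_network(actions, n_actions))  # steps d-e
            cost = switch_cost if g_next != g else 0.0
            chosen_eps = eps[np.arange(n_players), actions]       # shock of the chosen action
            totals += discount * (profit[:, g_next] - cost + chosen_eps)  # step f
            discount *= beta
            g = g_next                                            # step g
    return totals / n_paths                                       # step i

rng = np.random.default_rng(1)
n_players, n_actions = 2, 2
n_states = n_actions ** n_players
log_ccp = np.full((n_players, n_states, n_actions), np.log(1.0 / n_actions))
profit = np.ones((n_players, n_states))
V = forward_simulate_values(0, log_ccp, profit, 0.0, beta=0.9, T=20, n_paths=2000, rng=rng)
print(V)
```

With uniform CCPs over two actions, the expected shock of the chosen action is $\gamma + \ln 2$ (the mean of the maximum of two standard Gumbel draws, with $\gamma$ Euler's constant), so with unit profits and no switching cost each entry of `V` should be close to $(1+\gamma+\ln 2)\,\beta(1-\beta^T)/(1-\beta) \approx 17.95$, up to Monte Carlo error.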

⁵ Although the algorithm is presented differently, its main structure corresponds to that in Lee and Fong (2013).

⁶ In our context, the steps above can be modified to make use of observed prices. Step 5 computes prices in the Nash bargaining problem for all linked pairs under all possible states of the world, given θ. However, for the networks present in the data, prices are directly observed and can be excluded from this step, restricting the computation of prices to counterfactual instances (disagreement points) and to unobserved states. This leads to two alternatives that are explored in the following section.
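The outer loop in steps 2 and 5 to 7, combined with the modification described in footnote 6, can be sketched as a fixed-point iteration in which prices observed in the data are held fixed and only the remaining prices are recomputed. In the Python sketch below, `simulate_values` and `bargain_prices` are hypothetical callables standing in for steps 3 and 4 and for the Nash bargaining step respectively; updates of $O(\cdot)$ are omitted, and the stopping rule is the sup-norm version of the criterion in step 7.

```python
import numpy as np

def outer_fixed_point(simulate_values, bargain_prices, t0, observed_mask,
                      omega=1e-6, max_iter=500):
    """Alternate forward simulation and price updates until the values converge.

    simulate_values(t) -> array of V_i(g), hypothetical stand-in for steps 3-4
    bargain_prices(V)  -> array of prices, hypothetical stand-in for step 5
    observed_mask      -> True where prices are observed and therefore kept fixed
    """
    t = np.asarray(t0, dtype=float).copy()
    v_old = simulate_values(t)
    for k in range(1, max_iter + 1):
        t_new = np.asarray(bargain_prices(v_old), dtype=float)
        t = np.where(observed_mask, t, t_new)          # only unobserved prices move
        v_new = simulate_values(t)                     # step 6: restart with the new prices
        if np.max(np.abs(v_new - v_old)) < omega:      # step 7: sup-norm cutoff
            return v_new, t, k
        v_old = v_new
    raise RuntimeError("outer fixed point did not converge")

# Toy linear maps (hypothetical): values rise with prices, prices are half the values.
v_star, t_star, n_iter = outer_fixed_point(
    simulate_values=lambda t: 1.0 + 0.5 * t[None, :],
    bargain_prices=lambda v: 0.5 * v[0],
    t0=[1.0, 0.0],
    observed_mask=np.array([True, False]),
)
print(t_star, n_iter)
```

In this toy case the free price converges to the fixed point $t = 0.5(1 + 0.5t)$, that is $t = 2/3$, while the observed price stays at its data value.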

The following part of the algorithm estimates policy functions and finds the parameters that minimise deviations from observed data.

1. Obtain the equilibrium CCPs, $\hat\sigma_i(g)$, non-parametrically from the data.

2. For each $i$, compute the optimal policy $\tilde\sigma_i(\cdot;\theta)$, given that all other players play $\hat\sigma_{-i}$, as follows:

(a) Start with the candidate policy $\tilde\sigma_i^{\tau} = \hat\sigma_i$.

(b) For iteration $\tau$, let $\sigma^{\tau} = \{\tilde\sigma_i^{\tau}, \hat\sigma_{-i}\}$.

(c) For the probabilities implied by $\sigma^{\tau}$, obtain the simulated value functions $\hat V_i(g; \sigma^{\tau}; \theta)$ by running the forward simulation described above.

(d) Update the choice-specific value functions $v_i^{\sigma}(\cdot)$ for all actions and states, given $\hat V_i(g; \sigma^{\tau}; \theta)$ and the prices obtained from the forward simulation.

(e) Update the CCPs of player $i$ and obtain $\tilde\sigma_i^{\tau+1}$ from

$$P_i^{\sigma}(a_i|g) = \frac{\exp(v_i^{\sigma}(a_i, g))}{\sum_{a \in A_i} \exp(v_i^{\sigma}(a, g))}.$$

(f) Repeat steps 2.b to 2.e until the $P_i^{\sigma}$ in all states under the optimal policy converge, up to a pre-specified threshold. Store the optimal policy of player $i$, given that all other players play $\hat\sigma_{-i}$, as $\tilde\sigma_i(\cdot;\theta)$. As a result, there is one such policy per agent.

3. Obtain an estimate of $\theta$ by minimising the sum of squared deviations between the choice probabilities induced by $\tilde\sigma_i(\cdot;\theta)$ and those of the policy $\hat\sigma_i$ obtained from the data:

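$$\hat\theta = \arg\min_{\theta} \sum_{g} \sum_{i} \sum_{a \in A_i} \left( P_i^{\{\tilde\sigma_i, \hat\sigma_{-i}\}}(a|g) - P_i^{\hat\sigma}(a|g) \right)^2 \tag{4.4}$$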
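Step 2.e and the criterion in (4.4) can be illustrated with a toy example in Python. To keep it self-contained, the CCPs implied by a candidate $\theta$ come from a hypothetical closed form, a logit in $\theta x_{i,g,a}$ with made-up features `x`, instead of being produced by the inner policy iteration of step 2. Pseudo-data are generated at $\theta = 1.5$, and $\theta$ is recovered by a grid search, chosen here for transparency rather than efficiency.

```python
import numpy as np

def logit_ccps(v):
    """Step 2.e: P(a|g) = exp(v(a, g)) / sum_a' exp(v(a', g)), over the last axis."""
    z = np.exp(v - v.max(axis=-1, keepdims=True))  # stabilised exponentials
    return z / z.sum(axis=-1, keepdims=True)

def min_distance(theta, model_ccps, data_ccps):
    """Criterion (4.4): squared CCP deviations summed over players, states and actions."""
    return float(np.sum((model_ccps(theta) - data_ccps) ** 2))

# Hypothetical features x[i, g, a]: 2 players, 3 states, 2 actions.
x = np.array([[[0.0, 1.0], [0.5, -0.5], [1.0, 0.2]],
              [[0.3, -1.0], [0.0, 0.8], [-0.4, 0.4]]])

def model_ccps(theta):
    """Hypothetical stand-in for the CCPs induced by {sigma~_i(.; theta), sigma^_-i}."""
    return logit_ccps(theta * x)

data_ccps = logit_ccps(1.5 * x)  # pseudo-data generated at theta = 1.5

grid = np.linspace(0.0, 5.0, 5001)
scores = np.array([min_distance(th, model_ccps, data_ccps) for th in grid])
theta_hat = grid[np.argmin(scores)]
print(theta_hat)
```

In an application, each evaluation of `model_ccps` would rerun the policy iteration of step 2, so a derivative-free optimiser with fewer evaluations would typically replace the grid.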

The general algorithm above involves at least two aspects in which data limitations can induce bias.

The first is the non-parametric estimation of conditional choice probabilities from the data, for all players and actions and in every possible state of the world. In the context of my problem, the state space is large even for a very small set of players. As already established in the stylised facts in Chapter 2, persistence in choices is high, so even in long panels the probability of observing every action being taken in every state of the world is relatively low. The alternatives that I explore in the next section involve (i) working with the ‘true’ CCPs, an advantage I have with simulated data; (ii) estimating the CCPs non-parametrically with a frequency estimator and using kernel smoothing to estimate the CCPs in unobserved states; (iii) making the parametric assumption that, for each player, the CCPs in unobserved states equal the unconditional choice probabilities over all observed states; and (iv) making the parametric assumption that all actions in unobserved states are played with equal probability.
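Alternatives (ii) to (iv) start from the frequency estimator $\hat P_i(a|g) = N_i(a, g)/N_i(g)$, where $N_i(a, g)$ (notation introduced here) counts how often player $i$ chose $a$ in state $g$ and $N_i(g)$ counts visits to $g$. The Python sketch below computes it for one player and fills unobserved states using either alternative (iii) or alternative (iv); the kernel smoothing of alternative (ii) is not shown. The function name and its arguments are illustrative assumptions.

```python
import numpy as np

def frequency_ccps(states, actions, n_states, n_actions, fallback="unconditional"):
    """Frequency estimator of P(a|g) for one player, with a fallback for unobserved states.

    fallback="unconditional": alternative (iii), use choice frequencies pooled over observed states
    fallback="uniform":       alternative (iv), assign equal probability to every action
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (states, actions), 1.0)  # counts[g, a] = times a was chosen in g
    visits = counts.sum(axis=1)
    seen = visits > 0
    ccps = np.empty_like(counts)
    ccps[seen] = counts[seen] / visits[seen][:, None]
    if fallback == "unconditional":
        ccps[~seen] = counts.sum(axis=0) / counts.sum()
    elif fallback == "uniform":
        ccps[~seen] = 1.0 / n_actions
    else:
        raise ValueError("fallback must be 'unconditional' or 'uniform'")
    return ccps

# Hypothetical panel for one player: state 2 is never observed.
p_iii = frequency_ccps([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], n_states=3, n_actions=2)
p_iv = frequency_ccps([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], n_states=3, n_actions=2,
                      fallback="uniform")
print(p_iii)
print(p_iv)
```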

Second, the minimum-distance criterion used to recover the parameters compares choice probabilities element-wise and sums over every player, action and state. As suggested in Bajari et al. (2007), an alternative worth exploring is to sum over the observed states only, at the risk of a bias whose extent depends on the application.
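Summing over observed states only amounts to masking the state dimension of (4.4). A minimal Python sketch, where the boolean array `observed_states` is a hypothetical input marking the states that appear in the panel and both CCP arrays are indexed as `[i, g, a]`:

```python
import numpy as np

def min_distance_observed(model_ccps, data_ccps, observed_states):
    """Variant of (4.4) that sums squared CCP deviations over observed states only."""
    mask = np.asarray(observed_states, dtype=bool)
    diff = model_ccps[:, mask, :] - data_ccps[:, mask, :]
    return float(np.sum(diff ** 2))

# Toy arrays: 1 player, 3 states, 2 actions; model and data disagree only in state 2.
model = np.full((1, 3, 2), 0.5)
data = model.copy()
data[0, 2, :] = [1.0, 0.0]

full = min_distance_observed(model, data, [True, True, True])
observed_only = min_distance_observed(model, data, [True, True, False])
print(full, observed_only)
```

When state 2 is unobserved, its data CCPs are themselves imputed (for example by alternative (iii) or (iv)), so dropping it from the criterion removes the imputation from estimation, at the cost of discarding whatever information the imputed probabilities carried.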