¹ By “observational equivalence” we mean that there are two different linearly dependent sets of parameters …

In this section, we first present the conditions under which the Poisson-Logit model is identified. One possible identification strategy is to impose sign restrictions on the Logit process, henceforth called the “reporting process”. This is useful whenever prior information on the sign of at least one parameter of the reporting process exists. Alternatively, exclusion restrictions on the reporting or the count process can be used. This requires prior information that at least one parameter of β or γ is zero, meaning that a regressor belonging to the count process has no effect on the reporting process, or vice versa.¹ Following another direction, we consider small departures from the Poisson-Logit model by assuming a different distribution for the count process, a different model for the reporting process, or any combination of the two. We also show that it is possible to identify θ by imposing a different structure on the density function. In this section only the theoretical results are presented; discussion of these findings and empirical illustrations follow in Sections 1.6 and 1.8, respectively.
1.5.1 Sign Restrictions on the Reporting Process
A first way to identify θ is to impose at least one sign restriction on the reporting process. It must be stressed that this option is valid only if established theoretical results clearly indicate the direction of the effect of an independent variable on the reporting process. For instance, consider the example of labour mobility adopted by Winkelmann and Zimmermann (1993), where job offers follow the Poisson distribution and the probability of accepting an offer is given by a Logit. Now suppose that a hypothetical, well-established theory of labour mobility suggests that greater accumulation of firm-specific human capital (FS-HCA) by employees increases the wage in the current job but not the wage offered by outside firms. Therefore, more FS-HCA increases the wage differential between the current job and potential outside job offers. Consequently, according to this theory, an increase in FS-HCA has a negative effect on the probability that a worker accepts a job offer, resulting in a negative coefficient in the Logit part.²
Since, without exclusion restrictions, two observationally equivalent models always exist, with θ = (β, γ) and θ∗ = (β + γ, −γ), the effect of this variable is positive in one model but negative in the other. Hence, identification is achieved, since we reject the model in which the coefficient appears with the wrong sign. Finally, notice that sign restrictions on the count process are not appropriate, as β and β∗ = β + γ can have the same sign, so that the effect of a variable may be in the same direction in both models.

¹ If the model were not afflicted by this identification problem, it would be natural to assume that the individual characteristics affecting the count process are the same as those determining the reporting process. For example, assume that the decision to commit a crime depends on gender, age and race. The probability of reporting this crime would plausibly be affected by the same features. Therefore, restricting a coefficient to zero a priori is quite a strong assumption; there must always be sound reasons behind such choices.
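As a rough numerical illustration of the argument above, the sketch below evaluates a Poisson log likelihood at the Poisson-Logit mean for θ = (β, γ) and θ∗ = (β + γ, −γ), with the same regressors in both parts, and then keeps only the candidate whose reporting coefficient on the second regressor is negative. The toy data, the parameter values and the helper names (`poisson_logit_loglik`, `swap`, `select_by_sign`) are hypothetical and chosen only for illustration; the second regressor merely stands in for FS-HCA.

```python
import math


def logistic(z):
    """Logit CDF: Lambda(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poisson_logit_loglik(y, X, beta, gamma):
    """Poisson log likelihood at mu_i = exp(x_i'beta) * Lambda(x_i'gamma)."""
    total = 0.0
    for yi, xi in zip(y, X):
        mu = math.exp(dot(xi, beta)) * logistic(dot(xi, gamma))
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total


def swap(beta, gamma):
    """Observationally equivalent point theta* = (beta + gamma, -gamma)."""
    return [b + g for b, g in zip(beta, gamma)], [-g for g in gamma]


def select_by_sign(candidates, k, sign):
    """Keep candidates whose reporting coefficient gamma_k has the prior sign (+1 or -1)."""
    return [(b, g) for b, g in candidates if sign * g[k] > 0]


# Hypothetical toy data: a constant and one "FS-HCA" regressor in both parts.
X = [[1.0, 0.2], [1.0, -1.1], [1.0, 0.7], [1.0, 1.5]]
y = [2, 1, 0, 3]
theta = ([0.5, 0.3], [0.2, -0.8])
theta_star = swap(*theta)

print(poisson_logit_loglik(y, X, *theta))
print(poisson_logit_loglik(y, X, *theta_star))  # same value up to rounding
print(select_by_sign([theta, theta_star], k=1, sign=-1))  # only theta survives
```

With these values the count coefficients 0.3 and 0.3 + (−0.8) = −0.5 happen to differ in sign, but a count coefficient of 1.0 would give 1.0 − 0.8 = 0.2 with the same sign, which is why the restriction is placed on γ rather than on β.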
1.5.2 Exclusion Restrictions on the Reporting Process
As mentioned at the beginning of this section, imposing exclusion restrictions can help identify θ. However, it is easy to show that if exclusion restrictions are placed only on the Logit part, by restricting some elements of γ to zero, only the elements of β corresponding to the zeros in γ are identified.
Consider the case where we know a priori that at least one regressor belongs only to the count process. Equivalently, this amounts to restricting the corresponding elements of γ to zero. When exclusion restrictions are placed on γ, the vector $x_{2i}$ can be considered a subset of $x_{1i}$, so that $x_{1i}$ contains at least one variable that does not appear in $x_{2i}$. Denote this set of regressors by $w_i$. Thus, since $x_{1i}$ consists of $x_{2i}$ plus $w_i$, the exclusion restrictions on the Logit part can be thought of as adding another set of regressors, $w_i$, to the Poisson part, which changes the Poisson-Logit mean into $\mu_i = e^{x_{2i}'\beta + w_i'\eta}\Lambda_i$, where η consists of the parameters corresponding to the zeros in γ. Now, following the same reasoning as in (1.18) and (1.19), we have
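$$\mu_i \equiv e^{x_{2i}'\beta + w_i'\eta}\,\frac{e^{x_{2i}'\gamma}}{1 + e^{x_{2i}'\gamma}} = e^{x_{2i}'(\beta+\gamma) + w_i'\eta}\,\frac{e^{-x_{2i}'\gamma}}{1 + e^{-x_{2i}'\gamma}} = e^{x_{2i}'\beta^* + w_i'\eta^*}\,\frac{e^{x_{2i}'\gamma^*}}{1 + e^{x_{2i}'\gamma^*}} \equiv \mu_i^*. \tag{1.20}$$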
Therefore, even in this case two observationally equivalent models exist, with $\beta^* = \beta + \gamma$ and $\gamma^* = -\gamma$, but $\eta^* = \eta$. Clearly, β and γ remain unidentified, since two different sets of these parameters lead to exactly the same likelihood value. Nevertheless, all the elements of η are identified, as η is identical in both $\mu_i$ and $\mu_i^*$. Hence, unless …
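The equivalence in (1.20) can be checked numerically. The sketch below uses hypothetical parameter values and data and assumes that $w_i$ enters only the Poisson exponent. It confirms that $(\beta + \gamma, -\gamma)$ with the same η reproduces every mean, while perturbing η breaks the match. This is consistent with the claim that η is identified, although it is a check of one reparameterisation, not a proof.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mean_reporting_excluded(x2, w, beta, eta, gamma):
    """mu_i = exp(x2'beta + w'eta) * Lambda(x2'gamma): w enters the Poisson part only."""
    return math.exp(dot(x2, beta) + dot(w, eta)) * logistic(dot(x2, gamma))


# Hypothetical parameters and observations (x2_i, w_i).
beta, eta, gamma = [0.4, -0.2], [0.6], [1.0, 0.5]
data = [([1.0, 0.3], [0.9]), ([1.0, -1.2], [-0.4]), ([1.0, 2.0], [0.1])]


def equivalent(eta_star):
    """True if (beta + gamma, -gamma, eta_star) reproduces every mean mu_i."""
    beta_star = [b + g for b, g in zip(beta, gamma)]
    gamma_star = [-g for g in gamma]
    return all(
        math.isclose(mean_reporting_excluded(x2, w, beta, eta, gamma),
                     mean_reporting_excluded(x2, w, beta_star, eta_star, gamma_star))
        for x2, w in data
    )


print(equivalent(eta))    # True: eta* = eta, as in (1.20)
print(equivalent([0.7]))  # False: eta cannot move along with the swap
```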
1.5.3 Exclusion Restrictions on the Count Process
Now assume that exclusion restrictions are placed on the Poisson part, by setting some parameters of β to zero. Consequently, $x_{2i}$ consists of $x_{1i}$ plus a set of regressors that corresponds to the variables excluded from the Poisson part. Let us denote this vector by $q_i$. Therefore, the probability of reporting an event is now given by $\Lambda(x_{1i}'\gamma + q_i'\phi)$, where ϕ contains the parameters of the reporting process that correspond to the elements of β restricted to zero. Accordingly, we have:
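$$\mu_i \equiv e^{x_{1i}'\beta}\,\frac{e^{x_{1i}'\gamma + q_i'\phi}}{1 + e^{x_{1i}'\gamma + q_i'\phi}} = e^{x_{1i}'(\beta+\gamma) + q_i'\phi}\,\frac{e^{-x_{1i}'\gamma - q_i'\phi}}{1 + e^{-x_{1i}'\gamma - q_i'\phi}} \neq e^{x_{1i}'\beta^*}\,\frac{e^{x_{1i}'\gamma^* + q_i'\phi^*}}{1 + e^{x_{1i}'\gamma^* + q_i'\phi^*}} \equiv \mu_i^*, \tag{1.21}$$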
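where $\beta^* = \beta + \gamma$, $\gamma^* = -\gamma$ and $\phi^* = -\phi$. As (1.21) shows, the two models $\mu_i$ and $\mu_i^*$ are not observationally equivalent in this case: after the reparameterisation, the term $q_i'\phi$ appears in the exponent of the Poisson mean of $\mu_i$, whereas it cannot appear in $\mu_i^*$, because the Poisson part contains no coefficients for $q_i$. Hence, identification of the whole model is achieved.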
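A minimal numerical check of (1.21), with hypothetical values. Writing $\mu_i$ in the swapped form leaves the factor $e^{q_i'\phi}$ in the Poisson part, so the ratio $\mu_i/\mu_i^*$ equals $e^{q_i'\phi}$ rather than one. This only examines the particular reparameterisation $(\beta+\gamma, -\gamma, -\phi)$ discussed in the text; it is not a general proof of identification.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mean_count_excluded(x1, q, beta, gamma, phi):
    """mu_i = exp(x1'beta) * Lambda(x1'gamma + q'phi): q enters the reporting part only."""
    return math.exp(dot(x1, beta)) * logistic(dot(x1, gamma) + dot(q, phi))


# Hypothetical parameters and observations (x1_i, q_i).
beta, gamma, phi = [0.4, -0.2], [1.0, 0.5], [0.8]
beta_s = [b + g for b, g in zip(beta, gamma)]
gamma_s, phi_s = [-g for g in gamma], [-p for p in phi]
data = [([1.0, 0.3], [0.9]), ([1.0, -1.2], [-0.4]), ([1.0, 2.0], [0.1])]

ratios = []
for x1, q in data:
    mu = mean_count_excluded(x1, q, beta, gamma, phi)
    mu_s = mean_count_excluded(x1, q, beta_s, gamma_s, phi_s)
    # Pair the observed ratio mu / mu* with the predicted factor exp(q'phi).
    ratios.append((mu / mu_s, math.exp(dot(q, phi))))

for r, target in ratios:
    print(round(r, 6), round(target, 6))  # ratio equals exp(q'phi), not 1
```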
1.5.4 Specifying the Count Generating Process as Negative Binomial 1
As mentioned before, count data models based on the Negative Binomial distribution have been very popular, as they allow for over-dispersion through the extra parameter α (or δ in the NB1 case). As presented in Section 1.3, allowing for gamma-distributed unobserved heterogeneity in the Poisson-Logit model gives rise to the NB-Logit family of models. There, the two basic generalizations of the Poisson-Logit model were presented: the NB2-Logit and the NB1-Logit.
Concerning the NB2-Logit model, it is clear from (1.14) that its log likelihood depends on the regressors only through $\mu_i$, as is the case in the Poisson-Logit model. This is because the variance α of the gamma-distributed error term is homoscedastic. As a consequence, identification of the NB2-Logit model requires exactly the same conditions as those established for the Poisson-Logit model.
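The claim that the NB2-Logit likelihood depends on the regressors only through $\mu_i$ can be illustrated as follows. Since (1.14) is not reproduced here, the sketch assumes that the NB2-Logit density is the NB2 density with mean $\mu_i = \lambda_i\Lambda_i$ and variance $\mu_i + \alpha\mu_i^2$. This is what binomial thinning of a gamma-Poisson mixture with homoscedastic heterogeneity yields. The data and parameter values are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def nb2_logpmf(y, mu, alpha):
    """NB2 log density with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def nb2_logit_loglik(y, X, beta, gamma, alpha):
    """Assumed NB2-Logit log likelihood: NB2 evaluated at mu_i = exp(x'beta) * Lambda(x'gamma)."""
    return sum(nb2_logpmf(yi, math.exp(dot(xi, beta)) * logistic(dot(xi, gamma)), alpha)
               for yi, xi in zip(y, X))


X = [[1.0, 0.2], [1.0, -1.1], [1.0, 0.7], [1.0, 1.5]]  # hypothetical regressors
y = [2, 1, 0, 3]
beta, gamma, alpha = [0.5, 0.3], [1.0, -0.8], 0.7
beta_s, gamma_s = [b + g for b, g in zip(beta, gamma)], [-g for g in gamma]

ll = nb2_logit_loglik(y, X, beta, gamma, alpha)
ll_s = nb2_logit_loglik(y, X, beta_s, gamma_s, alpha)
print(ll, ll_s)  # equal up to rounding: the swap is not detected
```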
On the other hand, in the NB1-Logit model the variance of the error term is heteroscedastic, of the form $\delta/\lambda_i$. This is incorporated into the log likelihood function (1.17), which now depends on $x_{1i}$ through $\lambda_i$ and on $x_{2i}$ through $\Lambda_i$ separately. Thus, in a sense, the likelihood function of the NB1-Logit model can distinguish the count process from the reporting process and, consequently, β from γ. As a result, when the NB1 distribution is adopted, identification becomes possible even when both parts of the model contain the same regressors.
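To see why NB1 breaks the symmetry, the sketch below assumes one standard construction of the NB1-Logit density: gamma heterogeneity with mean one and variance $\delta/\lambda_i$, followed by Logit reporting. Under this assumption the observed count is negative binomial with size $\lambda_i/\delta$ and success probability $1/(1 + \delta\Lambda_i)$, so $\lambda_i$ and $\Lambda_i$ enter separately; the presentation of (1.17) may differ. The swap $(\beta+\gamma, -\gamma)$ leaves every mean unchanged but changes the variance $\mu_i(1 + \delta\Lambda_i)$ and hence the likelihood. The data and parameter values are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def nb1_logit_parts(x, beta, gamma, delta):
    """Return (lambda_i, Lambda_i, size n_i, success prob p_i) under the assumed construction."""
    lam = math.exp(dot(x, beta))
    rep = logistic(dot(x, gamma))
    n = lam / delta                # size depends on x'beta through lambda_i
    p = 1.0 / (1.0 + delta * rep)  # probability depends on x'gamma through Lambda_i
    return lam, rep, n, p


def nb1_logit_loglik(y, X, beta, gamma, delta):
    """Negative binomial log likelihood with size n_i and success probability p_i."""
    total = 0.0
    for yi, xi in zip(y, X):
        _, _, n, p = nb1_logit_parts(xi, beta, gamma, delta)
        total += (math.lgamma(yi + n) - math.lgamma(n) - math.lgamma(yi + 1)
                  + n * math.log(p) + yi * math.log(1.0 - p))
    return total


def mean_var(x, beta, gamma, delta):
    """Mean lambda_i * Lambda_i and variance mu_i * (1 + delta * Lambda_i)."""
    lam, rep, _, _ = nb1_logit_parts(x, beta, gamma, delta)
    mu = lam * rep
    return mu, mu * (1.0 + delta * rep)


X = [[1.0, 0.2], [1.0, -1.1], [1.0, 0.7], [1.0, 1.5]]  # hypothetical regressors
y = [2, 1, 0, 3]
beta, gamma, delta = [0.5, 0.3], [1.0, -0.8], 0.5
beta_s, gamma_s = [b + g for b, g in zip(beta, gamma)], [-g for g in gamma]

for x in X:
    print(mean_var(x, beta, gamma, delta), mean_var(x, beta_s, gamma_s, delta))
print(nb1_logit_loglik(y, X, beta, gamma, delta),
      nb1_logit_loglik(y, X, beta_s, gamma_s, delta))
```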
Nevertheless, it is very important to stress that the NB1-Logit model does not belong to the linear exponential family (LEF) and, therefore, it is not robust to misspecification of moments beyond the conditional mean. Since the NB1-Logit MLE achieves identification of the mean by assuming a particular form of heteroscedasticity of the error term, and hence by imposing a different structure on $\omega_i$, the estimates of θ will be inconsistent if the form of the variance is misspecified.
1.5.5 Specifying the Reporting Probability as a Probit or CLogLog
Another very popular model for binary choice problems is the Probit model, which exhibits nearly the same properties as the Logit model (see Maddala, 1983). Suppose, then, that the correct specification for reporting a particular event is a Probit model instead of a Logit. According to the Probit model, $\Pr(B_{ij} = 1 \mid x_i) = \Phi(x_{2i}'\gamma)$, where $\Phi(\cdot)$ is the standard normal cumulative distribution function (CDF).
Given this assumption, the Poisson-Logit model becomes the Poisson-Probit model, with mean $\mu_i = \lambda_i\Phi(x_{2i}'\gamma)$. Unlike the Logit, the functional form of the Probit model does not give rise to the identification problem described in Section 1.4, even when the regressors are the same in both parts of the model. The equivalence for the Logit rests on the identity $\Lambda(z) = e^{z}\Lambda(-z)$, which has no counterpart for the normal CDF: in general $\Phi(z) \neq e^{z}\Phi(-z)$ (the two sides coincide only at $z = 0$). Hence

$$\mu_i = e^{x_i'\beta}\,\Phi(x_i'\gamma) \neq e^{x_i'(\beta+\gamma)}\,\Phi(-x_i'\gamma).$$

Therefore, when the probability of reporting an event is given by a Probit, identification of the whole model is achieved.
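The key step can be verified numerically. For the Logit, $\Lambda(z) = e^{z}\Lambda(-z)$, so $e^{x_i'\beta}\Lambda(x_i'\gamma) = e^{x_i'(\beta+\gamma)}\Lambda(-x_i'\gamma)$. For the normal CDF, the analogous gap is non-zero away from $z = 0$. The sketch uses only the standard library (`math.erf` for Φ), and the evaluation points are arbitrary.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def swap_gap(cdf, z):
    """F(z) - e^z F(-z): zero wherever the swap (beta + gamma, -gamma) leaves the mean unchanged."""
    return cdf(z) - math.exp(z) * cdf(-z)


points = (-1.0, 0.5, 2.0)  # arbitrary values of z = x_i'gamma, avoiding z = 0
for z in points:
    print(z, swap_gap(logistic_cdf, z), swap_gap(normal_cdf, z))
```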
Although less popular, the complementary log-log (CLogLog) model has also been used in the literature. Unlike the Probit and the Logit, this model assumes an asymmetric CDF derived from the extreme value distribution. According to this model, $\Pr(B_{ij} = 1 \mid x_i) = 1 - \exp(-e^{x_i'\gamma})$. As CLogLog relaxes the assumption of symmetry, it becomes more appropriate in cases where the observed average probability of the outcome is close to one or close to zero. Therefore, if there are good reasons to believe that the probability of reporting a true event is very close to one or very close to zero, a researcher could argue that a Poisson-CLogLog model is more appropriate and use the CLogLog CDF instead of the symmetric Logit CDF. As is the case for the Poisson-Probit,

$$\mu_i = e^{x_i'\beta}\left(1 - \exp(-e^{x_i'\gamma})\right) \neq e^{x_i'(\beta+\gamma)}\left(1 - \exp(-e^{-x_i'\gamma})\right),$$

and this model is identified.
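A similar check for the CLogLog CDF $F(z) = 1 - \exp(-e^{z})$ illustrates both points made above. First, $F(z) + F(-z) \neq 1$, so the CDF is asymmetric. Second, $F(z) \neq e^{z}F(-z)$ away from $z = 0$, so the Logit-style swap does not leave the mean unchanged. The evaluation points are arbitrary.

```python
import math


def cloglog_cdf(z):
    """Complementary log-log CDF: F(z) = 1 - exp(-e^z)."""
    return 1.0 - math.exp(-math.exp(z))


def swap_gap(cdf, z):
    """F(z) - e^z F(-z): zero wherever the swap (beta + gamma, -gamma) leaves the mean unchanged."""
    return cdf(z) - math.exp(z) * cdf(-z)


points = (-1.0, 1.0, 2.0)
# For a symmetric CDF these would all be 0.
asymmetry = [cloglog_cdf(z) + cloglog_cdf(-z) - 1.0 for z in points]
print(asymmetry)
for z in points:
    print(z, swap_gap(cloglog_cdf, z))
```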