In document Regression Analysis of Count Data (Page 110-115)

Basic Count Regression

3.7 Other Models

In this section we consider whether least-squares methods might be usefully applied to count data $y$. Three variations of least squares are considered. The first is linear regression of $y$ on $x$, making no allowance for the count nature of the data aside from using heteroskedasticity-robust standard errors. The second is linear regression of a nonlinear transformation of $y$ on $x$, for which the transformation leads to a dependent variable that is close to homoskedastic and symmetric. Third, we consider nonlinear least squares regression with the conditional mean of $y$ specified to be $\exp(x'\beta)$.

The section finishes with a discussion of estimation using duration data, rather than count data, if the data are generated by a Poisson process.

3.7.1 OLS without Transformation

The OLS estimator is clearly inappropriate as it specifies a conditional mean function $x'\gamma$ that may take negative values and a variance function that is homoskedastic. If the conditional mean function is in fact $\exp(x'\beta)$, the OLS estimator is inconsistent for $\beta$ and the computed OLS output gives the wrong asymptotic variance matrix.

Nonetheless, OLS estimates in practice give results qualitatively similar to those for Poisson and other estimators using the exponential mean. The ratio of OLS slope coefficients is often similar to the ratio of Poisson slope coefficients, with the OLS slope coefficients approximately $\bar{y}$ times the Poisson slope coefficients, and the most highly statistically significant regressors from OLS regression, using usual OLS output t statistics, are in practice the most highly significant using Poisson regression. This is similar to comparing different models for binary data such as logit, probit, and OLS. In all cases the conditional mean is restricted to be of form $g(x'\beta)$, which is a monotonic transformation of a linear combination of the regressors. The only difference across models is the choice of function $g$, which leads to a different scaling of the parameters $\beta$. A first-order Taylor series expansion of the exponential mean $\exp(x'\beta)$ around the sample mean $\bar{y}$, that is, around $x'\beta = \ln \bar{y}$, yields $\exp(x'\beta) \simeq \bar{y} + \bar{y}(x'\beta - \ln \bar{y})$. For models with intercept, this can be rewritten as $\exp(\beta_1 + x_2'\beta_2) \simeq \gamma_1 + x_2'\gamma_2$, where $\gamma_1 = \bar{y} + \beta_1\bar{y} - \bar{y}\ln \bar{y}$ and $\gamma_2 = \beta_2\bar{y}$. So linear mean slope coefficients are approximately $\bar{y}$ times exponential slope coefficients. This approximation will be more reasonable the less dispersed the predicted values $\exp(x_i'\hat{\beta})$ are about $\bar{y}$.
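The scaling of slope coefficients by the sample mean can be checked on simulated data. The following is a minimal sketch and not part of the original analysis: the sample size, the coefficient values, the seed, and the function name `ols_vs_exponential_slope` are illustrative assumptions, and the regressor is drawn with small variance so that the conditional means stay close to the sample mean.

```python
import numpy as np

def ols_vs_exponential_slope(n=20000, beta1=0.5, beta2=0.5, seed=0):
    """Simulate Poisson counts with mean exp(beta1 + beta2 * x) and return
    (OLS slope, sample mean of y, OLS slope / sample mean).
    All numerical settings are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 0.5, size=n)                 # low-dispersion regressor
    y = rng.poisson(np.exp(beta1 + beta2 * x)).astype(float)
    X = np.column_stack([np.ones(n), x])
    gamma, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS of y on (1, x)
    ybar = y.mean()
    return gamma[1], ybar, gamma[1] / ybar

slope, ybar, ratio = ols_vs_exponential_slope()
print(f"OLS slope = {slope:.3f}, ybar = {ybar:.3f}, slope / ybar = {ratio:.3f}")
```

If the approximation holds, the printed ratio should be close to the exponential-mean slope used in the simulation (0.5 under the assumed settings).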

The OLS estimator can be quite useful for preliminary data analysis, such as determining key variables, in simple count models. Dealing with more complicated count models for which no off-the-shelf software is readily available is easier if one first ignores the count aspect of the data and does the corresponding adjustment to OLS. For example, if the complication is endogeneity, then do linear two-stage least squares as a potential guide to the impact of endogeneity. But experience is sufficiently limited that one cannot advocate this approach.

3.7.2 OLS with Transformation

For skewed continuous data, such as that on individual income or on housing prices, a standard transformation is the log transformation. For example, if $y$ is log-normally distributed then $\ln y$ is by definition exactly normally distributed, so the log transformation induces constant variance and eliminates skewness.

The log transformation may also be used for count data, which are often skewed. Because $\ln 0$ is not defined, a standard solution is to add a constant term, such as 0.5, and to model $\ln(y + 0.5)$ by OLS. This model has been criticized by King (1989b) as performing poorly.

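An alternative transformation is the square-root transformation. Following McCullagh and Nelder (1989, p. 236), let $y = \mu(1 + \varepsilon)$. Then a fourth-order Taylor series expansion around $\varepsilon = 0$ yields

$$y^{1/2} \simeq \mu^{1/2}\left(1 + \frac{1}{2}\varepsilon - \frac{1}{8}\varepsilon^2 + \frac{1}{16}\varepsilon^3 - \frac{5}{128}\varepsilon^4\right).$$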

For the Poisson, $\varepsilon = (y - \mu)/\mu$ has first four moments $0$, $1/\mu$, $1/\mu^2$, and $3/\mu^2 + 1/\mu^3$. It follows that $E[\sqrt{y}] \simeq \sqrt{\mu}\,(1 - 1/(8\mu) + O(1/\mu^2))$, $V[\sqrt{y}] \simeq (1/4)(1 + 3/(8\mu) + O(1/\mu^2))$, and $E[(\sqrt{y} - E[\sqrt{y}])^3] \simeq -(1/(16\sqrt{\mu}))(1 + O(1/\mu))$. Thus if $y$ is Poisson then $\sqrt{y}$ is close to homoskedastic and is close to symmetric. The skewness index is the third central moment divided by the variance raised to the power 1.5. Here it is in absolute value less than $-(1/(16\sqrt{\mu}))/(1/4)^{1.5} = -1/(2\sqrt{\mu})$. By comparison, for the Poisson $y$ is heteroskedastic with variance $\mu$ and asymmetric with skewness index $1/\sqrt{\mu}$. The square-root transformation works quite well for large $\mu$.
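These approximations can be compared with moments computed directly from the Poisson probability mass function. The sketch below is illustrative only: the function name `sqrt_poisson_moments`, the truncation point of the sum, and the choice of mean 20 are assumptions made for the example.

```python
import math

def sqrt_poisson_moments(mu, kmax=None):
    """Mean, variance, and skewness index of sqrt(y) for y ~ Poisson(mu),
    computed by summing over the probability mass function up to kmax."""
    if kmax is None:
        kmax = int(mu + 20 * math.sqrt(mu) + 50)   # mass beyond this is negligible
    ks = range(kmax + 1)
    # Poisson pmf evaluated on the log scale for numerical stability.
    pmf = [math.exp(k * math.log(mu) - mu - math.lgamma(k + 1)) for k in ks]
    mean = sum(p * math.sqrt(k) for k, p in zip(ks, pmf))
    var = sum(p * (math.sqrt(k) - mean) ** 2 for k, p in zip(ks, pmf))
    m3 = sum(p * (math.sqrt(k) - mean) ** 3 for k, p in zip(ks, pmf))
    return mean, var, m3 / var ** 1.5

mu = 20.0                                            # assumed value for illustration
mean, var, skew = sqrt_poisson_moments(mu)
approx_mean = math.sqrt(mu) * (1 - 1 / (8 * mu))     # leading terms for E[sqrt(y)]
approx_var = 0.25 * (1 + 3 / (8 * mu))               # leading terms for V[sqrt(y)]
print(f"mean     {mean:.4f} vs approximation {approx_mean:.4f}")
print(f"variance {var:.4f} vs approximation {approx_var:.4f}")
print(f"skewness {skew:.4f}; untransformed Poisson has {1 / math.sqrt(mu):.4f}")
```

Rerunning with smaller values of `mu` gives a sense of how quickly the approximations deteriorate as the mean falls.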

One therefore models $\sqrt{y}$ by OLS, regressing $\sqrt{y_i}$ on $x_i$. The usual OLS t statistics can be used for statistical inference. More problematic is the interpretation of coefficients. These give the impact of a one-unit change in $x_j$ on $E[\sqrt{y}]$ rather than $E[y]$, and by Jensen's inequality $E[y] \neq (E[\sqrt{y}])^2$. A similar problem arises in prediction, although the method of Duan (1983) can be used to predict $E[y_i]$, given the estimated model for $\sqrt{y_i}$.
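One common form of Duan's retransformation is the smearing estimator, which averages the inverse transformation over the empirical OLS residuals. The sketch below assumes that form and applies it to the square-root model. The simulated data, the seed, and the helper name `smearing_predict_sqrt` are illustrative assumptions and do not come from the text.

```python
import numpy as np

def smearing_predict_sqrt(X, y):
    """OLS of sqrt(y) on X, then two predictions of E[y | x]:
    naive    = square of the fitted value (ignores Jensen's inequality)
    smeared  = average over residuals j of (fitted_i + resid_j)^2."""
    z = np.sqrt(y)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    z_hat = X @ coef
    resid = z - z_hat
    naive = z_hat ** 2
    smeared = np.mean((z_hat[:, None] + resid[None, :]) ** 2, axis=1)
    return naive, smeared

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                  # intercept plus one regressor
y = rng.poisson(np.exp(1.0 + 0.3 * x)).astype(float)  # illustrative data
naive, smeared = smearing_predict_sqrt(X, y)
print(y.mean(), naive.mean(), smeared.mean())
```

When the regression includes an intercept, the smeared predictions average to the sample mean of $y$, whereas the naive squared predictions fall short of it by the mean squared residual.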

3.7.3 Nonlinear Least Squares

The nonlinear least squares (NLS) estimator with exponential mean minimizes the sum of squared residuals $\sum_i (y_i - \exp(x_i'\beta))^2$. The estimator $\hat{\beta}_{NLS}$ is the solution to the first-order conditions

$$\sum_{i=1}^{n} x_i \left(y_i - \exp(x_i'\beta)\right) \exp(x_i'\beta) = 0. \tag{3.63}$$

This estimator is consistent if the conditional mean of $y_i$ is $\exp(x_i'\beta)$. It is inefficient, however, as the errors are certainly not homoskedastic, and the usual reported NLS standard errors are inconsistent. $\hat{\beta}_{NLS}$ is asymptotically normal with variance

$$V[\hat{\beta}_{NLS}] = \left(\sum_{i=1}^{n} \mu_i^2 x_i x_i'\right)^{-1} \left(\sum_{i=1}^{n} \omega_i \mu_i^2 x_i x_i'\right) \left(\sum_{i=1}^{n} \mu_i^2 x_i x_i'\right)^{-1}, \tag{3.64}$$

where $\omega_i = V[y_i \mid x_i]$. The robust sandwich estimate of $V[\hat{\beta}_{NLS}]$ is (3.64), with $\mu_i$ and $\omega_i$ replaced by $\hat{\mu}_i$ and $(y_i - \hat{\mu}_i)^2$.

The NLS estimator can therefore be used, but more efficient estimates can be obtained using the estimators given in sections 3.2 and 3.3.
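The first-order conditions (3.63) and the robust sandwich estimate of (3.64) can be sketched numerically as follows. This is one possible implementation under stated assumptions, not the authors' code: the Gauss-Newton iteration with step halving, the starting values, the function names, and the simulated Poisson data are all choices made for illustration.

```python
import numpy as np

def nls_exponential(X, y, max_iter=100, tol=1e-10):
    """Gauss-Newton with step halving for min_b sum_i (y_i - exp(x_i'b))^2.
    Assumes column 0 of X is the intercept and that mean(y) > 0."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                # start at the constant-mean fit
    ssr = np.sum((y - np.exp(X @ beta)) ** 2)
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        J = X * mu[:, None]                   # derivative of mu_i w.r.t. beta
        step = np.linalg.solve(J.T @ J, J.T @ (y - mu))
        t = 1.0
        while t > 1e-8:                       # halve the step until SSR does not rise
            new_beta = beta + t * step
            new_ssr = np.sum((y - np.exp(X @ new_beta)) ** 2)
            if new_ssr <= ssr:
                break
            t /= 2.0
        converged = ssr - new_ssr < tol * (1.0 + ssr)
        beta, ssr = new_beta, new_ssr
        if converged:
            break
    return beta

def nls_sandwich(X, y, beta):
    """Sandwich estimate of (3.64) with mu_i replaced by the fitted mean
    and omega_i replaced by the squared residual."""
    mu = np.exp(X @ beta)
    A = (X * (mu ** 2)[:, None]).T @ X
    B = (X * ((y - mu) ** 2 * mu ** 2)[:, None]).T @ X
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])              # illustrative values
y = rng.poisson(np.exp(X @ beta_true)).astype(float)
beta_hat = nls_exponential(X, y)
V = nls_sandwich(X, y, beta_hat)
print("estimates:", beta_hat, "robust s.e.:", np.sqrt(np.diag(V)))
```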

Example: Doctor Visits (Continued)

Coefficient estimates of binary Poisson, ordered probit, OLS, OLS of transformations of $y$ (both $\ln(y + 0.1)$ and $\sqrt{y}$), Poisson PMLE, and NLS with exponential mean are presented in Table 3.6. The associated t statistics reported are based on robust sandwich (RS) standard errors, except for binary Poisson and ordered probit. The skewness and kurtosis measures given are for model residuals $z_i - \hat{z}_i$, where $z_i$ is the dependent variable, for example, $z_i = \sqrt{y_i}$, and are estimates of, respectively, the third central moment divided by $s^3$ and the fourth central moment divided by $s^4$, where $s^2$ is the estimated variance. For the standard normal distribution the kurtosis measure is 3.
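The two residual measures can be computed as in the following sketch. The function name `skewness_kurtosis` is hypothetical, and the variance is computed with divisor $n$, which is an assumption, since the text does not say whether $n$ or $n - 1$ is used.

```python
import numpy as np

def skewness_kurtosis(resid):
    """Third central moment over s^3 and fourth central moment over s^4.
    Assumption: s^2 is computed with divisor n."""
    e = np.asarray(resid, dtype=float)
    d = e - e.mean()
    s2 = np.mean(d ** 2)
    return np.mean(d ** 3) / s2 ** 1.5, np.mean(d ** 4) / s2 ** 2

# A symmetric three-point example: skewness 0, kurtosis (2/3) / (2/3)^2 = 1.5.
skew, kurt = skewness_kurtosis([-1.0, 0.0, 1.0])

# A large standard normal sample should give kurtosis close to 3.
rng = np.random.default_rng(4)
skew_n, kurt_n = skewness_kurtosis(rng.normal(size=100_000))
print(skew, kurt, skew_n, kurt_n)
```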

We begin with estimation of a binary choice model for the recoded variable $d = 0$ if $y = 0$ and $d = 1$ if $y \geq 1$. To allow direct comparison with Poisson estimates, we estimate the nonstandard binary Poisson model introduced in section 3.6.1. Compared with Poisson estimates in the Poiss column, the BP results for health status measures are similar, although for the statistically insignificant socioeconomic variables AGE, AGESQ, and INCOME there are sign changes. Similar sign changes for AGE and AGESQ occur in Table 3.4 and are discussed there. The log-likelihood for BP exceeds that for Poisson, but this comparison is meaningless due to the different dependent variable. Logit and probit, not reported, lead to similar log-likelihood and qualitatively similar estimates to those from binary Poisson, so differences between binary Poisson and Poisson can be attributed to aggregating all positive counts into one value.

The ordered probit model normalizes the error variance to 1. To enable comparison with OLS estimates we multiply the ordered probit estimates by $s = .714$, the estimated standard deviation of the residual from OLS regression. Also, as only one observation took the value 9, this was combined into a category of 8 or more. The rescaled threshold parameter estimates are .67, 1.08, 1.22, 1.39, 1.49, 1.67, and 1.99, with t statistics all in excess of 18 and all at least two standard errors apart. Despite the rescaling there is still considerable difference from the OLS estimates. It is meaningful to compare the ordered-probit log-likelihood with that of other count data models; the change of one observation from 9 to 8 or more in the ordered probit should have little effect. The log-likelihood is higher for this model than for NB2, because $-3138.1 > -3198.7$, although six more parameters are estimated.

The log transformation $\ln(y + 0.1)$ was chosen on grounds of smaller skewness and kurtosis than $\ln(y + 0.2)$ or $\ln(y + 0.4)$. The skewness and kurtosis are somewhat smaller for $\ln(y + 0.1)$ than for $\sqrt{y}$. Both transformations appear quite successful in moving towards normality, especially compared with residuals from OLS, even before inclusion of regressors, as inclusion of regressors reduces skewness and kurtosis by about 20% in this example. All models give similar results regarding the statistical significance of regressors, although interpretation of the magnitude of the effect of regressors is more difficult if the dependent variable is $\ln(y + 0.1)$ or $\sqrt{y}$.

Table 3.6. Doctor visits: alternative estimates and t ratios

Estimators and t statistics

             Discrete choice     OLS of transformations  Exponential mean
Variable         BP  OrdProb        y     ln y       √y    Poiss      NLS
ONE           −.905    −.980     .028   −2.115     .070   −2.224   −2.234
             (6.66)   (9.29)    (.38)  (21.43)   (1.55)   (8.74)   (6.14)
SEX            .136     .094     .034     .081     .034     .157    −.057
             (3.39)   (3.03)   (1.47)   (2.73)   (2.48)   (1.98)    (.42)
AGE          −1.356    −.381     .203    −.566    −.161    1.056    3.626
             (1.76)    (.46)    (.46)    (.97)    (.60)    (.77)   (1.82)
AGESQ         1.842     .611    −.062     .877     .292    −.849   −3.676
             (2.15)    (.65)    (.12)   (1.31)    (.94)    (.58)   (1.70)
INCOME         .007    −.044    −.057    −.019    −.168    −.205    −.394
              (.12)    (.95)   (1.65)    (.43)    (.80)   (1.59)   (2.02)
LEVYPLUS       .136     .098     .035     .080     .337     .123     .214
             (2.80)   (2.45)   (1.62)   (2.58)   (2.41)   (1.29)   (1.48)
FREEPOOR      −.265    −.245    −.103    −.182    −.081    −.440    −.232
             (2.55)   (2.75)   (2.17)   (3.17)   (3.00)   (1.52)    (.54)
FREEREPA       .223     .127     .033     .139     .054     .080    −.003
             (3.16)   (2.37)    (.77)   (2.45)   (2.06)    (.63)    (.02)
ILLNESS        .148     .107     .060     .110     .048     .187     .140
             (9.12)   (9.23)   (6.04)   (8.53)   (8.12)   (7.81)   (3.63)
ACTDAYS        .117     .072     .103     .106     .054     .127     .121
            (14.47)  (18.35)  (10.61)  (13.57)  (13.06)  (16.33)  (14.21)
HSCORE         .034     .023     .017     .029     .013     .030     .023
             (3.64)   (3.54)   (2.37)   (3.31)   (3.17)   (2.11)   (1.03)
CHCOND1        .042     .044     .004     .022     .009     .114     .079
              (.94)   (1.23)    (.20)    (.70)    (.61)   (1.25)    (.55)
CHCOND2        .141     .096     .042     .102     .043     .141    −.055
             (2.11)   (2.06)    (.90)   (1.81)   (1.62)   (1.15)    (.31)

−ln L: 2246.9, 3138.1
Skewness: 3.6, 1.2, 1.4, 3.1
Kurtosis: 26.4, 4.0, 5.5, 26.0

Note: BP, MLE for binary Poisson; OrdProb, MLE for rescaled ordered probit; y, OLS for $y$; ln y, OLS for $\ln(y + 0.1)$; √y, OLS for $\sqrt{y}$; Poiss, Poisson PMLE; NLS, NLS with exponential mean. The t statistics are robust sandwich for all but BP and OrdProb. Skewness and kurtosis are for model residuals.

The NLS estimates for exponential mean lead to similar conclusions as Poisson for the health-status variables, but quite different conclusions for socioeconomic variables, with considerably larger coefficients and t statistics for AGE, AGESQ, and INCOME and a sign change for SEX.

3.7.4 Exponential Duration Model

For a Poisson point process the number of events in a given interval of time is Poisson distributed. The duration of a spell, the time from one occurrence to the next, is exponentially distributed. Here we consider modeling durations rather than counts.

Suppose that for each individual in a sample of $n$ individuals we observe the duration of one complete spell, generated by a Poisson point process with rate parameter $\gamma_i$. Then $t_i$ has exponential density $f(t_i) = \gamma_i \exp(-\gamma_i t_i)$ with mean $E[t_i] = 1/\gamma_i$. For regression analysis it is customary to specify $\gamma_i = \exp(x_i'\beta)$.

The exponential MLE, $\hat{\beta}_E$, maximizes the log-likelihood function

$$\ln L = \sum_{i=1}^{n} \left\{ x_i'\beta - \exp(x_i'\beta)\, t_i \right\}. \tag{3.65}$$

The first-order conditions can be expressed as

$$\sum_{i=1}^{n} \left(1 - \exp(x_i'\beta)\, t_i\right) x_i = 0, \tag{3.66}$$

and application of the usual maximum likelihood theory yields

$$V_{ML}[\hat{\beta}_E] = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}. \tag{3.67}$$

If instead we modeled the number of events from a Poisson point process with rate parameter $\gamma_i = \exp(x_i'\beta)$, we would obtain

$$V_{ML}[\hat{\beta}_P] = \left(\sum_{i=1}^{n} \gamma_i x_i x_i'\right)^{-1}.$$

The two variance matrices coincide if $\gamma_i = 1$. Thus if we choose intervals for each individual so that individuals on average experience one event such as a doctor visit, the count data have the same information content, in terms of precision of estimation of $\beta$, as observing for each individual one completed spell such as time between successive visits to the doctor. More simply, one count conveys the same information as the length of one complete spell.
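The exponential MLE and the comparison of the two variance matrices can be sketched as follows. This is an illustration under assumptions not in the text: Newton-Raphson is used to solve (3.66), and the starting values, function names, seed, and simulated durations are all hypothetical choices.

```python
import numpy as np

def exponential_mle(X, t, max_iter=50, tol=1e-10):
    """Newton-Raphson for the exponential duration MLE with rate exp(x_i'beta).
    Assumes column 0 of X is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = -np.log(t.mean())               # rate of a constant-only model
    for _ in range(max_iter):
        gt = np.exp(X @ beta) * t             # gamma_i * t_i
        score = X.T @ (1.0 - gt)              # left-hand side of (3.66)
        hess = -(X * gt[:, None]).T @ X       # second derivative of (3.65)
        step = np.linalg.solve(hess, score)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def variance_duration(X):
    """(3.67): inverse of sum_i x_i x_i'."""
    return np.linalg.inv(X.T @ X)

def variance_count(X, beta):
    """Poisson counterpart: inverse of sum_i gamma_i x_i x_i'."""
    gamma = np.exp(X @ beta)
    return np.linalg.inv((X * gamma[:, None]).T @ X)

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.3])                    # illustrative values
t = rng.exponential(1.0 / np.exp(X @ beta_true))    # scale is the mean 1 / gamma_i
beta_hat = exponential_mle(X, t)
print("estimates:", beta_hat)
print("s.e. from durations:", np.sqrt(np.diag(variance_duration(X))))
print("s.e. from counts:   ", np.sqrt(np.diag(variance_count(X, beta_hat))))
```

Setting the coefficient vector to zero in `variance_count`, so that every rate equals one, reproduces `variance_duration(X)`, which is the case in which the two variance matrices coincide.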
