Computing marginal effects - Baum - An Introduction to Modern Econometrics Using Stata

One of Stata’s most powerful statistical features is the mfx command, which computes marginal effects or elasticities after estimation in point and interval form:

mfx [if] [in] [, options]

mfx calculates the marginal effect that a change in a regressor has on those quantities computed with predict after estimation. It automatically uses the default prediction option: for regress, for instance, this is the xb option, which computes ŷ.

If you use mfx (with the default dydx option) after a regression equation, the results merely reproduce the regress coefficient table with one change: the mean of each regressor is displayed. For a linear regression, the coefficient estimates are themselves the marginal effects, and they do not vary across the sample space. Of greater interest to economists are the elasticity and semielasticity measures, which we obtain with the mfx options eyex, dyex, and eydx. The first is the elasticity of y with respect to x_j, equivalent to ∂ log(y)/∂ log(x_j). By default, these measures are evaluated at the multivariate point of means of the data, but they can be evaluated at any point using the at() option. The second, dyex, would be appropriate if the response variable was already in logarithmic form but the regressor was not; this is the semielasticity ∂y/∂ log(x_j) of a log-linear model. The third form, eydx, would be appropriate if the regressor was in logarithmic form but the response variable was not; this is the semielasticity ∂ log(y)/∂x_j.
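As a worked illustration, assume the prediction is the linear index from regress, written here as $\hat{y} = b_0 + \sum_j b_j x_j$ (this notation is introduced only for the sketch). Under that assumption, the four measures reduce to simple functions of the coefficient, the evaluation point, and the predicted value:

```latex
\begin{aligned}
\texttt{dydx:}\quad & \frac{\partial \hat{y}}{\partial x_j} = b_j \\
\texttt{eyex:}\quad & \frac{\partial \log \hat{y}}{\partial \log x_j} = b_j \,\frac{x_j}{\hat{y}} \\
\texttt{dyex:}\quad & \frac{\partial \hat{y}}{\partial \log x_j} = b_j \, x_j \\
\texttt{eydx:}\quad & \frac{\partial \log \hat{y}}{\partial x_j} = \frac{b_j}{\hat{y}}
\end{aligned}
```

Only dydx is constant across the sample space; the other three depend on the point at which $x_j$ and $\hat{y}$ are evaluated, which is why the choice of evaluation point matters for elasticities.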

The following example shows some of these options, using a form of the median price regression in levels rather than logarithms to illustrate. We compute elasticities (by default, at the point of means) with the eyex option for each explanatory variable.

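. use http://www.stata-press.com/data/imeus/hprice2a, clear
(Housing price data for Boston-area communities)

. regress price nox dist rooms stratio proptax

      Source |       SS       df       MS              Number of obs =     506
-------------+------------------------------           F(  5,   500) =  165.85
       Model |  2.6717e+10     5  5.3434e+09           Prob > F      =  0.0000
    Residual |  1.6109e+10   500  32217368.7           R-squared     =  0.6239
-------------+------------------------------           Adj R-squared =  0.6201
       Total |  4.2826e+10   505    84803032           Root MSE      =    5676

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         nox |  -2570.162    407.371    -6.31   0.000    -3370.532   -1769.793
        dist |  -955.7175   190.7124    -5.01   0.000    -1330.414    -581.021
       rooms |   6828.264   399.7034    17.08   0.000     6042.959    7613.569
     stratio |  -1127.534   140.7653    -8.01   0.000    -1404.099   -850.9699
     proptax |  -52.24272   22.53714    -2.32   0.021    -96.52188   -7.963555
       _cons |   20440.08   5290.616     3.86   0.000      10045.5    30834.66
------------------------------------------------------------------------------

. mfx, eyex

------------------------------------------------------------------------------
variable |      ey/ex    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     nox |  -.6336244      .10068   -6.29   0.000  -.830954 -.436295   5.54978
    dist |  -.1611472      .03221   -5.00   0.000  -.224273 -.098022   3.79575
   rooms |   1.906099       .1136   16.78   0.000   1.68344  2.12876   6.28405
 stratio |  -.9245706      .11589   -7.98   0.000  -1.15171 -.697429   18.4593
 proptax |  -.0947401      .04088   -2.32   0.020  -.174871 -.014609   40.8237
------------------------------------------------------------------------------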

The significance levels of the elasticities are essentially identical to those of the original coefficients.⁴⁶ The regressor rooms is elastic: a given proportional increase in rooms is associated with almost twice as large a proportional increase in price. The other four regressors are inelastic, with estimated elasticities smaller than one in absolute value, but the 95% confidence interval for stratio includes values less than −1.0.
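The elasticities in the mfx output can be checked by hand. The following Python sketch is not part of the original do-file; it assumes only the coefficients, regressor means, and elasticity standard errors printed in the output above, and the dictionary names are illustrative. It evaluates the elasticity at the point of means as the coefficient times the regressor mean divided by the predicted price, and forms a large-sample normal interval from the reported standard error.

```python
from statistics import NormalDist

# Coefficients and constant copied from the regress output above.
coef = {"nox": -2570.162, "dist": -955.7175, "rooms": 6828.264,
        "stratio": -1127.534, "proptax": -52.24272}
cons = 20440.08

# Regressor means (column X) and elasticity standard errors from the mfx output.
mean = {"nox": 5.54978, "dist": 3.79575, "rooms": 6.28405,
        "stratio": 18.4593, "proptax": 40.8237}
se_eyex = {"nox": 0.10068, "dist": 0.03221, "rooms": 0.1136,
           "stratio": 0.11589, "proptax": 0.04088}

# Predicted price at the multivariate point of means.
yhat = cons + sum(coef[v] * mean[v] for v in coef)

# Elasticity at the means: d log(y) / d log(x_j) = b_j * xbar_j / yhat.
eyex = {v: coef[v] * mean[v] / yhat for v in coef}

# Normal-based 95% interval, using the standard errors reported by mfx.
z = NormalDist().inv_cdf(0.975)
ci = {v: (eyex[v] - z * se_eyex[v], eyex[v] + z * se_eyex[v]) for v in coef}

for v in coef:
    lo, hi = ci[v]
    print(f"{v:8s} ey/ex = {eyex[v]: .6f}   95% C.I. = [{lo: .6f}, {hi: .6f}]")
```

The standard errors themselves are taken from the output rather than recomputed, because deriving them requires the estimated VCE of the coefficients, which is not shown in the listing.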

The at() option of mfx can compute point and interval estimates of the marginal effects or elasticities at any point in the sample space. We can specify that one variable take on a specific value while all others are held at their (estimation sample) means or medians to trace out the effects of that regressor. For instance, we may calculate a house price elasticity over the range of values of lnox in the sample. The command also handles the discrete changes appropriate for indicator variables.

The example below evaluates the variation in the elasticity of median housing price with respect to the community's student-teacher ratio in both point and interval form. We first run the regression and compute selected percentiles of stratio by using the detail option of summarize, saving them in variable x_val.

⁴⁶ mfx uses a large-sample normal distribution, whereas regress uses a Student t distribution, which causes the small differences in the output.

. // run regression
. quietly regress price nox dist rooms stratio

. // compute appropriate t-statistic for 95% confidence interval
. // (invttail takes the upper-tail probability, so 0.025 gives a positive value)
. scalar tmfx = invttail(e(df_r), 0.025)

. generate y_val = .    // generate variables needed
(506 missing values generated)
. quietly generate x_val = .
. quietly generate eyex_val = .
. quietly generate seyex1_val = .
. quietly generate seyex2_val = .

. // summarize, detail computes percentiles of stratio
. quietly summarize stratio if e(sample), detail
. local pct 1 10 25 50 75 90 99
. local i = 0
. foreach p of local pct {
      local pc`p' = r(p`p')
      local ++i
      // set those percentiles into x_val
      quietly replace x_val = `pc`p'' in `i'
  }

To produce the graph, we must compute elasticities at the selected percentiles and store the mfx results in y_val and the related variables. The mfx command, like all estimation commands, leaves behind results described in ereturn list.

The saved quantities include scalars such as e(Xmfx_y), the predicted value of y generated from the regressors, and matrices containing the marginal effects or elasticities. The example above uses eyex to compute the elasticities, which are returned in the matrix e(Xmfx_eyex) with standard errors returned in the matrix e(Xmfx_se_eyex). The do-file extracts the appropriate values from those matrices and uses them to create variables containing the percentiles of stratio, the corresponding predicted values of price, the elasticity estimates, and their confidence interval bounds.

. local i = 0
. foreach p of local pct {
      // compute elasticities at those points
      quietly mfx compute, eyex at(mean stratio=`pc`p'')
      local ++i
      // save predictions at these points in y_val
      quietly replace y_val = e(Xmfx_y) in `i'
      // retrieve elasticities
      matrix Meyex = e(Xmfx_eyex)
      matrix eta = Meyex[1, "stratio"]              // for the stratio column
      quietly replace eyex_val = eta[1,1] in `i'    // and save in eyex_val
      // retrieve standard errors of the elasticities
      matrix Seyex = e(Xmfx_se_eyex)
      matrix se = Seyex[1, "stratio"]               // for the stratio column
      // compute upper and lower bounds of confidence interval
      quietly replace seyex1_val = eyex_val + tmfx*se[1,1] in `i'
      quietly replace seyex2_val = eyex_val - tmfx*se[1,1] in `i'
  }
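To see why the elasticity changes along the range of stratio even though the coefficient is constant, the point estimates can be traced by hand. The Python sketch below is an illustration only and rests on two loudly stated assumptions: it reuses the coefficients of the five-regressor model printed earlier (the do-file above refits the model without proptax, so the numbers will not match figure 4.4), and the grid of stratio values is hypothetical rather than the sample percentiles. The function name is likewise illustrative.

```python
# Coefficients, constant, and regressor means copied from the earlier
# regress and mfx output (five-regressor model; an assumption of this sketch).
coef = {"nox": -2570.162, "dist": -955.7175, "rooms": 6828.264,
        "stratio": -1127.534, "proptax": -52.24272}
cons = 20440.08
mean = {"nox": 5.54978, "dist": 3.79575, "rooms": 6.28405,
        "stratio": 18.4593, "proptax": 40.8237}


def eyex_stratio(value):
    """Return (predicted price, elasticity w.r.t. stratio) with stratio set
    to `value` and every other regressor held at its mean."""
    point = dict(mean, stratio=value)
    yhat = cons + sum(coef[v] * point[v] for v in coef)
    return yhat, coef["stratio"] * value / yhat


# Hypothetical evaluation grid, NOT the sample percentiles of stratio.
grid = [12, 14, 16, 18, 20, 22]
results = [(x,) + eyex_stratio(x) for x in grid]

for x, yhat, eta in results:
    print(f"stratio = {x:4.1f}   predicted price = {yhat:9.1f}   ey/ex = {eta: .4f}")
```

As stratio rises, the numerator of the elasticity grows in magnitude while the predicted price in the denominator falls, so the elasticity becomes more negative along the grid.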

I graph these series in figure 4.4, combining three twoway graph types: scatter for the elasticities, rline for their confidence bounds, and connected for the predicted values, with a second axis labeled with their magnitudes.⁴⁷

. label variable x_val "Student/teacher ratio (percentiles `pct')"
. label variable y_val "Predicted median house price"
. label variable eyex_val "Elasticity"
. label variable seyex1_val "95% c.i."
. label variable seyex2_val "95% c.i."

. // graph the scatter of elasticities vs. percentiles of stratio
. // as well as the 95% confidence bands with rline
. // and the predictions with connected
. twoway (scatter eyex_val x_val, ms(Oh) yscale(range(-0.5 -2.0)))
> (rline seyex1_val seyex2_val x_val)
> (connected y_val x_val, yaxis(2) yscale(axis(2) range(20000 35000))),
> ytitle(Elasticity of price vs. student/teacher ratio)

. drop y_val x_val eyex_val seyex1_val seyex2_val    // discard graph's variables

The model's predictions for various levels of the student-teacher ratio demonstrate that more crowded schools are associated with lower housing prices, ceteris paribus. The elasticities vary considerably over the range of stratio values.

These do-files demonstrate how far you can automate the generation of a table of point and interval elasticity estimates, in this case to present them graphically, by using values stored in the r() and e() structures. You could adapt the do-files to generate similar estimates for a different regressor or from a different regression equation. We choose the x-axis points from the percentiles of the regressor and specify the list of percentiles as a local macro. Although many users will use mfx just for its results, you can also use those results to produce a table or graph showing the variation in marginal effects or elasticities over a range of regressor values.

⁴⁷ For more about Stata's graphics capabilities, including overlaying several plot types, see A Visual Guide to Stata Graphics (Mitchell 2004).

[Figure 4.4: Point and interval elasticities computed with mfx. Left axis: elasticity of price vs. student/teacher ratio. Right axis: predicted median house price. Horizontal axis: student/teacher ratio (percentiles 1 10 25 50 75 90 99). Legend: Elasticity; 95% c.i.; Predicted median house price.]

Exercises

1. Regress y = (2, 1, 0) on X = (0, 1, 2) without a constant term, and calculate the residuals. Refit the model with a constant term, and calculate the residuals. Compare the residual sum of squares from the model without a constant term with that from the model with a constant term included. What do you conclude about the model fitted without a constant term?

2. Fit the regression of section 4.5.2, and use test to evaluate the hypothesis H0: 2β_ldist = β_rooms. Compute the linear combination 2b_ldist − b_rooms by using lincom. Why do these two commands yield the same p-values? What is the relationship between the F statistic reported by test and the t statistic reported by lincom?

3. Fit the regression of section 4.5.2. Refit the model subject to the linear restriction that 2β_ldist = −β_rooms. Do the results change appreciably? Why or why not?

4. Using the regression equation estimated in the example of section 4.7, compute the elasticities of price with respect to dist at each decile of the price distribution (hint: see xtile) and produce a table containing the 10 deciles of price and the corresponding elasticities.

4.A Appendix: Regression as a least-squares estimator

We can express the linear regression problem in matrix notation with $y$ as an $N$-vector, $X$ an $N \times k$ matrix, and $u$ an $N$-vector as

$$y = X\beta + u \tag{4.9}$$

Using the least-squares approach to estimation, we want to solve the sample analogue to this problem as

$$y = X\hat{\beta} + \hat{u} \tag{4.10}$$

where $\hat{\beta}$ is the $k$-element vector of estimates of $\beta$ and $\hat{u}$ is the $N$-vector of least-squares residuals. We want to choose the elements of $\hat{\beta}$ to achieve the minimum error sum of squares, $\hat{u}'\hat{u}$. We can write the least-squares problem as

$$\hat{\beta} = \arg\min_{\beta}\, \hat{u}'\hat{u} = \arg\min_{\beta}\, (y - X\beta)'(y - X\beta)$$

Assuming $N > k$ and linear independence of the columns of $X$ (i.e., $X$ must have full column rank), this problem has the unique solution

$$\hat{\beta} = (X'X)^{-1}X'y \tag{4.11}$$

The values calculated by least squares in (4.11) are identical to those computed by the method of moments in (4.4) since the first-order conditions used to derive the least-squares solution above define the moment conditions used by the method of moments.
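A small numerical sketch can make (4.11) and the accompanying first-order conditions concrete. The Python code below is illustrative only: the four data points and the function name are made up for this example, and the regressor matrix is assumed to hold a constant and a single variable so that the inverse of X'X can be written in closed form.

```python
def ols_constant_and_slope(x, y):
    """Solve (X'X) b = X'y for X = [1, x] using the closed-form 2x2 inverse."""
    n = len(x)
    sx = sum(x)                                  # sum of x
    sxx = sum(v * v for v in x)                  # sum of x squared
    sy = sum(y)                                  # sum of y
    sxy = sum(a * b for a, b in zip(x, y))       # sum of x*y
    det = n * sxx - sx * sx                      # determinant of X'X
    if det == 0:
        raise ValueError("X does not have full column rank")
    b0 = (sxx * sy - sx * sxy) / det             # intercept
    b1 = (n * sxy - sx * sy) / det               # slope
    return b0, b1


# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 5.0]

b0, b1 = ols_constant_and_slope(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# First-order conditions X'u_hat = 0: the residuals are orthogonal to
# the constant and to x, which are the sample moment conditions.
moment_const = sum(resid)
moment_x = sum(xi * ei for xi, ei in zip(x, resid))

print("b0 =", b0, " b1 =", b1)
print("sum(resid) =", moment_const, " sum(x*resid) =", moment_x)
```

Both sample moments are zero up to floating-point rounding, which is the sense in which the least-squares first-order conditions coincide with the method-of-moments conditions.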

4.B Appendix: The large-sample VCE for linear regression

The sampling distribution of an estimator describes the estimates produced by applying that estimator to repeated samples from the underlying population. If the size of each sample N is large enough, the sampling distribution of the estimator may be approximately normal, whether or not the underlying stochastic disturbances are normally distributed. An estimator satisfying this property is said to be asymptotically normal. If we are consistently estimating one parameter, its sampling variance will shrink to zero as N → ∞. An estimated parameter may be biased in small samples, but that bias will disappear with large N if the estimator is consistent. In the multivariate context, the variability of the estimates is described by the variance-covariance matrix of the large-sample normal distribution. We call this matrix the variance-covariance matrix of our estimator, or VCE. To evaluate the variability of our estimates, we need a consistent estimator of the VCE.

If the regressors are “well behaved” with finite second moments, we can write the probability limit, or plim, of their moments matrix, scaled by sample size N, as

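$$\operatorname{plim} \frac{X'X}{N} = Q \tag{4.12}$$

where $Q$ is a positive-definite matrix.⁴⁸ We can then derive the distribution of the random estimates $\hat{\beta}$ as

$$\sqrt{N}\,(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma_u^2 Q^{-1}) \tag{4.13}$$

where $\xrightarrow{d}$ denotes convergence in distribution as the sample size $N \to \infty$. For $\hat{\beta}$ itself, we can write

$$\hat{\beta} \overset{a}{\sim} N\!\left(\beta, \frac{\sigma_u^2}{N} Q^{-1}\right) \tag{4.14}$$

where $\overset{a}{\sim}$ denotes the large-sample distribution. To estimate the large-sample VCE of $\hat{\beta}$, we must estimate the two quantities in (4.14): $\sigma_u^2$ and $(1/N)Q^{-1}$. We can consistently estimate the first quantity, $\sigma_u^2$, as shown in (4.5) by $e'e/(N-k)$, where $e = \hat{u}$ is the regression residual vector. We can estimate the second quantity consistently from the sample by $(X'X)^{-1}$, since replacing $Q$ with its sample counterpart $X'X/N$ gives $(1/N)(X'X/N)^{-1} = (X'X)^{-1}$. Thus we can estimate the large-sample VCE of $\hat{\beta}$ from the sample as

$$\mathrm{VCE}(\hat{\beta}) = s^2 (X'X)^{-1} = \frac{\hat{u}'\hat{u}}{N-k}\,(X'X)^{-1} \tag{4.15}$$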
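The estimator in (4.15) can be computed by hand for a tiny example. The Python sketch below is illustrative only: the data and the function name are hypothetical, and the model is assumed to contain a constant and one regressor (k = 2) so that the inverse of X'X has a closed form.

```python
import math


def ols_with_vce(x, y):
    """OLS of y on [1, x]; returns coefficients, s^2, and s^2 * (X'X)^{-1}."""
    n, k = len(x), 2
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X does not have full column rank")
    # (X'X)^{-1} for X'X = [[n, sx], [sx, sxx]]
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]
    # b = (X'X)^{-1} X'y
    b = [inv[0][0] * sy + inv[0][1] * sxy,
         inv[1][0] * sy + inv[1][1] * sxy]
    resid = [yi - (b[0] + b[1] * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - k)     # u_hat'u_hat / (N - k)
    vce = [[s2 * c for c in row] for row in inv]
    return b, s2, vce


# Hypothetical data, chosen only so the arithmetic is easy to follow.
b, s2, vce = ols_with_vce([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 4.0, 5.0])

# Standard errors are the square roots of the diagonal of the VCE.
se = [math.sqrt(vce[j][j]) for j in range(2)]

print("coefficients:", b)
print("s^2:", s2)
print("VCE:", vce)
print("standard errors:", se)
```

With only four observations, this shows the mechanics of the formula rather than any large-sample behavior; the asymptotic argument above is what justifies using the same expression when N is large and the disturbances are not normal.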

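⁴⁸ A sequence of random variables $\hat{\theta}_N$ converges in probability to the constant $a$ if, for every $\epsilon > 0$, $\Pr(|\hat{\theta}_N - a| > \epsilon) \to 0$ as $N \to \infty$. Then $a$ is the plim of $\hat{\theta}_N$. If $\hat{\theta}_N$ is an estimator of the population parameter $\theta$ and $a = \theta$, then $\hat{\theta}_N$ is a consistent estimator of $\theta$.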

Chapter 5

Specifying the functional form

5.1 Introduction

A key assumption maintained in the previous chapter is that the functional form was correctly specified. Here we discuss some methods for checking the validity of this assumption. If the zero-conditional-mean assumption

$$E[u \mid x_1, x_2, \ldots, x_k] = 0 \tag{5.1}$$

is violated, the coefficient estimates are inconsistent.

The three main problems that cause the zero-conditional-mean assumption to fail in a regression model are

• improper specification of the model;

• endogeneity of one or more regressors; or

• measurement error of one or more regressors.

The specification of a regression model may be flawed in its list of included regressors or in the functional form specified for the estimated relationship. Endogeneity means that one or more regressors may be correlated with the error term, a condition that often arises when those regressors are simultaneously determined with the response variable. Measurement error of a regressor implies that the underlying behavioral relationship includes one or more variables that the econometrician cannot accurately measure. This chapter discusses specification issues, whereas chapter 8 addresses endogeneity and measurement errors.
