4.2 How do we estimate conditional mean functions?
5.1.2 Single Index Model
The single index model restricts the function $g(x)$ under consideration to be $g(x) = \phi(x'\beta_0)$, where $\phi$ is an unknown function. An estimator of the slope coefficients $\beta_0$ in the single index model that allows for discrete regressors and regressors which may be functionally related is studied by Ichimura (1993).
Consider the single index model in the conditional mean function:
$$Y_i = \phi(X_i'\beta_0) + \varepsilon_i, \qquad E(\varepsilon_i \mid X_i) = 0.$$
This model arises naturally in a variety of limited dependent variable models in which the observed dependent variable $Y_i$ is modeled as a transformation of $X_i'\beta_0$ and unobserved variables that are independent of $X_i$. See Heckman and Robb (1985) and Stoker (1986). The model can also be viewed simply as a generalization of the regression function. Observe that
$$m_W(b) \equiv E\{[Y - E(Y \mid X'b)]^2 W(X)\} = E\{\varepsilon^2 W(X)\} + E\{[\phi(X'\beta_0) - E(Y \mid X'b)]^2 W(X)\}.$$
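The decomposition can be checked in one step using only $E(\varepsilon \mid X) = 0$. In the sketch below, $\Delta_b(X)$ is shorthand introduced here for illustration; it is not notation from the original text.

```latex
\begin{align*}
Y - E(Y \mid X'b) &= \varepsilon + \Delta_b(X),
  \qquad \Delta_b(X) \equiv \phi(X'\beta_0) - E(Y \mid X'b), \\
E\{[\varepsilon + \Delta_b(X)]^2 W(X)\}
  &= E\{\varepsilon^2 W(X)\} + 2E\{\varepsilon\,\Delta_b(X) W(X)\} + E\{\Delta_b(X)^2 W(X)\}, \\
E\{\varepsilon\,\Delta_b(X) W(X)\}
  &= E\{E(\varepsilon \mid X)\,\Delta_b(X) W(X)\} = 0.
\end{align*}
```

The cross term vanishes by iterated expectations because $\Delta_b(X)$ and $W(X)$ are functions of $X$ alone.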
The computation makes clear that, for any function $W(x)$, the variation in $Y$ has two sources: the variation in $X'\beta_0$ and that in $\varepsilon$. If we choose $b$ to be proportional to $\beta_0$, then the contribution due to the variation in $X'\beta_0$ becomes zero in $m_W(b)$, since $E(Y \mid X'b) = \phi(X'\beta_0)$ in that case. This observation leads to defining an estimator as
$$\min_b \frac{1}{n} \sum_{i=1}^n [y_i - E(y_i \mid x_i'b)]^2 W(x_i)$$
if we knew the conditional mean function $E(Y_i \mid X_i'b)$. As we do not know it, we need to replace it with its estimate. But since the conditional mean function cannot be estimated at points where the density of $X_i'b$ is low, we need to introduce trimming, as with the other estimators we examined earlier. The trimming function in this case has a further complication. Even if the density of $X$ is bounded away from zero, the density of $X'b$ is not, in general. This can be understood by considering two variables that have the uniform distribution on the unit square and considering the density of their sum.
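To make the unit-square example concrete, take $b = (1, 1)'$ and write $S = X_1 + X_2$ (a symbol introduced here only for illustration). The joint density equals 1 everywhere on the square, yet the density of the index is triangular:

```latex
\[
f_S(s) =
\begin{cases}
s,     & 0 \le s \le 1, \\
2 - s, & 1 < s \le 2, \\
0,     & \text{otherwise},
\end{cases}
\]
```

so $f_S(s) \to 0$ as $s \to 0$ or $s \to 2$, even though the density of $X$ is bounded away from zero on its support.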
A simple way around this problem is to define the trimming function as follows:
$$I_i = 1\{x_i \in \mathcal{X}\},$$
where $\mathcal{X}$ denotes a fixed set of points interior to the support of $X_i$, each at least a certain distance from the boundary. Note that over this set $\mathcal{X}$, by construction, the density of $x$ is bounded away from zero and the density of $X'b$ is also bounded away from zero.
Another point to note is that for any constant $c \neq 0$, $E(Y \mid X'b = x'b) = E(Y \mid X'(cb) = x'(cb))$, so that we cannot identify the length of $\beta_0$. Thus we define the estimator to be the minimizer of the following objective function after replacing $E(Y_i \mid X_i'b)$ with a nonparametric estimator of it:
$$\min_{b \in \{b:\, b'b = 1\}} \frac{1}{n} \sum_{i=1}^n [y_i - \hat{E}(y_i \mid x_i'b)]^2 W(x_i) I_i.$$
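The objective above can be sketched numerically as follows. This is a minimal illustration under several assumptions that are not in the text: a leave-one-out Nadaraya-Watson estimator with a Gaussian kernel stands in for $\hat{E}(y_i \mid x_i'b)$, the bandwidth is held fixed at 0.1, $W(x) = 1$, the data are simulated with two regressors, and the minimization is a plain grid search over the unit circle. The function names `loo_kernel_mean` and `sls_objective` are hypothetical.

```python
import numpy as np

def loo_kernel_mean(index, y, h):
    """Leave-one-out Nadaraya-Watson estimate of E[y | index] (Gaussian kernel)."""
    d = (index[:, None] - index[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    np.fill_diagonal(k, 0.0)  # exclude each observation from its own fit
    return (k @ y) / np.maximum(k.sum(axis=1), 1e-12)

def sls_objective(b, x, y, h, trim):
    """Trimmed sample mean of squared residuals, with W(x) = 1."""
    fitted = loo_kernel_mean(x @ b, y, h)
    return np.mean(trim * (y - fitted) ** 2)

# Simulated data; every design choice below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-1.0, 1.0, size=(n, 2))
beta0 = np.array([0.6, 0.8])  # satisfies beta0'beta0 = 1
y = np.exp(x @ beta0) + 0.1 * rng.standard_normal(n)

# Trimming indicator I_i = 1{x_i in X}: keep points at least 0.1 inside the support.
trim = np.all(np.abs(x) <= 0.9, axis=1).astype(float)

# Impose b'b = 1 through b = (cos t, sin t). Since b and -b give the same
# fit, it is enough to search t over [0, pi).
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
values = [
    sls_objective(np.array([np.cos(t), np.sin(t)]), x, y, 0.1, trim)
    for t in angles
]
t_hat = angles[int(np.argmin(values))]
b_hat = np.array([np.cos(t_hat), np.sin(t_hat)])
print(b_hat)
```

The grid search is used only because the two-regressor case makes the constraint set one-dimensional; with more regressors a numerical optimizer and a data-dependent bandwidth would normally be needed.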
In implementation, two forms of normalization are used: in some cases $\beta'\beta = 1$ is imposed, and in other cases one of the coefficients is set to 1.[46] In either case, the variance-covariance matrix of the estimator is $V^{-}\Omega V^{-}$, where
$$V = E\{\phi'(x'\beta_0)^2 [\tilde{x} - E(\tilde{x} \mid x'\beta_0)][\tilde{x} - E(\tilde{x} \mid x'\beta_0)]'\},$$
$$\Omega = E\{\sigma^2(x)\, \phi'(x'\beta_0)^2 [\tilde{x} - E(\tilde{x} \mid x'\beta_0)][\tilde{x} - E(\tilde{x} \mid x'\beta_0)]'\},$$
$\sigma^2(x) = \mathrm{Var}(y \mid x)$, and all of the expectations are taken over a given set $\mathcal{X}$ over which the density of $x'\beta_0$ is assumed to be bounded away from 0. When $\beta'\beta = 1$, $\tilde{x} = x$, and when one of the coefficients is set to 1, $\tilde{x}$ is the vector of original regressors excluding the regressor whose coefficient is set to 1. For the first normalization, note that $\Omega\beta_0 = 0$ and $V\beta_0 = 0$ hold, so that $V$ and $\Omega$ are not invertible.
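The singularity under the first normalization follows directly from the definitions. With $\tilde{x} = x$, a short sketch of the argument is:

```latex
\begin{align*}
[x - E(x \mid x'\beta_0)]'\beta_0
  &= x'\beta_0 - E(x'\beta_0 \mid x'\beta_0) = 0, \\
V\beta_0
  &= E\{\phi'(x'\beta_0)^2\,[x - E(x \mid x'\beta_0)]\,[x - E(x \mid x'\beta_0)]'\beta_0\} = 0,
\end{align*}
```

and the same step applied to $\Omega$, which differs from $V$ only by the scalar factor $\sigma^2(x)$ inside the expectation, gives $\Omega\beta_0 = 0$.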
There are two sources of efficiency loss. One is that the variation in $\tilde{x} - E(\tilde{x} \mid x'\beta_0)$ is used rather than the variation in $\tilde{x}$. The other is that heteroskedasticity is not accounted for in the estimation. While the first problem arises because $\phi$ is unknown, and hence is intrinsic to the formulation of the problem, the second problem can be resolved by weighting if the model is truly single index. Oftentimes, however, we use the single index model as a convenient approximation to a more general function. Ichimura and Lee (2006) show that if the single index model is used when the underlying model is not single index, the semiparametric least squares (SLS) estimator is still consistent for a vector which best approximates the conditional mean function within the single index model, and it is asymptotically normal, but its asymptotic variance contains an additional term. They discuss how to estimate the asymptotic variance including this additional term and hence how to make the estimator robust to misspecification. The discussion here used the linear single index, but the same idea applies to the nonlinear index model and also to the case of multiple indices. See Ichimura and Lee (1991).
When the dependent variable is discrete, the more natural objective function is likelihood based. Klein and Spady (1993) examine the case of binary choice models and show that the estimator is efficient among semiparametric estimators.
Blundell and Powell (2003) consider the single index model with an endogenous regressor, and Ichimura and Lee (2006) consider the estimation of the conditional quantile function when the conditional quantile function is modeled as a single index function.
[46] We consider $W(x) = 1$ for simplicity below. See Ichimura (1993) for the weighted case. In general we need to modify the standard estimation of $E(Y \mid X'b)$ to achieve efficiency by weighting.