Simultaneous Equation Models
5.3 Simultaneous Equation Estimation
There are two broad classes of simultaneous equation techniques: single-equation estimation methods and systems estimation methods. The distinction between the two is that systems methods consider all of the parameter restrictions (caused by over-identification) in the entire equation system and account for possible contemporaneous (cross-equation) correlation of disturbance terms. Contemporaneous disturbance term correlation is an important consideration in estimation. For example, in the vehicle utilization equation system (Equations 5.1 and 5.2), one would expect $\varepsilon_1$ and $\varepsilon_2$ to be correlated because vehicles operated by the same household will share unobserved effects (common to that household) that influence vehicle utilization. Because system estimation approaches are able to utilize more information (parameter restrictions and contemporaneous correlation), they produce variance–covariance matrices that are at worst equal to, and in most cases smaller than, those produced by single-equation methods (resulting in lower standard errors and higher t statistics for estimated model parameters).
Single-equation methods include instrumental variables (IV), indirect least squares (ILS), two-stage least squares (2SLS), and limited information maximum likelihood (LIML). Systems estimation methods include three-stage least squares (3SLS) and full information maximum likelihood (FIML). These different estimation techniques are discussed in the following sections.
5.3.1 Single-Equation Methods
ILS involves OLS estimation of reduced form equations. In the case of vehicle utilization in two-vehicle households, this would be OLS estimation as depicted in Equations 5.6 and 5.7. The estimates of these reduced form parameters are then used to determine the underlying model parameters by solving such equations as 5.8 and 5.9. Because solving for the underlying model parameters tends to result in nonlinear equations (such as Equation 5.10), the unbiased estimates of the reduced form parameters (from Equations 5.6 and 5.7) do not produce unbiased estimates of the underlying model parameters. This is because a ratio of unbiased estimates is, in general, itself biased. There is also the problem of having multiple estimates of underlying model parameters if the equation system is over-identified (as in Equation 5.10).
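The two ILS steps can be sketched on simulated data. The two-equation system, its parameter values, and all variable names below are hypothetical, chosen only so that each equation is exactly identified. They are not Equations 5.1 and 5.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical structural system (an assumption for illustration):
#   y1 = a1*y2 + b1*x1 + e1
#   y2 = a2*y1 + b2*x2 + e2
a1, b1 = 0.5, 1.0
a2, b2 = -0.3, 2.0

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Disturbances are correlated across the two equations
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

# Solving the two structural equations for y1 and y2 gives the reduced form
det = 1.0 - a1 * a2
y1 = (b1 * x1 + a1 * b2 * x2 + e[:, 0] + a1 * e[:, 1]) / det
y2 = (a2 * b1 * x1 + b2 * x2 + a2 * e[:, 0] + e[:, 1]) / det

# Step 1: OLS on each reduced form equation (regress on all exogenous variables)
X = np.column_stack([x1, x2])
pi1 = np.linalg.lstsq(X, y1, rcond=None)[0]  # [coef on x1, coef on x2] for y1
pi2 = np.linalg.lstsq(X, y2, rcond=None)[0]  # [coef on x1, coef on x2] for y2

# Step 2: solve for the structural parameters; note the ratios, which are
# the reason unbiased reduced form estimates do not carry over unbiasedness
a1_ils = pi1[1] / pi2[1]
a2_ils = pi2[0] / pi1[0]
b1_ils = pi1[0] - a1_ils * pi2[0]
b2_ils = pi2[1] - a2_ils * pi1[1]

print("ILS estimates:", a1_ils, b1_ils, a2_ils, b2_ils)
```

With a large simulated sample the ILS estimates land close to the assumed values, which reflects consistency rather than unbiasedness.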
An IV approach is the simplest approach to solving the simultaneous equations estimation problem. This approach replaces each endogenous variable on the right-hand side of the equations in the equation system with an instrumental variable, that is, a variable that is highly correlated with the endogenous variable it replaces and is uncorrelated with the disturbance term. For example, in Equations 5.1 and 5.2, the IV approach would be to replace u2 and u1, respectively, with appropriate instrumental variables and to estimate Equations 5.1 and 5.2 using OLS. This approach yields consistent parameter estimates. The problem, however, is one of finding suitable instruments, which is difficult to nearly impossible in many cases.
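A minimal sketch of why an instrument helps, using simulated data. The single equation, the instrument `z`, and the parameter value are hypothetical. The sketch uses the common ratio form of the IV estimator, (z'y)/(z'w), rather than literally substituting the instrument into the equation, so it should be read as an illustration of the idea and not as the exact procedure described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
a_true = 0.5                      # hypothetical structural coefficient

z = rng.normal(size=n)            # instrument: moves w, unrelated to e
c = rng.normal(size=n)            # unobserved effect shared by w and e
w = z + c + rng.normal(size=n)    # endogenous right-hand-side variable
e = c + rng.normal(size=n)        # disturbance, correlated with w through c
y = a_true * w + e

# OLS ignores the correlation between w and e and is inconsistent here
a_ols = (w @ y) / (w @ w)

# IV uses only the variation in w that is driven by the instrument
a_iv = (z @ y) / (z @ w)

print("OLS:", a_ols, "IV:", a_iv)
```

In this setup the OLS estimate stays well above the assumed value no matter how large the sample, while the IV estimate settles near it.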
2SLS is an extension of instrumental variables in that it seeks the best instrument for endogenous variables in the equation system. Stage 1 regresses each endogenous variable on all exogenous variables. Stage 2 uses the regression-estimated values from stage 1 as instruments and estimates each equation using OLS. The resulting parameter estimates are consistent, and studies have shown that most small-sample properties of 2SLS are superior to those of ILS and IV.
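The two stages can be sketched as follows on a simulated system. The system, its parameter values, and the helper name `two_sls` are assumptions made for illustration. The first equation excludes two exogenous variables and is therefore over-identified.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical structural system (an assumption for illustration):
#   y1 = a1*y2 + b1*x1 + e1
#   y2 = a2*y1 + b2*x2 + b3*x3 + e2
a1, b1 = 0.5, 1.0
a2, b2, b3 = -0.3, 2.0, -1.0

X = rng.normal(size=(n, 3))       # all exogenous variables x1, x2, x3
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
G = np.array([[1.0, -a1], [-a2, 1.0]])           # coefficients on y1, y2
B = np.array([[b1, 0.0, 0.0], [0.0, b2, b3]])    # coefficients on x1..x3
Y = np.linalg.solve(G, (X @ B.T + E).T).T        # solve G y = B x + e per row
y1, y2 = Y[:, 0], Y[:, 1]

def two_sls(y, Z, X_all):
    """2SLS for one equation: Z holds its right-hand-side variables."""
    # Stage 1: regress every right-hand-side variable on all exogenous
    # variables (included exogenous variables are reproduced exactly)
    Z_hat = X_all @ np.linalg.lstsq(X_all, Z, rcond=None)[0]
    # Stage 2: OLS of y on the stage 1 fitted values
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]

Z1 = np.column_stack([y2, X[:, 0]])              # RHS of equation 1
d_2sls = two_sls(y1, Z1, X)                      # [a1, b1] estimates
d_ols = np.linalg.lstsq(Z1, y1, rcond=None)[0]   # naive OLS for comparison

print("2SLS:", d_2sls, "OLS:", d_ols)
```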
LIML maximizes the likelihood function of the reduced form models, generally assuming normally distributed error terms. Unlike ILS, the likelihood function is written to account for parameter restrictions (critical for over-identified models). This alleviates the ILS problem of having multiple estimates of underlying model parameters in over-identified equations.
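LIML can be sketched without writing out the likelihood, because under normality the LIML estimates coincide with a k-class estimator whose constant kappa is the smallest root of a determinantal equation. The code below uses that computational shortcut, which is a swapped-in equivalent form and not the likelihood maximization described above. The system, parameter values, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Hypothetical system: equation 1 is over-identified, equation 2 exactly identified
#   y1 = a1*y2 + b1*x1 + e1
#   y2 = a2*y1 + b2*x2 + b3*x3 + e2
a1, b1 = 0.5, 1.0
a2, b2, b3 = -0.3, 2.0, -1.0

X = rng.normal(size=(n, 3))
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
G = np.array([[1.0, -a1], [-a2, 1.0]])
B = np.array([[b1, 0.0, 0.0], [0.0, b2, b3]])
Y = np.linalg.solve(G, (X @ B.T + E).T).T
y1, y2 = Y[:, 0], Y[:, 1]

def resid(A, W):
    """Residuals from an OLS regression of A on the columns of W."""
    return A - W @ np.linalg.lstsq(W, A, rcond=None)[0]

def k_class(y, Y_end, X_inc, X_all, kappa=None):
    """k-class estimator; kappa=None computes the LIML kappa, kappa=1 is 2SLS."""
    if kappa is None:
        Y0 = np.column_stack([y, Y_end])
        R0, R1 = resid(Y0, X_inc), resid(Y0, X_all)
        W0, W1 = R0.T @ R0, R1.T @ R1
        # Smallest root of |W0 - kappa*W1| = 0
        kappa = np.linalg.eigvals(np.linalg.solve(W1, W0)).real.min()
    Z = np.column_stack([Y_end, X_inc])
    MZ, My = resid(Z, X_all), resid(y, X_all)
    A = Z.T @ Z - kappa * (MZ.T @ MZ)
    b = Z.T @ y - kappa * (MZ.T @ My)
    return np.linalg.solve(A, b), kappa

d1_liml, k1 = k_class(y1, y2, X[:, [0]], X)           # over-identified
d2_liml, k2 = k_class(y2, y1, X[:, [1, 2]], X)        # exactly identified
d2_2sls, _ = k_class(y2, y1, X[:, [1, 2]], X, kappa=1.0)

print("kappa (eq. 1, eq. 2):", k1, k2)
print("LIML eq. 1:", d1_liml, "LIML eq. 2:", d2_liml, "2SLS eq. 2:", d2_2sls)
```

For the exactly identified equation kappa equals 1 up to rounding, so LIML and 2SLS return the same estimates. For the over-identified equation kappa is at least 1 and the two estimators differ slightly in finite samples.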
In selecting among the single-equation estimation methods, 2SLS and LIML have obvious advantages when equations are over-identified. When the equations are exactly identified and the disturbances are normally distributed, ILS, IV, 2SLS, and LIML produce the same results. In over-identified cases, 2SLS and LIML have the same asymptotic variance–covariance matrix, so the choice becomes one of minimizing computational costs. A summary of single-equation estimation methods is provided in Table 5.1.
5.3.2 System Equation Methods
System equation methods are typically preferred to single-equation methods because they account for restrictions in over-identified equations and for contemporaneous (cross-equation) disturbance term correlation. 3SLS is the most popular of the system equation estimation methods. In 3SLS, stage 1 obtains the 2SLS estimates of the model system. In stage 2, the 2SLS estimates are used to compute residuals, from which cross-equation disturbance term correlations are calculated. In stage 3, generalized least squares (GLS) is used to compute parameter estimates. Appendix 5A at the end of this chapter provides an overview of the GLS estimation procedure.
Because of the additional information considered (contemporaneous correlation of disturbances), 3SLS produces more efficient parameter estimates than single-equation estimation methods. An exception is when there is no contemporaneous disturbance term correlation. In this case, 2SLS and 3SLS parameter estimates are identical.
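The three stages can be sketched for a simulated two-equation system. The system, its parameter values, and the function names are assumptions for illustration. The GLS step is written block by block so that no n-by-n matrix is formed. Passing a diagonal disturbance covariance matrix reproduces the 2SLS estimates, which mirrors the exception noted above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Hypothetical structural system (an assumption for illustration):
#   y1 = a1*y2 + b1*x1 + e1
#   y2 = a2*y1 + b2*x2 + b3*x3 + e2
a1, b1 = 0.5, 1.0
a2, b2, b3 = -0.3, 2.0, -1.0

X = rng.normal(size=(n, 3))
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
G = np.array([[1.0, -a1], [-a2, 1.0]])
B = np.array([[b1, 0.0, 0.0], [0.0, b2, b3]])
Y = np.linalg.solve(G, (X @ B.T + E).T).T
y1, y2 = Y[:, 0], Y[:, 1]

def three_sls(ys, Zs, X_all, sigma=None):
    """Return (3SLS estimates, stacked 2SLS estimates, disturbance covariance)."""
    m = len(ys)
    # Stage 1: fitted right-hand-side variables and 2SLS estimates per equation
    Zh = [X_all @ np.linalg.lstsq(X_all, Z, rcond=None)[0] for Z in Zs]
    d2 = [np.linalg.lstsq(Zh_i, y, rcond=None)[0] for Zh_i, y in zip(Zh, ys)]
    # Stage 2: cross-equation disturbance covariance from 2SLS residuals
    if sigma is None:
        res = np.column_stack([y - Z @ d for y, Z, d in zip(ys, Zs, d2)])
        sigma = res.T @ res / len(res)
    s_inv = np.linalg.inv(sigma)
    # Stage 3: GLS on the stacked system, assembled block by block
    A = np.block([[s_inv[i, j] * (Zh[i].T @ Zh[j]) for j in range(m)]
                  for i in range(m)])
    b = np.concatenate([sum(s_inv[i, j] * (Zh[i].T @ ys[j]) for j in range(m))
                        for i in range(m)])
    return np.linalg.solve(A, b), np.concatenate(d2), sigma

Z1 = np.column_stack([y2, X[:, 0]])
Z2 = np.column_stack([y1, X[:, 1], X[:, 2]])
d3, d2, sigma_hat = three_sls([y1, y2], [Z1, Z2], X)

# Imposing zero cross-equation correlation collapses 3SLS to 2SLS
d3_diag, d2_again, _ = three_sls([y1, y2], [Z1, Z2], X, sigma=np.eye(2))

print("3SLS:", d3)
print("2SLS:", d2)
print("Estimated disturbance covariance:", sigma_hat)
```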
FIML extends LIML by accounting for contemporaneous correlation of disturbances. The assumption typically made for estimation is that the disturbances are multivariate normally distributed. Accounting for contemporaneous error correlation complicates the likelihood function considerably. As a result, FIML is seldom used in simultaneous equation estimation. Moreover, because FIML and 3SLS share the same asymptotic variance–covariance matrix under the assumption of multivariate normally distributed disturbances, there is no real incentive to choose FIML over 3SLS in most applications. A summary of system equation estimation methods is provided in Table 5.2.
Example 5.1
To demonstrate the application of a simultaneous equations model, consider the problem of studying mean vehicle speeds by lane on a multilane freeway. Because of the natural interaction of traffic in adjacent lanes, a simultaneous equations problem arises because lane mean speeds are determined, in part, by the lane mean speeds in adjacent lanes. This problem was first studied by Shankar and Mannering (1998) and their data and approach are used here.
TABLE 5.1
Summary of Single-Equation Estimation Methods for Simultaneous Equations

Method: Indirect least squares (ILS)
Procedure: Applies ordinary least squares to the reduced form models
Resulting parameter estimates: Consistent but not unbiased

Method: Instrumental variables (IV)
Procedure: Uses an instrument (a variable that is highly correlated with the endogenous variable it replaces, but is not correlated with the disturbance term) to estimate individual equations
Resulting parameter estimates: Consistent but not unbiased

Method: Two-stage least squares (2SLS)
Procedure: Approach finds the best instrument for endogenous variables: Stage 1 regresses each endogenous variable on all exogenous variables; Stage 2 uses regression-estimated values from stage 1 as instruments, and estimates equations with OLS
Resulting parameter estimates: Consistent but not unbiased; generally better small-sample properties than ILS or IV

Method: Limited information maximum likelihood (LIML)
Procedure: Uses maximum likelihood to estimate reduced form models; can incorporate parameter restrictions in over-identified equations
Resulting parameter estimates: Consistent but not unbiased; has same asymptotic variance–covariance matrix as 2SLS
For this example, data represent speeds obtained from a six-lane freeway with three lanes in each direction separated by a large median (each direction is considered separately). A summary of the available data is shown in Table 5.3. At the point where the data were gathered, highly variable seasonal weather conditions were present. As a consequence, seasonal factors are expected to play a role. The data were collected over a period of a year, and the mean speeds, by lane, were the mean of the spot speeds gathered over 1-h periods. The equation system is written as (see Shankar and Mannering, 1998)
TABLE 5.2
Summary of System Estimation Methods for Simultaneous Equations

Method: Three-stage least squares (3SLS)
Procedure: Stage 1 obtains 2SLS estimates of the model system; Stage 2 uses the 2SLS estimates to compute residuals to determine cross-equation correlations; Stage 3 uses GLS to estimate model parameters
Resulting parameter estimates: Consistent and more efficient than single-equation estimation methods

Method: Full information maximum likelihood (FIML)
Procedure: Similar to LIML but accounts for contemporaneous correlation of disturbances in the likelihood function
Resulting parameter estimates: Consistent and more efficient than single-equation estimation methods; has same asymptotic variance–covariance matrix as 3SLS
TABLE 5.3
Lane Mean-Speed Model Variables

Variable No.   Variable Description
1    Mean speed in the right lane in kilometers per hour (gathered over a 1-h period)
2    Mean speed in the center lane in kilometers per hour (gathered over a 1-h period)
3    Mean speed in the left lane in kilometers per hour (gathered over a 1-h period)
4    Traffic flow in right lane (vehicles per hour)
5    Traffic flow in center lane (vehicles per hour)
6    Traffic flow in left lane (vehicles per hour)
7    Proportion of passenger cars (including pickup trucks and minivans) in the right lane
8    Proportion of passenger cars (including pickup trucks and minivans) in the center lane
9    Proportion of passenger cars (including pickup trucks and minivans) in the left lane
10   Month that speed data was collected (1 = January, 2 = February, etc.)
11   Hour in which data was collected (the beginning hour of the 1-h data collection period)
$$s_R = \beta_R Z_R + \lambda_R s_C + \varepsilon_R \tag{5.11}$$
$$s_C = \beta_C Z_C + \lambda_C s_L + \tau_C s_R + \varepsilon_C \tag{5.12}$$
$$s_L = \beta_L Z_L + \lambda_L s_C + \varepsilon_L \tag{5.13}$$
where the $s$ are the mean speeds (over a 1-h period in kilometers/h) for the right-most lane (subscript R) relative to the direction of travel (the slow lane), the center lane (subscript C), and the left lane (subscript L), respectively. The $Z$ are vectors of exogenous variables influencing the mean speeds in the corresponding lanes, the $\beta$ are vectors of estimable parameters, the $\lambda$ and $\tau$ are estimable scalars, and the $\varepsilon$ are disturbance terms.
The equation system (Equations 5.11 through 5.13) is estimated with 2SLS and 3SLS and the estimation results are presented in Table 5.4. These results show that there are noticeable differences between 2SLS and 3SLS parameter estimates in the equations for mean speeds in the right, center, and left lanes. These differences underscore the importance of accounting for contemporaneous correlation of disturbance terms. In this case, it is clear that the disturbances $\varepsilon_R$, $\varepsilon_C$, and $\varepsilon_L$ share unobserved factors occurring over the hour during which mean speeds are calculated. These unobserved factors, which are captured in equation disturbances, could include vehicle disablements, short-term driver distractions, weather changes, and so on. One would expect contemporaneous disturbance term correlation to diminish if more complete data were available to estimate the model. The difference between 2SLS and 3SLS parameter estimates would then diminish as well. The interested reader should see Shankar and Mannering (1998) for a more detailed analysis of these mean-speed data.