Linear Regression

3.3 Manipulating Variables in Regression

3.3.3 Indicator Variables

Often there is interest in modeling the effects of ordinal and nominal scale variables. For example, the effects of qualitative variables such as roadway functional class, gender, attitude toward transit, and trip purpose are often sought. The interpretation of nominal and ordinal scale variables in regression models is different from that for continuous variables.

For nominal scale variables, m − 1 indicator variables must be created to represent all m levels of the variable in the regression model. These m − 1 indicator variables represent different categories of the variable, with the omitted level captured in the intercept term of the regression. Theoretically meaningless and statistically insignificant indicator variables should be removed or omitted from the regression, leaving only those levels of the nominal scale variable that are important and relegating the other levels to the “base” condition (the intercept term).
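The m − 1 coding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the helper name `indicator_columns`, the integer class codes, and the five sample observations are all hypothetical.

```python
def indicator_columns(values, base):
    """Return {level: [0/1, ...]} for every observed level except the base."""
    levels = sorted(set(values))
    if base not in levels:
        raise ValueError(f"base level {base!r} does not occur in the data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != base
    }

# Hypothetical functional-class codes (1 to 4) for five road segments.
funclass = [1, 3, 4, 2, 3]

# Level 4 is the base condition, so it gets no column of its own;
# its effect is absorbed by the intercept.
columns = indicator_columns(funclass, base=4)
for level, column in columns.items():
    print(f"FUNCLASS{level}", column)
```

An observation at the base level has zeros in every column, which is why only m − 1 columns are needed for m levels.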

Example 3.3

Consider again the study of AADT in Minnesota. In Example 3.2 three indicator variables representing three different functional classes of roadways were used. The variable FUNCLASS3 represents the subset of observations belonging to the class of facilities that are urban interstates. The example shows that urban interstates have an estimated marginal effect of 35,453.7 AADT; that is, urban interstates are associated with 35,453.7 more AADT on average than urban non-interstates (FUNCLASS4). The effect of this indicator variable is statistically significant, whereas the effects for FUNCLASS1 and FUNCLASS2 are not. In practice these findings would be compared with theoretical expectations and other empirical findings. The effect of urban non-interstates (FUNCLASS4) is captured in the intercept term. Specifically, for urban non-interstate facilities the county population (CNTYPOP) is multiplied by 0.0288, then 9,953.7 is added for each lane of the facility, and then 26,234.5 (the Y-intercept term β0) is subtracted to obtain an estimate of AADT.
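The arithmetic in this example can be written out as a short function. The sketch below uses only the coefficients quoted in the text; the function name `predict_aadt` and the input values (a county of 500,000 people, a four-lane facility) are hypothetical. The estimates for FUNCLASS1 and FUNCLASS2 are not given in this section, so they are left out.

```python
# Coefficients quoted in Example 3.3.
INTERCEPT = -26234.5      # base condition: urban non-interstate (FUNCLASS4)
B_CNTYPOP = 0.0288        # per person of county population
B_NUMLANES = 9953.7       # per lane
B_FUNCLASS3 = 35453.7     # shift for urban interstates relative to the base

def predict_aadt(cntypop, numlanes, urban_interstate=False):
    """Estimated AADT for an urban non-interstate or an urban interstate."""
    aadt = INTERCEPT + B_CNTYPOP * cntypop + B_NUMLANES * numlanes
    if urban_interstate:
        aadt += B_FUNCLASS3  # the indicator variable equals 1
    return aadt

# Hypothetical inputs, chosen only to illustrate the calculation.
base = predict_aadt(500_000, 4)
interstate = predict_aadt(500_000, 4, urban_interstate=True)
print(round(base, 1), round(interstate, 1))
```

With every other input held fixed, the two estimates differ by exactly the FUNCLASS3 coefficient, which is what "marginal effect relative to the base condition" means here.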

Ordinal scale variables, unlike nominal scale variables, are ranked, and can complicate matters in the regression. Several methods for dealing with ordinal scale variables in the regression model, along with a description of when these methods are most suitable, are provided in the next section.

Estimate a Single Beta Parameter

The assumption in this approach is that the marginal effect is equivalent across increasing levels of the variable. For example, consider a variable that reflects the response to the survey question: Do you support congestion pricing on the I-10 expressway? The responses to the question include: 1 = do not support; 2 = not likely to support; 3 = neutral; 4 = likely to support; and 5 = strongly support. This variable reflects an ordered response with ordered categories that do not possess even intervals across individuals or within individuals. If a single beta parameter for this variable is estimated, the assumption is that each unit increase (or decrease) of the variable has an equivalent effect on the response. This is unlikely to be a valid assumption. Thus, disaggregate treatment of ordinal variables is more appropriate.
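The equal-step restriction can be made concrete with a small calculation. In the sketch below the coefficient value 0.5 is hypothetical and chosen only for illustration; the point is that any single beta produces the same change in the predicted response for every one-category move.

```python
# Hypothetical coefficient on the 1-5 congestion-pricing support score.
beta = 0.5

levels = [1, 2, 3, 4, 5]

# Contribution of the survey variable to the predicted response at each level.
contribution = {k: beta * k for k in levels}

# Change in the predicted response when moving up by one category.
steps = [contribution[k + 1] - contribution[k] for k in levels[:-1]]
print(steps)
```

Moving from "do not support" to "not likely to support" is forced to have the same effect as moving from "likely to support" to "strongly support", which is the assumption the text calls into question.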

Estimate Beta Parameter for Ranges of the Variable

Suppose that an ordinal variable had two separate effects, one across one portion of the variable’s range of values, and another over the remainder of the variable. In this case two indicator variables can be created, one for each range of the variable. Consider the variable NUMLANES used in previous chapter examples. Although the intervals between levels of this variable are equivalent, it may be believed that a fundamentally different effect on AADT exists for different levels of NUMLANES. Thus, two indicator variables could be created such that

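$$
\text{Ind}_1 =
\begin{cases}
\text{NUMLANES} & \text{if } 1 \le \text{NUMLANES} \le 2 \\
0 & \text{otherwise}
\end{cases}
\qquad
\text{Ind}_2 =
\begin{cases}
\text{NUMLANES} & \text{if } \text{NUMLANES} > 2 \\
0 & \text{otherwise.}
\end{cases}
$$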

These two indicator variables would allow the estimation of two parameters, one for the lower range of the variable NUMLANES and one for the upper range. If there were an a priori reason to believe that these effects would be different given theoretical, behavioral, or empirical considerations, then the regression would provide evidence on whether these separate ranges of NUMLANES supported separate regression parameters. Note that a linear dependency exists between these two indicator variables and the variable NUMLANES: their sum equals NUMLANES for every observation, so all three cannot be included in the same regression.
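A minimal sketch of this range coding follows, assuming the split falls at two lanes (one or two lanes form the lower range, three or more the upper range). The function name `range_indicators` and the sample lane counts are hypothetical.

```python
def range_indicators(numlanes):
    """Split NUMLANES into a lower-range and an upper-range variable."""
    ind1 = numlanes if 1 <= numlanes <= 2 else 0  # carries the value for 1-2 lanes
    ind2 = numlanes if numlanes > 2 else 0        # carries the value for 3+ lanes
    return ind1, ind2

# Hypothetical lane counts for five facilities.
lane_counts = [1, 2, 3, 4, 2]
coded = [range_indicators(n) for n in lane_counts]

for n, (ind1, ind2) in zip(lane_counts, coded):
    # Exactly one of the two variables is nonzero, and together they
    # reproduce NUMLANES, which is the linear dependency noted in the text.
    print(n, ind1, ind2, ind1 + ind2 == n)
```

Because the two columns always add up to NUMLANES, a model would include either the pair of range variables or NUMLANES itself, not both.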

Estimate a Single Beta Parameter for m − 1 of the m Levels of the Variable

The third, most complex treatment of an ordinal variable is equivalent to the treatment of a nominal scale variable. In this approach m − 1 indicator variables are created for the m levels of the ordinal scale variable. The justification for this approach is that each level of the variable has a unique marginal effect on the response. This approach is generally applied to an ordinal scale variable with few responses and with theoretical justification. For the variable NUMLANES, for example, three indicator variables could be created (assuming the range of NUMLANES is from 1 to 4) as follows:

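$$
\text{Ind}_j =
\begin{cases}
1 & \text{if } \text{NUMLANES} = j \\
0 & \text{otherwise}
\end{cases}
\qquad j = 1, 2, 3.
$$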

This variable is now expressed in the regression as three indicator variables, one for each of the first three levels of the variable. As before, linear regression is used to assess the evidence whether each level of the variable NUMLANES deserves a separate beta parameter. This can be statistically evaluated using an F test described later in this chapter.
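To see what "a separate beta parameter for each level" buys, consider the simplified case where an intercept and the three indicators are the only regressors. In that special case least squares reproduces the mean response at each level: the intercept is the mean at the base level (four lanes here), and each indicator coefficient is the difference between that level's mean and the base mean. The sketch below computes those quantities directly. The observations are invented for illustration, and a model with additional regressors such as CNTYPOP would not reduce to simple group means.

```python
from statistics import mean

# Hypothetical (NUMLANES, AADT) observations, for illustration only.
observations = [(1, 4000), (1, 6000), (2, 11000), (2, 13000),
                (3, 30000), (3, 34000), (4, 39000), (4, 41000)]

BASE_LEVEL = 4  # the level without its own indicator variable

def level_means(data):
    """Mean response at each observed level of NUMLANES."""
    groups = {}
    for lanes, aadt in data:
        groups.setdefault(lanes, []).append(aadt)
    return {lanes: mean(values) for lanes, values in groups.items()}

means = level_means(observations)

# Indicator-only model: the intercept is the base-level mean, and each
# coefficient is that level's mean minus the base-level mean.
intercept = means[BASE_LEVEL]
betas = {lanes: means[lanes] - intercept
         for lanes in sorted(means) if lanes != BASE_LEVEL}

print(intercept, betas)
```

In this made-up data the step from two to three lanes is much larger than the steps on either side of it, a pattern that a single slope on NUMLANES could not represent.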