
CHAPTER 5 MODELING AND ANALYSIS

5.3 Factor Analysis

Factor analysis is a technique that simplifies inference from statistical results (Cattell, 1966). Identifying correlations between variables helps reduce the number of parameters included in the model, so that correlated predictors do not confound the interpretation of the inferences. Correlation between two or more variables is one of the major issues in dealing with panel datasets and needs to be addressed properly. For example, the researchers of NCHRP 17-67 found that population and VMT are closely correlated, with a correlation coefficient of r=0.98 (Blower et al., 2019). Because these two variables are so highly correlated, only one of them, either VMT or population, should be included in the model so that the effect associated with exposure remains interpretable. Note that r=1.0 or r=-1.0 indicates a perfect positive or negative linear correlation, whereas r close to zero indicates no linear correlation between the variables (Montgomery and Runger, 2003).
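As a small illustration of the correlation coefficient described above, the following sketch computes Pearson's r in plain Python. The function name `pearson_r` and all data values are hypothetical and are not taken from the NCHRP 17-67 dataset; they only show the limiting cases r=1.0, r=-1.0, and r=0.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustrative values (assumed, not from the source data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
increasing = [10.0, 20.0, 30.0, 40.0, 50.0]   # exact increasing linear function of x
decreasing = [50.0, 40.0, 30.0, 20.0, 10.0]   # exact decreasing linear function of x
symmetric = [4.0, 1.0, 0.0, 1.0, 4.0]         # (x - 3)^2: dependent on x, but not linearly

r_up = pearson_r(x, increasing)
r_down = pearson_r(x, decreasing)
r_sym = pearson_r(x, symmetric)
print(r_up, r_down, r_sym)
```

The third case is worth noting: r close to zero rules out a linear relationship only, since `symmetric` is fully determined by `x`.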

Based on the findings of a series of factor analyses by the researchers of NCHRP 17-67, the predictors chosen for the analysis are listed in Table 5.1 (Blower et al., 2019). In their analysis, the researchers defined seven categories of variables and chose the variable(s) that best represented each category; the remaining correlated parameters were excluded from the analysis.
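The idea of keeping one representative variable and excluding the predictors correlated with it can be sketched with a simple pairwise-correlation filter. This is a simplified stand-in, not the factor analysis performed in NCHRP 17-67; the function `pick_representatives`, the 0.9 threshold, the priority order, and all data values are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pick_representatives(data, order, threshold=0.9):
    """Walk the variables in priority order and keep each one unless it is
    strongly correlated (|r| >= threshold) with a variable already kept."""
    kept = []
    for name in order:
        if all(abs(pearson_r(data[name], data[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values (assumed): population tracks total VMT closely,
# pump price does not.
data = {
    "total_vmt":  [10.0, 21.0, 29.0, 42.0, 50.0],
    "population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pump_price": [3.1, 2.4, 3.6, 2.2, 3.0],
}
kept = pick_representatives(data, ["total_vmt", "population", "pump_price"])
print(kept)
```

The priority order decides which member of a correlated pair survives, which mirrors the analyst's judgment about which variable best represents a group.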

According to Table 5.1, total VMT was chosen as the representative variable for the overall size of each state. The proportion of rural VMT was chosen to represent the risk associated with 'ruralness'. Economic conditions were represented by GDP per capita, median household income, and unemployment in the 16-to-24 age group. The rate of belt use, laws associated with belt use and motorcycle helmet wear, and the penetration of post-1991 model year vehicles were chosen to represent the effects of safety measures dedicated to occupant protection. Capital and safety expenditures were chosen under the state highway expenditure variable group, and beer consumption and DUI laws were chosen under ‘alcoholism’. Finally, pump price was chosen under the ‘other’ variable group.

Table 5.1 Variables selected through factor analysis*

Variable group | Representative variable(s) | Correlated predictors**
Size of state | Total VMT | Population, GDP, and every other predictor
Ruralness | Rural VMT proportion | Total VMT, capital & safety expenditures, median household income, beer consumption per capita
Economy | GDP/capita, median household income, unemployment 16 to 24 | Rural VMT proportion, capital & safety expenditures
Occupant Protection | Belt use, belt laws, motorcycle helmet laws, post-1991 model year penetration | Unemployment 16 to 24, pump price
State Highway Expenditures | Capital expenditures, safety expenditures | Rural VMT proportion, GDP/capita, median household income
Alcohol | Beer consumption per capita, DUI laws | Rural VMT proportion
Other | Pump price | Post-1991 model year

*Blower, D., C. Flannagan, S. Geedipally, D. Lord, and R. Wunderlich. Identification of factors contributing to the decline of traffic fatalities in the United States from 2008 to 2012. Final Report NCHRP Project 17-67. Transportation Research Board, Washington, D.C., 2019. Reprinted with permission from the National Academy of Sciences, Courtesy of the National Academies Press, Washington, D.C.

The next step was to examine the correlation between VMT, the exposure term, and some of the other variables, especially those related to the economy. This step was necessary to identify predictors for the regression models of VMT, which in turn allow the effects of variables more closely related to exposure than to risk to be examined in the fatality prediction models. The results of the correlation analysis between total, urban, and rural VMT and the economic variables are presented in Table 5.2.

Table 5.2 Results of correlation analysis between VMT and other variables

VMT measure | Total population | Unemployment of age 16-24 | Unemployment of age 25+ | Pump price | GDP per cap | Median income
Total VMT | 0.15 | 0.005 | 0.92 | 0.008 | -0.014 | -0.010
Urban VMT | 0.18 | 0.004 | 0.654 | 0.018 | -0.007 | 0.004
Rural VMT | 0.092 | 0.0092 | 0.624 | -0.012 | -0.027 | -0.036

According to the table, total VMT is most strongly correlated with the unemployment rate of the age group 25 and over (r=0.92), which is not consistent with the findings of the NCHRP 17-67 report (Blower et al., 2019). In fact, for both urban and rural VMT the unemployment rate of the 25-and-over age group was also the most strongly correlated variable (r=0.654 and 0.624, respectively), whereas unemployment in the 16-to-24 age group showed an almost nonexistent correlation with all three VMT measures. In the NCHRP report, the variable ‘population’ showed the strongest correlation with total, urban, and rural VMT. Here, with the updated dataset, the correlation coefficients between population and total, urban, and rural VMT (r=0.15, 0.18, and 0.092, respectively) are much smaller than those found previously in the NCHRP 17-67 report. Rural VMT showed an insignificant negative correlation with pump price, GDP per capita, and median income. All other positive or negative correlations found in the results were insignificant and thus are not discussed in detail.
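The reading of Table 5.2 given above can be checked mechanically. The sketch below transcribes the reported coefficients and picks, for each VMT measure, the variable with the largest absolute correlation; the dictionary layout and variable names are illustrative choices, while the numbers are those reported in the table.

```python
# Correlation coefficients transcribed from Table 5.2.
table_5_2 = {
    "Total VMT": {
        "Total population": 0.15, "Unemployment of age 16-24": 0.005,
        "Unemployment of age 25+": 0.92, "Pump price": 0.008,
        "GDP per cap": -0.014, "Median income": -0.010,
    },
    "Urban VMT": {
        "Total population": 0.18, "Unemployment of age 16-24": 0.004,
        "Unemployment of age 25+": 0.654, "Pump price": 0.018,
        "GDP per cap": -0.007, "Median income": 0.004,
    },
    "Rural VMT": {
        "Total population": 0.092, "Unemployment of age 16-24": 0.0092,
        "Unemployment of age 25+": 0.624, "Pump price": -0.012,
        "GDP per cap": -0.027, "Median income": -0.036,
    },
}

# For each VMT measure, find the variable with the largest |r|.
strongest = {
    vmt: max(row, key=lambda var: abs(row[var]))
    for vmt, row in table_5_2.items()
}
for vmt, var in strongest.items():
    print(f"{vmt}: {var} (r={table_5_2[vmt][var]})")
```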

Based on the findings of the factor analysis, separate linear regression models were developed for total, rural, and urban VMT, and the results are presented below. Table 5.3 shows the parameter estimates of the total VMT model. To develop these models, a sequential regression approach was adopted in which variables that were not statistically significant at the 10% level (α=0.1) were dropped from the model one by one to find the most significant predictors (backward elimination). From the table, total population and the unemployment rate of the 25-and-over age group are the only variables needed to predict total VMT, both of which were significant at the 15% level (the unemployment rate has p=0.15 in Table 5.3). The model fit shows that the two selected predictors explain 97% of the variability in total VMT.
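The sequential procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software or settings used in this study: the data are synthetic, the variable names are hypothetical, and p-values use a large-sample normal approximation in place of the t-distribution so that only the standard library is needed.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [row[n:] for row in m]

def ols(X, y):
    """Ordinary least squares via the normal equations.
    Returns (coefficients, two-sided p-values under a normal approximation)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)          # residual variance
    pvals = [math.erfc(abs(beta[i] / math.sqrt(s2 * inv[i][i])) / math.sqrt(2))
             for i in range(k)]
    return beta, pvals

def backward_eliminate(columns, y, alpha=0.10):
    """Refit repeatedly, dropping the least significant predictor each time,
    until every remaining predictor has p <= alpha."""
    names = list(columns)
    while names:
        X = [[1.0] + [columns[name][i] for name in names] for i in range(len(y))]
        _, pvals = ols(X, y)
        by_name = dict(zip(names, pvals[1:]))         # skip the intercept
        worst = max(by_name, key=by_name.get)
        if by_name[worst] <= alpha:
            return by_name
        names.remove(worst)
    return {}

# Synthetic data (assumed): the response depends on two predictors only.
rng = random.Random(0)
n = 200
columns = {
    "population": [rng.uniform(1, 10) for _ in range(n)],
    "unemployment_25_plus": [rng.uniform(2, 8) for _ in range(n)],
    "pump_price": [rng.uniform(2, 4) for _ in range(n)],   # no true effect
}
y = [4.0 + 9.0 * p + 0.5 * u + rng.gauss(0, 1)
     for p, u in zip(columns["population"], columns["unemployment_25_plus"])]

kept = backward_eliminate(columns, y, alpha=0.10)
print(kept)
```

Dropping one variable per refit matters because p-values of the remaining predictors change once a correlated variable leaves the model.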

Table 5.3 Parameter estimates for the linear regression model on Total VMT

Variable | Estimate | Standard error | P-value
Intercept | 4163.67 | 1099.56 | 0.0002
Total Population | 0.0088 | 0.00006 | <0.0001
Unemployment rate of age 25 and above | 313.64 | 216.34 | 0.15
R² statistic | 0.97
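To show how the estimates in Table 5.3 combine into a prediction, the sketch below evaluates the fitted linear equation for one set of inputs. The coefficients are those reported in the table; the function name, the input values, and the reading of the inputs (population as a head count, unemployment as a percentage) are assumptions, since the units are not stated in this section.

```python
# Parameter estimates transcribed from Table 5.3.
INTERCEPT = 4163.67
B_POPULATION = 0.0088
B_UNEMPLOYMENT_25_PLUS = 313.64

def predict_total_vmt(population, unemployment_25_plus):
    """Evaluate the fitted total VMT equation for one state-year."""
    return (INTERCEPT
            + B_POPULATION * population
            + B_UNEMPLOYMENT_25_PLUS * unemployment_25_plus)

# Hypothetical inputs (assumed units; see the note above).
example = predict_total_vmt(population=5_000_000, unemployment_25_plus=5.0)
print(round(example, 2))

# Holding population fixed, one additional unit of the unemployment rate
# changes the prediction by the unemployment coefficient.
delta = predict_total_vmt(5_000_000, 6.0) - predict_total_vmt(5_000_000, 5.0)
print(round(delta, 2))
```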

Table 5.4 and Table 5.5 show the parameter estimates of the regression models of urban and rural VMT, respectively. The same predictors as in the total VMT model were also found significant for the urban VMT model (at α=0.05). For the rural VMT model, however, four variables, namely total population, pump price, GDP per capita, and median income, were found to be highly statistically significant at the 5% level. The negative coefficients for pump price, GDP per capita, and median income do not mean that states under economic constraints traveled more on rural roads. Rather, they indicate that the states most affected by the recession were likely to be more rural. Both the urban and rural VMT models had the same R-square value (0.64), which indicates a moderate fit to the dataset.

Table 5.4 Parameter estimates for the linear regression model on Urban VMT

Variable | Estimate | Standard error | P-value
Intercept | 7242.56 | 924.87 | <0.0001
Total Population | 0.0018 | 4.87e-5 | <0.0001
Unemployment rate of age 25 and above | 425.07 | 181.97 | 0.0197
R² statistic | 0.64

Table 5.5 Parameter estimates for the linear regression model on Rural VMT

Variable | Estimate | Standard error | P-value
Intercept | 50429.55 | 2165.56 | <0.0001
Total Population | 0.00186 | 0.00004 | <0.0001
Pump price | -3930.96 | 431.08 | <0.0001
GDP per capita | -1180.83 | 335.48 | 0.0005
Median income | -4281.21 | 438.52 | <0.0001
R² statistic | 0.64

Figure 5.1, Figure 5.2, and Figure 5.3 present graphical illustrations of the predicted versus the observed total, urban, and rural VMT, respectively. From these figures, it is seen that the regression models developed for total and urban VMT could almost perfectly predict the observed VMT in the dataset (Figure 5.1 and Figure 5.2), whereas the rural VMT model shows only a reasonable goodness of fit (Figure 5.3).
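The goodness of fit discussed above is summarized by the R² statistic, which compares the residual sum of squares of the predictions with the total variation of the observations around their mean. The following sketch computes it for two hypothetical sets of predictions; the function name and all values are assumptions for illustration and are unrelated to the VMT data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observations and two candidate sets of predictions (assumed).
observed = [10.0, 20.0, 30.0, 40.0]
close_fit = [11.0, 19.0, 31.0, 39.0]    # small residuals
loose_fit = [18.0, 16.0, 34.0, 32.0]    # larger residuals

r2_close = r_squared(observed, close_fit)
r2_loose = r_squared(observed, loose_fit)
print(round(r2_close, 3), round(r2_loose, 3))
```

On a predicted-versus-observed plot, a higher R² corresponds to points lying closer to the 45-degree line.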

Figure 5.1 Predicted total VMT versus reported total VMT, 2001-2016

Figure 5.2 Predicted urban VMT versus reported urban VMT, 2001-2016

Figure 5.3 Predicted rural VMT versus reported rural VMT, 2001-2016

In document Examining the Factors Causing a Drastic Reduction and Subsequent Increase of Roadway Fatalities on United States Highways Between 2005 and 2016 (Page 104-110)