Chapter 4 – RESULTS AND DISCUSSION
4.3 The Binary Logistic Regression Analysis Findings
4.3.1 Operator Model
As discussed earlier, crosstabulation showed how a single variable increases or decreases the odds of a fatal injury in the event of an accident. However, two or more variables may act at the same time, so we carried out a binary logistic regression analysis to investigate their combined effect.
We started modeling with the operators. The intent was to provide a model that could be used to predict the degree of injury for operators who ride one of the selected types of equipment (backhoes, excavators, bulldozers and scrapers) on construction sites. Hence, we ran a binary logistic regression analysis on a subset consisting only of “operator cases”. This subset was extracted from the main dataset by filtering on the “occupation” variable. A total of 376 operator cases were identified. As discussed in the methodology section, this subset was divided into two parts: 70% (271 cases) was used to develop the model, and the remaining 30% (105 cases) was used to validate it.
Variable selection was based on the crosstabulation and univariate analysis results. For modeling, we included all the variables that showed a significant association in the crosstabulation analysis. The variables entered in the binary logistic regression analysis to develop the “Operator Model”, together with their levels, coding, and type, are presented in Table 65.
Table 65: Variables entered into analysis for Operator Model
Variables used for analysis | Levels and Coding | Variable Type
1. Degree of injury (Dependent variable) | Fatal: 1; Non-fatal: 0 | Dichotomous
2. Union status | Union: 1; Nonunion: 0 | Dichotomous
3. Seat Belt Presence | Present: 1; Not present: 0 | Dichotomous
4. Cited for Safety Training | Provided: 1; Not provided: 0 | Dichotomous
5. Equipment Safety System | Present: 1; Not present: 0 | Dichotomous
6. Equipment Maintenance | Present: 1; Not present: 0 | Dichotomous
7. SIC | Provided: 1; Not provided: 0 | Nominal
8. Equipment Type | Backhoe: 1; Bulldozer: 2; Excavator: 3; Scraper: 4 | Nominal
9. Environmental Factor | Materials handling equipment/method: 1; Work-surface/facility layout condition: 2; Overhead moving/falling object action: 3; Squeeze point action: 4; Pinch point action: 5; Flying object action: 6; Flammable liquid/solid exposure: 7; Catch point / puncture action: 8; Blind spot: 9; Other: 10 | Nominal
10. Human Factor | Misjudgment of hazardous situation: 1; Inappropriate choice/use of equipment/methods: 2; Inoperable/malfunctioned safety/warning devices: 3; Insufficient engineering and admin controls: 4; Human system malfunction: 5; Distracting actions by others: 6; Other: 7 | Nominal
The base model had a naive predictive power of 69.9%, that is, the overall percentage of cases classified correctly when the model contains no predictor variables. A model with added predictor variables therefore has to improve on this accuracy. The -2 log-likelihood of the base model was found to be 267.629. This value was used for selecting the best model.
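As a minimal sketch of how a naive predictive power of this kind is obtained, the function below returns the share of cases classified correctly when every case is assigned to the larger outcome class. The function name and the counts in the example are hypothetical and are not the study data.

```python
def naive_accuracy(n_fatal, n_nonfatal):
    """Share of cases classified correctly by always predicting the larger class."""
    total = n_fatal + n_nonfatal
    if total == 0:
        raise ValueError("at least one case is required")
    return max(n_fatal, n_nonfatal) / total


# Hypothetical counts for illustration only (not taken from the thesis data).
example = naive_accuracy(n_fatal=70, n_nonfatal=30)
print(f"Naive accuracy: {example:.1%}")
```

Any fitted model is then judged against this baseline: it adds value only if its classification accuracy exceeds the majority-class share.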
We started with the “stepwise backward enter” method. The 10 variables listed in Table 65 were entered into the analysis; nonsignificant variables were removed step by step, and the iteration stopped at the fourth step. The analysis was performed at the p = 0.05 significance level. Table 66 and Table 67 summarize the results of this analysis.
A close examination of the process showed that the model at the fourth step was the best of all for predicting the degree of injury. Its predictive accuracy was 76.2%, which is greater than the naive predictive power (see Table 66).
As the Table 67 footnote shows, the developed model’s -2 log-likelihood (233.969) is smaller than that of the base model. We can thus conclude that the developed model predicts the degree of injury better than the base model, which contains no predictor variables. As for goodness of fit, the Hosmer and Lemeshow test indicated that the model fits the data satisfactorily. A poor fit is indicated by a significance value of less than 0.05; the observed significance value of 0.757 is greater than 0.05 and therefore supports the goodness of fit of the model.
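The comparison of the two values can also be expressed as a likelihood ratio chi-square. The sketch below is an illustration and is not part of the reported analysis. It assumes that 267.629 and 233.969 are both -2 log-likelihoods fitted to the same cases, and that each of the four predictors in Table 67 contributes one degree of freedom. The helper name is hypothetical, and its closed form holds only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


NEG2LL_BASE = 267.629   # base model value reported in the text
NEG2LL_MODEL = 233.969  # developed model value from the Table 67 footnote
DF = 4                  # assumption: one parameter per predictor in Table 67

lr_chi2 = NEG2LL_BASE - NEG2LL_MODEL
p_value = chi2_sf_even_df(lr_chi2, DF)
print(f"LR chi-square = {lr_chi2:.3f}, df = {DF}, p = {p_value:.2e}")
```

Under these assumptions the drop in -2 log-likelihood is 33.660, and the associated p-value is well below 0.05.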
Table 66: Operator model classification table
Observed degree of injury (DV) | Development set: predicted Nonfatal | Development set: predicted Fatal | Development set: % Correct | Validation set: predicted Nonfatal | Validation set: predicted Fatal | Validation set: % Correct
Nonfatal | 17 | 41 | 29.3 | 11 | 17 | 39.3
Fatal | 17 | 169 | 90.9 | 11 | 93 | 89.4
Overall % | | | 76.2 | | | 78.8
As previously mentioned, the data were split in two to develop and validate the model. Table 66 shows that the model correctly classified 76.2% of the cases in the development set. The same model correctly predicted 78.8% of the validation data, which means the model predicts the degree of injury more accurately than the naive prediction. Table 67 lists the variables in the model used to predict the degree of injury for selected heavy construction equipment operators in the event of an accident.
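The percentages in Table 66 follow directly from the cell counts. The short sketch below recomputes them; the function and argument names are hypothetical, and the counts are copied from Table 66.

```python
def classification_summary(nonfatal_as_nonfatal, nonfatal_as_fatal,
                           fatal_as_nonfatal, fatal_as_fatal):
    """Percent correct per observed class and overall, rounded to one decimal."""
    nonfatal_total = nonfatal_as_nonfatal + nonfatal_as_fatal
    fatal_total = fatal_as_nonfatal + fatal_as_fatal
    correct = nonfatal_as_nonfatal + fatal_as_fatal
    return {
        "nonfatal": round(100 * nonfatal_as_nonfatal / nonfatal_total, 1),
        "fatal": round(100 * fatal_as_fatal / fatal_total, 1),
        "overall": round(100 * correct / (nonfatal_total + fatal_total), 1),
    }


# Cell counts from Table 66 (rows: observed, columns: predicted).
development = classification_summary(17, 41, 17, 169)
validation = classification_summary(11, 17, 11, 93)
print("Development set:", development)
print("Validation set:", validation)
```

The per-class figures make visible what the overall percentage hides: the model classifies fatal cases far more reliably than nonfatal ones in both sets.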
Table 67: Operator Model results
Variable | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B): Lower | 95% C.I. for Exp(B): Upper
Safety Program(1) | .967 | .433 | 4.989 | 1 | .026 | 2.631 | 1.126 | 6.149
Safety Training(1) | -1.352 | .376 | 12.900 | 1 | .000 | .259 | .124 | .541
Union Status(1) | -1.024 | .375 | 7.436 | 1 | .006 | .359 | .172 | .750
Equipment Protective Systems | -1.187 | .512 | 5.370 | 1 | .020 | .305 | .112 | .833
Constant | 2.442 | .564 | 18.743 | 1 | .000 | 11.496 | |
* -2 Loglikelihood = 233.969; Hosmer and Lemeshow Chi-square Test χ2(7)=4.192, p=0.757
In light of these results, safety program (SP), safety training (ST), union status (US) and equipment protective systems presence (EPS) have a significant effect on the degree of injury. The B coefficients in Table 67 show that every variable except “safety program” has a negative coefficient and therefore decreases the probability of a fatal injury.
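To illustrate how the coefficients translate into a predicted probability, the sketch below applies the logistic function to the B values in Table 67. It assumes that every predictor enters the model as a 0/1 indicator and that predictors left out of a call take the value 0. The coding direction of each indicator, and in particular of “safety program”, which is not listed in Table 65, is an assumption here. The dictionary keys and the function name are hypothetical.

```python
import math

# Coefficients (B) copied from Table 67.
COEFFICIENTS = {
    "safety_program": 0.967,
    "safety_training": -1.352,
    "union_status": -1.024,
    "equipment_protective_systems": -1.187,
}
CONSTANT = 2.442


def fatal_probability(**indicators):
    """Predicted probability of a fatal injury for 0/1 predictor values."""
    unknown = set(indicators) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    # Linear predictor: constant plus the coefficient of every indicator set to 1.
    z = CONSTANT + sum(COEFFICIENTS[name] * value
                       for name, value in indicators.items())
    return 1.0 / (1.0 + math.exp(-z))


# Odds ratios, i.e. Exp(B), recomputed from the rounded coefficients.
odds_ratios = {name: math.exp(b) for name, b in COEFFICIENTS.items()}

baseline = fatal_probability()
protected = fatal_probability(safety_training=1, union_status=1,
                              equipment_protective_systems=1)
print(f"All indicators 0: {baseline:.3f}")
print(f"ST, US and EPS set to 1: {protected:.3f}")
```

Because the coefficients are rounded to three decimals, the recomputed odds ratios may differ from the Exp(B) column in the last digit.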
Table 68: Relative importance of variables in the operator model
Variable | Model Log Likelihood | Change in -2 Log Likelihood | df | Sig. of the Change
Safety Program | -119.440 | 4.911 | 1 | .027
Safety Training | -124.280 | 14.591 | 1 | .000
Equipment Protective Systems | -120.264 | 6.558 | 1 | .010
Union Status | -120.638 | 7.308 | 1 | .007
To determine which variable is most important to the model, we used the change in the log-likelihood value as the measure. As one can see in Table 68, removing the safety training variable changes the -2 log-likelihood of the model more than removing any other variable in the model.
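The figures in Table 68 can be reproduced from the model log-likelihoods and the -2 log-likelihood of the full model (233.969, Table 67 footnote). The sketch below assumes that each "Model Log Likelihood" entry is the log-likelihood of the model with that variable removed, and that each change is tested as a chi-square with one degree of freedom. The function and variable names are hypothetical.

```python
import math

NEG2LL_FULL = 233.969  # -2 log-likelihood of the full model (Table 67 footnote)

# Log-likelihood of the model with each term removed (Table 68).
LL_WITHOUT = {
    "Safety Program": -119.440,
    "Safety Training": -124.280,
    "Equipment Protective Systems": -120.264,
    "Union Status": -120.638,
}


def removal_test(ll_reduced, neg2ll_full=NEG2LL_FULL):
    """Change in -2LL and its chi-square p-value (df = 1) when one term is dropped."""
    change = -2.0 * ll_reduced - neg2ll_full
    p_value = math.erfc(math.sqrt(change / 2.0))  # chi-square upper tail, df = 1
    return change, p_value


results = {name: removal_test(ll) for name, ll in LL_WITHOUT.items()}
ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)
for name, (change, p) in ranked:
    print(f"{name}: change in -2LL = {change:.3f}, p = {p:.3f}")
```

Sorting by the size of the change places safety training first, in line with the reading of Table 68 above; small differences in the third decimal come from rounding in the tabulated log-likelihoods.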