
7 Heathrow weather delay analysis

7.2 Binary Delay Model

The following section presents a summary of the logistic regression modelling of weather delay occurrences. Binomial (or binary) logistic regression is a form of regression that is used when the response variable is dichotomous, in this case Delay or No Delay. The response variable (occurrence of delay) was determined using Eurocontrol’s CODA data for the period January 2002 – May 2006. Days in the sample period with reported weather delay at Heathrow airport were assigned the value one (i.e. Delay), whilst days with no reported delays were assigned a zero (i.e. No Delay).

Logistic regression is a form of regression that allows the prediction of discrete variables by a mix of continuous and discrete predictors. First of all, logistic regression does not assume a linear relationship between the dependent and the independent variables. It may handle nonlinear effects even when exponential and polynomial terms are not explicitly added as additional independent variables. This is possible since the logit link function on the left-hand side of the logistic regression equation is non-linear.

In addition, the dependent variable does not need to be normally distributed, although its distribution is assumed to belong to the exponential family of distributions, such as the normal, Poisson, binomial, or gamma. Overall, logistic regression addresses the same questions that multiple regression does but with no distributional assumptions on the predictors (the predictors do not have to be normally distributed, linearly related or have equal variance in each group).

In this analysis, logistic regression is used to predict a response variable (Delay/No Delay) on the basis of several predictors and to determine the percent of variance in the response variable explained by the predictors. In addition, logistic regression is used to rank the relative importance of predictors.

Logistic regression is estimated by maximum likelihood estimation after transforming the response variable into a logit variable (the natural log of the odds of whether the weather delay occurs or not). Logistic regression thus estimates the probability of a weather delay occurrence. It calculates changes in the log-odds of the response variable and not changes in the response variable itself as linear regression does.
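As a rough illustration of the maximum likelihood estimation described above, the following Python sketch fits a one-predictor logit model by Newton-Raphson iteration. The data (`wind`, `delay`) are made-up toy values, not the CODA sample, and the function name `fit_logit` is hypothetical; the source does not state which software or algorithm was used for the actual estimation.

```python
import math


def sigmoid(z):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ys, iterations=25):
    """Newton-Raphson maximum likelihood fit of logit(p) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix: negative Hessian of the log-likelihood.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: daily wind speed and a 0/1 delay indicator.
wind = [2, 4, 5, 6, 8, 9, 10, 12, 13, 15]
delay = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b0, b1 = fit_logit(wind, delay)
print(f"intercept = {b0:.3f}, slope = {b1:.3f} (log-odds per unit)")
```

At the maximum of the likelihood the score equations are satisfied, so with an intercept in the model the residuals (observed minus fitted probability) sum to approximately zero.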

Coefficients in logistic regression (βi) are the values for the logistic regression equation for predicting the response variable from the predictors. Coefficients are in log-odds units. Similar to linear regression, the prediction equation is presented in Equation 7-3:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \tag{7-3}$$

where p is the probability of delay occurrence and x1 to xn are different model predictors.
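The relationship between the log-odds scale and the probability scale can be sketched in a few lines of Python. This is a minimal illustration only: the coefficient values, the predictor names `x1` and `x2`, and the helper `delay_probability` are all made up for the example and are not the fitted model.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a log-odds value z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def delay_probability(intercept, coefficients, predictors):
    """Evaluate the linear predictor and convert it to a probability."""
    z = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return inv_logit(z)


# Illustrative (made-up) values only, not the fitted model.
example_coefficients = {"x1": 0.5, "x2": -0.25}
example_day = {"x1": 2.0, "x2": 4.0}

p = delay_probability(-0.5, example_coefficients, example_day)
print(f"log-odds = {logit(p):.3f}, probability = {p:.3f}")
```

The linear part of the model lives entirely on the log-odds scale; the probability is obtained only at the end by inverting the logit.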

With the inclusion of all possible predictors (16), the Hail predictor was found to predict success perfectly (for 12 observations; see footnote 2) and was therefore excluded from further logistic modelling. The final logistic model is presented in Table 7-5.

The overall model is statistically significant (the p-value is less than 0.001). In other words, the hypothesis that weather-related delay predictors, taken together, have no effect on the response variable is rejected. In addition, out of 16 predictors entered into the model, 14 predictors are found to be statistically significant. The two statistically non-significant predictors are SeasonSummer and CrossWind.

2 The Hail predictor is perfectly correlated with the dependent variable.

Table 7-5. Logistic model of occurrence of delay

[…] Delay percent of correct predictions of days with occurrence of weather-related air traffic delays.

Table 7-6. Classification table for logistic model

Observed Delay | Predicted Delay: Delay | Predicted Delay: No Delay | Total | Classified Correctly

… + 0.746 × SeasonWinter + (−0.027) × SeasonSummer + (−0.424) × SeasonSpring + 0.294 × Holiday + (−0.425) × Weekend + 0.001 × ATM    (7-4)

For binomial logistic regression, parameter estimates (β coefficients) are logits of predictors used in the logistic regression equation to estimate the log-odds that the response variable equals 1. Parameter estimates show the relationship between the predictors and the response variable, where the response is on the logit scale. They show the amount of increase in the predicted log-odds of Delay = 1 that would be predicted by a 1-unit increase in the predictor, holding all other predictors constant.

For example, the coefficient (or parameter estimate) for the predictor WindSpeed is 0.339. This means that for a one-unit increase in WindSpeed, there is an expected 0.339 increase in the log-odds of the response variable Delay, holding all other predictors constant. However, since these coefficients are in log-odds units, they are difficult to interpret, and hence they are often converted into odds ratios or explained through their marginal effects or discrete changes.
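The conversion from log-odds coefficients to odds ratios mentioned above is simply exponentiation. The following Python sketch applies it to the WindSpeed coefficient and to three of the coefficients quoted in Equation 7-4; the function name `odds_ratio` is an illustrative choice, not part of the original analysis.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logit coefficient for a one-unit increase."""
    return math.exp(beta)


# Coefficients as quoted in the text and in Equation 7-4.
coefficients = {
    "WindSpeed": 0.339,
    "SeasonWinter": 0.746,
    "Holiday": 0.294,
    "Weekend": -0.425,
}

for name, beta in coefficients.items():
    ratio = odds_ratio(beta)
    change = (ratio - 1.0) * 100.0
    print(f"{name}: odds ratio = {ratio:.3f} ({change:+.1f}% change in odds)")
```

On this reading, a one-unit increase in WindSpeed multiplies the odds of a delay by roughly 1.40, while a negative coefficient such as Weekend gives an odds ratio below one.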

There are two ways of estimating how much the event probability changes when a given predictor is changed by one unit. The marginal effect of a predictor is defined as the partial derivative of the predicted event probability with respect to the predictor of interest. A more direct measure is the change in predicted probability for a unit change in the predictor (Long and Freese, 2003).

For a binary logistic model, the marginal effect is equal to pi(1 − pi)β, where pi is the event probability at the i-th setting of the predictors, and β is the parameter estimate for the predictor of interest. For the example of the Thunder predictor, xThunder = 0.7185, […]. Because of the nonlinearity of the model, it is difficult to translate the marginal effects into the change that will occur if there is a discrete change (e.g. a 1-unit change) in predictor Xk. In addition, marginal effects are inappropriate for binary predictors. Long (1997) suggests measures of discrete change. Discrete change is the difference in the predicted value as one predictor changes values while all others are held constant at specified values.


Marginal effects and discrete change of all predictors are calculated for the mean values of the continuous predictors and for the dummy variables set to zero. The results are presented in Table 7-7.

Table 7-7. Marginal effects and discrete change of predictors

Predictor            Marginal Effect/    z        P>z      X (mean)
                     Discrete Change
WindSpeed             0.0783              8.39    0.000    8.04305
MinimumTemperature   -0.0286             -6.63    0.000    7.83973
Rain                  0.0196              3.47    0.001    1.51458
HeadTailWind         -0.0018             -2.89    0.004    44.4024
CrossWind             0.0040              0.80    0.426    1.40331
Holiday*              0.0646              2.12    0.034    0
Weekend*             -0.1026             -2.97    0.003    0
ATM                   0.0005              2.35    0.019    1289.63

Note: * denotes the discrete change of a dummy variable from 0 to 1.
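The two measures can be sketched in Python as follows. This is a minimal illustration under one stated assumption: the baseline delay probability of 0.6392 is taken from Table 7-8 and is assumed to be the probability at the evaluation point used for Table 7-7 (continuous predictors at their means, dummies at zero). The coefficients are those quoted in the text and in Equation 7-4; the function names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a log-odds value z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(p, beta):
    """Partial derivative of the probability with respect to a predictor."""
    return p * (1.0 - p) * beta


def discrete_change(p, beta):
    """Change in probability when a dummy switches from 0 to 1, others fixed."""
    return inv_logit(logit(p) + beta) - p


# Assumed baseline probability at the evaluation point (from Table 7-8).
BASELINE_P = 0.6392

wind_effect = marginal_effect(BASELINE_P, 0.339)      # WindSpeed coefficient
holiday_change = discrete_change(BASELINE_P, 0.294)   # Holiday coefficient
weekend_change = discrete_change(BASELINE_P, -0.425)  # Weekend coefficient

print(f"WindSpeed marginal effect: {wind_effect:.4f}")
print(f"Holiday discrete change:   {holiday_change:.4f}")
print(f"Weekend discrete change:   {weekend_change:.4f}")
```

Under that assumption the three results land close to the corresponding Table 7-7 entries; small differences would be expected from rounding of the published coefficients.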

Table 7-7 shows that the continuous predictors with the largest influence on the probability of a delay occurring are WindSpeed and Visibility. An increase of 1 knot in wind speed increases the probability of delay occurrence by 7.83 percentage points, controlling for other predictors in the model. In addition, a decrease of 1 mile in visibility increases the probability of a delay occurrence by approximately 6 percentage points.

The largest discrete changes for dummy variables are for the Thunder, Snow, and Fog predictors, since the occurrence of each increases the probability of a delay by 33.63, 26.43, and 25.01 percentage points respectively.

For continuous predictors, the percent change in the probability of occurrence of delay with a 10 and 50 percent increase from the predictor’s mean is also calculated. The results are shown in Table 7-8.

Table 7-8. Percent change in probability of Delay occurrence

Predictor            Delay Probability   +10%     +50%     10% Change   50% Change
WindSpeed            0.6392              0.6995   0.8740     9.43        36.73
MinimumTemperature   0.6392              0.6165   0.5214    -3.56       -18.43
Rain                 0.6392              0.6422   0.6540     0.46         2.31
Visibility           0.6392              0.5980   0.4249    -6.45       -33.53
GustFactor           0.6392              0.6528   0.7043     2.12        10.18
HeadTailWind         0.6392              0.6312   0.5982    -1.26        -6.42
CrossWind            0.6392              0.6398   0.6420     0.09         0.43
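The calculation behind a row of Table 7-8 can be sketched in Python as below, using WindSpeed as the example. The sketch assumes that the predictor is raised by a fraction of its mean while all other predictors stay fixed, so that the log-odds shift by β times the increase, and that the percent change is measured relative to the baseline probability. The coefficient (0.339), mean (8.04305) and baseline probability (0.6392) are taken from the text and tables; the function names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a log-odds value z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def probability_after_increase(base_p, beta, mean, fraction):
    """Probability after raising one predictor by `fraction` of its mean."""
    return inv_logit(logit(base_p) + beta * mean * fraction)


def percent_change(base_p, new_p):
    """Relative change in probability, in percent of the baseline."""
    return (new_p - base_p) / base_p * 100.0


BASE_P = 0.6392      # baseline delay probability (Table 7-8)
BETA_WIND = 0.339    # WindSpeed coefficient
MEAN_WIND = 8.04305  # WindSpeed mean (Table 7-7)

results = {}
for fraction in (0.10, 0.50):
    new_p = probability_after_increase(BASE_P, BETA_WIND, MEAN_WIND, fraction)
    results[fraction] = (new_p, percent_change(BASE_P, new_p))
    print(f"+{fraction:.0%}: probability = {new_p:.4f}, "
          f"change = {results[fraction][1]:.2f}%")
```

Under these assumptions the output comes out close to the WindSpeed row of Table 7-8, with small differences attributable to the rounding of the published coefficient.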

7.3 Delay Impact Model (Percent of Flights Delayed by