Multiple linear regression model - Cleared for Take off! : an exploration on the relationship b

The relationship between the collected airport characteristics and incursion rates is modelled by means of multiple linear regression, an extension of simple linear regression, which has the form 𝑌_𝑖 = 𝑏₀ + 𝐵₁𝑋_𝑖 + 𝜀_𝑖. Because the proposed model includes multiple characteristics, the effect of multiple independent variables on the dependent variable needs to be estimated. The general multiple linear regression equation (7-1) applies.

$$\hat{Y}_i = b_0 + B_1 X_{1i} + B_2 X_{2i} + \cdots + B_k X_{ki} + \varepsilon_i \qquad (7\text{-}1)$$

Here, 𝑌̂_𝑖 represents the predicted value of the dependent variable for the 𝑖th observation, with 𝑖 = 1, 2, … , 𝑛; 𝑏₀ is the estimated intercept; and 𝐵_𝑘 is the slope coefficient of independent variable 𝑘. Furthermore, 𝑋_𝑘𝑖 is the value of predictor variable 𝑘 for observation 𝑖, and 𝜀_𝑖 represents the error term. The effect of binary data can be included in the model by introducing dummy variables 𝑑_𝑘𝑖. Variables could also have a non-linear relationship with the incursion rate. In that case, polynomial regression could be used to improve the model fit. This is modelled as (7-2):

$$\hat{Y}_i = b_0 + B_1 X_{1i} + B_2 X_{2i}^{2} + \cdots + B_k X_{ki}^{k} + \varepsilon_i \qquad (7\text{-}2)$$
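As an illustration of how a model with a linear term, a dummy variable and a squared term could be estimated, the following minimal sketch fits the coefficients by ordinary least squares using the normal equations. It is not the estimation procedure used in this research; the function names (`solve`, `fit_ols`), the predictor names and all data are hypothetical, and the response is generated without noise so that the fit can be checked against known coefficients.

```python
# Minimal ordinary least squares via the normal equations (pure Python).
# All data below are hypothetical and only illustrate the model form.

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [b0, B1, ..., Bk] minimising the residual sum of squares."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical predictors: a count, a 0/1 dummy, and a continuous measure.
runways = [1, 2, 2, 3, 4, 4, 5, 6]
dummy = [0, 1, 0, 1, 0, 1, 0, 1]
traffic = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

# Design rows: linear term, dummy term, squared term.
rows = [(r, d, t ** 2) for r, d, t in zip(runways, dummy, traffic)]

# Noise-free response built from known coefficients (assumed values).
true_coef = [0.4, 0.3, -0.2, 0.05]
y = [true_coef[0] + sum(c * v for c, v in zip(true_coef[1:], row)) for row in rows]

coef = fit_ols(rows, y)
print([round(c, 6) for c in coef])
```

Because the squared term enters the design matrix as just another column, the model stays linear in its coefficients and the same least-squares machinery applies.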

7.1.1 Goodness of fit

To measure how well the model performs in relation to the set of observations, the goodness of fit is determined; the model is estimated such that the sum of squared residuals is minimal. For this, the considered model is tested with the likelihood ratio index (for linear regression, the coefficient of determination 𝑅²), which measures how well the model fits compared to the baseline model, in which all predictor coefficients are equal to zero so that the prediction reduces to the mean (𝑌̂_𝑖 = 𝛽₀ + 𝑢_𝑖 ⟶ 𝑌̂_𝑖 = 𝑌̅). The ratio is determined by (7-3):

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \qquad (7\text{-}3)$$

Here, 𝑆𝑆_𝑟𝑒𝑠 is the residual sum of squares and 𝑆𝑆_𝑡𝑜𝑡 the total sum of squares. An improved version of the likelihood ratio index also exists, which incorporates the degrees of freedom. It is known as the adjusted R-squared (7-4), where 𝑘 is the number of estimated parameters:

$$\text{Adjusted } R^2 = 1 - \frac{SS_{res} - k}{SS_{tot}} \qquad (7\text{-}4)$$

For this research, both measures of 𝑅² are used to indicate the model fit. The aim is a result as close as possible to 1.0. However, because a large variation is expected from characteristics that are outside the scope of this study, a much lower value is expected.
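The two fit measures can be illustrated with a short calculation. The sketch below computes 𝑅² as in (7-3) and, as an assumption, the conventional degrees-of-freedom form of the adjusted 𝑅², 1 − (1 − 𝑅²)(𝑛 − 1)/(𝑛 − 𝑘 − 1), which is not necessarily identical to the expression printed in (7-4). The function names and the observations are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot, as in (7-3)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y, y_hat, k):
    """Conventional adjusted R^2 for k predictors (intercept not counted).

    Assumption: this is the textbook degrees-of-freedom form, which may
    differ from the expression given in (7-4).
    """
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - k - 1)


# Hypothetical observations and model predictions.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(y, y_hat, k=1)

# The baseline model predicts the mean for every observation, so R^2 is 0.
baseline_r2 = r_squared(y, [sum(y) / len(y)] * len(y))
print(r2, adj_r2, baseline_r2)
```

The baseline check shows why 𝑅² is read as an improvement over a model without predictors: predicting the mean everywhere gives exactly zero.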

7.1.2 Airport sample

From the high-level analysis, it was concluded that a smaller airport sample should be created, containing airports with more comparable operational characteristics, such as a minimum share of commercial traffic. It also became clear that high severity incidents are scarce in the database and should therefore be aggregated. This part discusses the final selection of airports and incidents for the model development.

7.1.2.1 High severity aggregation

The incident database contains only small shares of the highest severity categories, with 0.67% A incidents, 0.59% B incidents and 42.19% C incidents. The remaining share of 56.56% is represented by D incidents. A and B incidents occurred at 58 airports (of which 29 were hubs) and 59 airports (of which 26 were hubs), respectively. Therefore, it is not possible to conduct a separate analysis of the relation between airport characteristics and either A or B incidents; many airports did not record any of these incidents during the study period. For this analysis, it was decided to combine A, B and C into ‘high severity’, and to analyse this category separately from the D incursions, because of their naturally different characteristics.

Figure 7-1: High severity incursion rates for all airports

The aggregation of the high severity incursions was translated into a visualisation of the airports’ incursion rates, as depicted in Figure 7-1. Here, the size of each dot represents the size of the airport, and the colour indicates the high severity incursion rate. It can be observed that the majority of the airports fall in the green range.

7.1.2.2 Interaction between incursion severity categories

Although the share of A and B incidents is low and not suitable for analysis against airport attributes, their correlation with C incidents can be analysed. Intuitively, it may be expected that C incidents are strongly related to A and B incidents, as the most important component that explains the differences is the Closest Horizontal Proximity (CHP). This is explained in more detail in Appendix A1. Hence, it is expected that in many cases the difference between an A, B or C incident was explained by the separation during the incident, and that this is not directly a result of the geometry. Thus, at airports with large shares of C incidents, a higher likelihood of B and A incidents is expected. Likewise, A incidents are expected to occur more often at airports with higher numbers of B incidents.

To test the interactions, the correlations between the incursion severities were determined; the results are shown in Table 7-1. For all pairs, a positive correlation 𝑟_𝑠 exists. C and D are strongly correlated; thus, airports with large shares of D incidents generally also observe large shares of C incidents. Considering the high severity pairs, A and B show a rather low correlation, which also applies to the A-C and B-C pairs. In the table, three correlation matrices are combined: the first includes all airports, regardless of whether an A or B incident was recorded; the second is based only on airports where A incursions occurred; and the third only on airports where B incidents occurred. The correlation between A and B, determined in this way, is 0.39 (𝑁 = 17) and not significant. To conclude, a relation between A, B and C incidents may be assumed, although it is not strongly supported by the data.

Table 7-1: Correlations between incursion severity classifications (Spearman’s rho correlation 𝑟_𝑠)

| Severity (𝑆_𝑅) | N = 420: A | B | C | D | N = 58: A | C | D | N = 59: B | C | D |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 1.00 | | | | 1.00 | | | | | |
| B | 0.19* | 1.00 | | | | | | 1.00 | | |
| C | 0.32* | 0.37* | 1.00 | | 0.59* | 1.00 | | 0.35* | 1.00 | |
| D | 0.32* | 0.34* | 0.81* | 1.00 | 0.54* | 0.76* | 1.00 | 0.30* | 0.80* | 1.00 |

*Significant associations at the 95% confidence level

[Legend of Figure 7-1: hub size (Large, Medium, Small, Non-hub/Other); incursion rate 𝑅_rate classes 0-0.5, 0.5-1.0, 1.0-1.5, 1.5-2.0, 2.0-2.5, 2.5-3.0, 3+]
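The rank correlations reported in Table 7-1 are Spearman’s rho values. As a minimal sketch of how such a coefficient can be obtained, the code below ranks two series (giving tied values their average rank) and computes the Pearson correlation of the ranks. The function names and the per-airport counts are hypothetical and do not come from the incident database.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical C and D incident counts for six airports.
c_counts = [0, 3, 3, 10, 25, 40]
d_counts = [1, 2, 5, 9, 30, 28]

rho_cd = spearman_rho(c_counts, d_counts)
print(round(rho_cd, 3))
```

Because only the ranks enter the calculation, the coefficient is insensitive to the skewed count distributions that are typical when a few airports record most incidents.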

To apply an additional check on the presence of interaction between these severity pairs, while coping with the low frequencies, it was decided to conduct a binary logistic regression. Here, the independent variable is the number of C incidents and the dependent variable is the presence of an A or B incursion (1 = yes, 0 = no). The output indicates whether higher numbers of C incidents lead to a higher probability of an A or B incident being observed. Note that this does not indicate how many of these incursions will occur for a certain number of C incidents; this cannot be determined because of the low correlations, which are partly due to the low frequencies per airport.

For the probability that an A incident occurs, given the number of C incidents, the following applies: 𝑃(𝐴) = 1/(1 + 𝑒^(2.339 − 0.031𝐶)), 𝑅² = 0.073 (Cox & Snell), 𝑅² = 0.132 (Nagelkerke), Hosmer and Lemeshow: .125 (𝜒² = 11.334). Likewise, for B: 𝑃(𝐵) = 1/(1 + 𝑒^(2.425 − 0.036𝐶)), 𝑅² = 0.097 (Cox & Snell), 𝑅² = 0.174 (Nagelkerke), Hosmer and Lemeshow: .001 (𝜒² = 24.890). Only the regression model for 𝑃(𝐴) passed the Hosmer and Lemeshow test (𝑝 > 0.05, indicating no significant lack of fit). This is once again the consequence of the small share of A and B observations. However, it is shown that higher numbers of C incidents are associated with a higher probability of A and B incidents. For example, when 10 C incidents are observed, 𝑃(𝐴) = 0.12, and when 100 C incidents are recorded, 𝑃(𝐴) = 0.68.
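The worked example can be reproduced directly from the reported coefficients. The sketch below evaluates the two fitted logistic curves for a few values of 𝐶; the coefficients are those given in the text, while the function names are hypothetical.

```python
import math


def p_incursion(c, intercept, slope):
    """Evaluate P = 1 / (1 + exp(intercept - slope * C)), the reported form."""
    return 1.0 / (1.0 + math.exp(intercept - slope * c))


def p_a(c):
    """Probability of at least one A incident given C incidents (reported fit)."""
    return p_incursion(c, 2.339, 0.031)


def p_b(c):
    """Probability of at least one B incident given C incidents (reported fit)."""
    return p_incursion(c, 2.425, 0.036)


# Tabulate both curves for a few numbers of C incidents.
for c in (0, 10, 50, 100):
    print(c, round(p_a(c), 2), round(p_b(c), 2))
```

For 𝐶 = 10 and 𝐶 = 100 this gives 𝑃(𝐴) values that round to 0.12 and 0.68, matching the figures quoted above.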

Mathew et al. (2016) used mixed logit models with 200 Halton draws and a 90% confidence interval to analyse the differences in correlations between severities, in order to deal with the shortage of A and B incidents (measurement period: 2002 until 2015). Although they found that the proposed models cannot provide statistical confirmation for all types of runway incursion at all severity levels, the relationship between A, B and C incidents was indicated per airport size. However, only one aspect was found to be significant for A incidents: the reduction of occurrences since 2002. It should be noted that the researchers obtained a higher share of A and B incidents from their measurement period, 4% compared to approximately 1.2% in this study, which is the result of the decreasing rate.

7.1.3 Data selection

Based on the assumption from the high-level analysis about the influence of general aviation traffic on the occurrence of runway incursions, a selection of airports was used for the further modelling. First, the large, medium and small hubs were selected, since they all have a dominant share of commercial traffic. Within this sample, the airport with the lowest number of commercial movements was identified (33,902). This value was then used as the threshold for selecting additional airports from the remaining population of other airports and non-hubs: airports with at least this number of commercial flights within the same time window were added to the sample. This resulted in the selection of 268 airports from the total population of 420. An overview of these airports can be found in Appendix A5.

It was decided not to consider the percentage of commercial traffic, since some airports have a high share of commercial traffic but rather low traffic numbers. These airports are therefore not representative. Another aspect to note is that the airport size designation alone does not give a clear indication of the airport traffic, since the definition is based on the percentage of national passenger boardings. This means that freight hubs, with large numbers of commercial freight movements, could be designated as a small hub or other, while their commercial traffic figures are comparable to those of medium or large hubs.
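The selection rule described above can be sketched as a two-step filter: take all hubs, derive the threshold from the hub with the fewest commercial movements, and then add every remaining airport that meets this threshold. The airport codes, field names and traffic figures below are hypothetical; only the threshold value of 33,902 is taken from the text.

```python
HUB_SIZES = {"large", "medium", "small"}


def select_sample(airports):
    """Return (sample, threshold) following the two-step selection rule.

    Each airport is a dict with the hypothetical keys 'code', 'hub_size'
    and 'commercial_movements'.
    """
    hubs = [a for a in airports if a["hub_size"] in HUB_SIZES]
    # Threshold: lowest number of commercial movements among the hubs.
    threshold = min(a["commercial_movements"] for a in hubs)
    # Add non-hub and other airports that reach at least the threshold.
    extra = [
        a for a in airports
        if a["hub_size"] not in HUB_SIZES
        and a["commercial_movements"] >= threshold
    ]
    return hubs + extra, threshold


# Hypothetical airports; the figures are illustrative only.
airports = [
    {"code": "AAA", "hub_size": "large", "commercial_movements": 400000},
    {"code": "BBB", "hub_size": "medium", "commercial_movements": 120000},
    {"code": "CCC", "hub_size": "small", "commercial_movements": 33902},
    {"code": "DDD", "hub_size": "non-hub", "commercial_movements": 50000},
    {"code": "EEE", "hub_size": "other", "commercial_movements": 12000},
    {"code": "FFF", "hub_size": "other", "commercial_movements": 33902},
]

sample, threshold = select_sample(airports)
selected_codes = [a["code"] for a in sample]
print(threshold, selected_codes)
```

Filtering on the absolute number of movements rather than on the commercial share keeps out airports with a high share of commercial traffic but low traffic numbers, and lets freight-heavy airports with a low size designation enter the sample.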

In document Cleared for Take off! : an exploration on the relationship between airport characteristics and the occurrence of runway incursions (Page 67-69)