Correlation of variables - Model development for supporting advanced risk based approaches in a

The correlation between the severity classes is quantified with Pearson’s correlation coefficient, which is defined as [19]:

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

Where σ denotes the standard deviation.

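The correlation is estimated from a given dataset by [19]:

$$\rho(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

Where $\bar{x}$ and $\bar{y}$ denote the mean number of occurrences of $x$ and $y$ respectively, and $s_x$ and $s_y$ the corresponding sample standard deviations.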

The value of ρ indicates the strength and direction of the correlation, where −1 ≤ ρ ≤ 1. When ρ is close to one there is a positive correlation, when it is close to minus one there is a negative correlation, and when it is close to zero there is little or no correlation.
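The estimate can also be reproduced outside SPSS. The following is a minimal sketch in Python using only the standard library; the function name `pearson_r` and the monthly counts are hypothetical and serve as illustration only.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squares.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical monthly counts for two severity classes (illustrative only).
class_a = [4, 7, 5, 9, 6, 8]
class_b = [1, 3, 2, 4, 2, 3]
r = pearson_r(class_a, class_b)
print(f"estimated correlation: {r:.3f}")
```

The sketch uses the second form of the estimator, in which the factor (n−1) cancels, so the sample standard deviations never have to be computed separately.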

Appendix B

End-user of model

This section describes how the model is used in practice. The use of SPSS for determining distributions and performing Poisson regression is described in appendix B.1. Carrying out the model requires knowledge of its practical necessities and limitations; this is described in appendix B.3.

B.1 Guide through SPSS

This section explains how to obtain and interpret output from SPSS (version 21) for the models used in this study.

Selection of data in SPSS

Selection of data in SPSS for performing the Kolmogorov-Smirnov test starts by creating a table which has the form of table B.1.

| Number occurrences type 1 | Number occurrences type 2 | ... | Number occurrences type n |
|---|---|---|---|
| # month 1 | # month 1 | ... | # month 1 |
| ... | ... | ... | ... |
| # month m | # month m | ... | # month m |

Table B.1: Input Kolmogorov-Smirnov

Where m denotes the number of months considered and n indicates the number of types considered. Note: the layout for the different types of occurrences (what-categories) and for the different severity classes is identical.

For selecting data to perform Poisson regression, the data set should be of a form similar to table B.2. Note: in SPSS the options of the factors have to be labeled so SPSS sees them as categories and not as integers.

| Factor 1 | ... | Factor i | ... | Factor n | Number occurrences | Number of corresponding movements |
|---|---|---|---|---|---|---|
| j_1 ∈ {f_1} | ... | j_i ∈ {f_i} | ... | j_n ∈ {f_n} | m_occurrence ∈ ℕ | m_movement |
| ... | ... | ... | ... | ... | ... | ... |
| k_1 ∈ {f_1} | ... | k_i ∈ {f_i} | ... | k_n ∈ {f_n} | m_occurrence ∈ ℕ | m_movement |

Table B.2: Input regression for type of occurrence or severity class

Where n denotes the number of factors taken into account, m_occurrence the number of occurrences corresponding to the combination of factors, and m_movement the number of movements corresponding to the combination of factors. All combinations of the options within the factors must be taken into account, as done in section 8.2.
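The requirement that every combination of factor options appears as a row can be checked by generating the table programmatically. The sketch below is an illustration in Python; the factor names, options, and counts are hypothetical and not taken from the study.

```python
from itertools import product

# Hypothetical factors and their options (illustrative names only).
factors = {
    "visibility": ["good", "poor"],
    "period": ["day", "night"],
    "runway_config": ["A", "B", "C"],
}

# Hypothetical observed counts keyed by factor combination; combinations
# that never occurred are simply absent from these dictionaries.
occurrences = {("good", "day", "A"): 3, ("poor", "night", "C"): 1}
movements = {("good", "day", "A"): 1200, ("poor", "night", "C"): 150}

def build_rows(factors, occurrences, movements):
    """Return one row per combination of factor options, filling gaps with 0."""
    names = list(factors)
    rows = []
    for combo in product(*(factors[name] for name in names)):
        row = dict(zip(names, combo))
        row["occurrences"] = occurrences.get(combo, 0)
        row["movements"] = movements.get(combo, 0)
        rows.append(row)
    return rows

rows = build_rows(factors, occurrences, movements)
print(len(rows), "rows")  # 2 * 2 * 3 combinations
```

Rows with zero movements cannot be given a logarithmic offset, so such combinations would need separate handling before the regression is run.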

Selection of statistical tests in SPSS

To perform the Kolmogorov-Smirnov test the following selections are made in SPSS:

1. Go to 'Analyze', 'Nonparametric Tests', 'Legacy Dialogs', and '1-Sample K-S...'
2. Select all categories of which the distribution is to be tested in 'Test Variable List'
3. Select the Poisson distribution in the box 'Test Distribution'
4. Press 'OK'
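To make the quantity behind this test concrete, the sketch below computes the largest absolute difference between the empirical distribution of a series of monthly counts and a Poisson distribution whose mean is estimated from the same series. It is an illustration under stated assumptions: the function names and counts are hypothetical, no p-value is computed, and the exact procedure SPSS follows may differ in detail.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), by summing the probability mass."""
    term = math.exp(-lam)       # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i         # P(X = i) obtained from P(X = i - 1)
        total += term
    return total

def ks_distance_poisson(counts):
    """Return (estimated mean, maximum |empirical CDF - Poisson CDF|)."""
    if not counts:
        raise ValueError("at least one observation is required")
    n = len(counts)
    lam = sum(counts) / n       # Poisson mean estimated from the data
    d = 0.0
    # Both CDFs are step functions that jump only at integers, so it is
    # enough to compare them at 0, 1, ..., max(counts).
    for k in range(max(counts) + 1):
        empirical = sum(1 for c in counts if c <= k) / n
        d = max(d, abs(empirical - poisson_cdf(k, lam)))
    return lam, d

# Hypothetical monthly occurrence counts for one type (illustrative only).
monthly_counts = [0, 1, 2, 1, 0, 3, 1, 2]
lam, d = ks_distance_poisson(monthly_counts)
print(f"estimated mean: {lam:.2f}, maximum difference: {d:.3f}")
```

A small maximum difference indicates that the counts are compatible with a Poisson distribution; the significance reported by SPSS should be used for the actual decision.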

Selection of Poisson regression in SPSS

When a table similar to table 8.3 of section 8.2 is inserted in an SPSS data file, take the following steps:

1. Go to 'Analyze' and select 'Generalized Linear Models' (twice)
2. Choose 'Poisson log-linear' in the tab 'Type of model'
3. In tab 'Response' select the number of occurrences as 'Dependent variable'
4. Select the factors chosen as 'Factors' in the tab 'Predictors'. Go to options to select 'Descending' in 'Category Order for factors'. Also select 'number of taxi-movements' as 'Offset variable' in the tab 'Predictors'
5. In tab 'Model' select all factors to the box called 'Model'
6. Press 'OK' at the bottom of the screen.
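For reference, the model fitted by these steps can be written out as follows. This is a sketch under two assumptions: the offset enters the linear predictor on the logarithmic scale (so the offset variable would hold the logarithm of the number of movements), and the $x_i$ are indicator variables for the chosen factor options, a notation introduced here for illustration.

$$\log E[m_{occurrence}] = \log(m_{movement}) + \beta_0 + \sum_{i} \beta_i x_i$$

Equivalently, the expected number of occurrences per movement is

$$\frac{E[m_{occurrence}]}{m_{movement}} = \exp\Big(\beta_0 + \sum_{i} \beta_i x_i\Big),$$

so each $\exp(\beta_i)$ can be read as a multiplicative change in the occurrence rate relative to the reference category.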

Interaction terms are added to the model in tab 'Model'. This is done as follows:

1. Select the first factor of the interaction term
2. Press the arrow pointing downwards (underneath the box 'Factors and Covariates')
3. Press the button 'By *' underneath the box 'Term'
4. Select the second factor of the interaction term
5. Press the arrow pointing downwards (underneath the box 'Factors and Covariates')
6. The box 'Term' now says 'Factor1*Factor2'
7. Press 'Add to model'

The interaction term for three or more factors can be added similarly by repeating steps 1 to 3.

Output tables in SPSS for Poisson regression

SPSS gives the following tables as output after performing Poisson regression:

- Model information tells what is chosen as dependent variable, probability distribution, link function, and offset variable.
- Case processing summary gives the number of cases that are included and excluded. Make sure that all cases are included; this should be the case automatically.
- Categorical variable information gives the number of cases for each (choice within the) factor.
- Continuous variable information gives information about the dependent and the offset variable: the number of cases, the minimal and maximal value, the mean, and the standard deviation.
- Goodness of fit gives several values, of which 'Value/df' is the only one that has to be considered for now. This value indicates how well the Poisson distribution fits. The Poisson distribution is applicable when this value is close to one; a value bigger than one implies that an over-dispersed Poisson distribution might be a better fit. There is no formal test to decide between the regular Poisson and the over-dispersed Poisson distribution. (The Poisson distribution has a mean equal to its variance; an over-dispersed distribution has a variance larger than its mean.)
- Omnibus test compares the fitted model against the null-model. When the reported significance value is below α it is said that the 'model outperforms the null-model'.
- Tests of model effects indicates whether the influence of factors is observable, which is the case when the values of 'Sig.' are smaller than the significance level α.
- Parameter estimates gives information on each estimated parameter, including the values of βi, the corresponding standard errors, the 95% Wald confidence interval, and the outcomes of the hypothesis test.
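Two of the reported quantities can be recomputed by hand as a sanity check. The sketch below is an illustration in Python with hypothetical numbers: it computes a Pearson chi-square divided by its degrees of freedom (one of the variants SPSS lists under 'Value/df'; which row is read is an assumption here) and a Wald confidence interval from an estimate and its standard error.

```python
import math
from statistics import NormalDist

def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df

def wald_interval(beta, std_error, level=0.95):
    """Two-sided Wald confidence interval: beta +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta - z * std_error, beta + z * std_error

# Hypothetical observed counts and fitted means for four rows of the input
# table, from a model with two estimated parameters (illustrative only).
observed = [2, 0, 3, 1]
fitted = [1.5, 0.5, 2.5, 1.5]
ratio = pearson_dispersion(observed, fitted, n_params=2)

# Hypothetical parameter estimate and standard error (illustrative only).
low, high = wald_interval(beta=0.5, std_error=0.1)
rate_ratio_interval = (math.exp(low), math.exp(high))
print(f"Value/df: {ratio:.3f}, 95% interval for beta: ({low:.3f}, {high:.3f})")
```

A ratio well above one would point towards over-dispersion; exponentiating the interval bounds expresses the effect as a multiplicative change in the occurrence rate.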

In document Model development for supporting advanced risk based approaches in air traffic management (Page 71-75)