# Statistical techniques

## 3.7 Geospatial and statistical techniques

### 3.7.3 Statistical techniques

This section identifies and outlines the various statistical techniques used in this research.

#### 3.7.3.1 One-way analysis of variance (ANOVA)

A one-way ANOVA is a parametric statistical test used to compare the means of more than two groups. “ANOVA is a way of comparing the ratio of systematic variance to unsystematic variance” (Field, 2014, p. 430). It is a well-established statistical tool that has been used in many empirical studies. Deploying ANOVA rests on several assumptions (Warner, 2008; Rayner and Best, 2013), which are explained below:

a) The dependent variable should be measured on a continuous scale (i.e. interval or ratio), whereas the independent variable should be measured on a categorical scale (nominal or ordinal) and consist of more than two groups.

b) Observations should be independent. Each measurement or participant should be a member of only one group. In relation to this research, this means that no LSOA should belong to two or more socio-economic classifications.

c) There should be homogeneity of variances between the groups, as with many parametric tests. To investigate this, Levene’s test of equality of variances is adopted. If the test statistic is not significant (p > .05), the data are assumed to meet this assumption. If, on the other hand, the test is significant (p < .05), the assumption is violated and an alternative form of ANOVA, the Welch test, is employed because of its robustness to violations of this assumption (Elmore and Woehlke, 1988; Glass et al., 1972).

d) The dependent variable should be approximately normally distributed within each group. This can be examined using the skewness and kurtosis statistics. However, if the sample sizes are large enough (> 30), violating this assumption has minimal effect (Pallant, 2016). There is also strong evidence that the Welch test is robust and that non-normality has little bearing on the accuracy of its probability values (Glass et al., 1972; Hopkins and Weeks, 1990).
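Assumptions (c) and (d) can be screened with a few lines of code. The following is a minimal pure-Python sketch, not necessarily the procedure used in this thesis, and the function names are illustrative. `levene_w` computes the mean-centred Levene statistic as a one-way ANOVA F ratio on the absolute deviations from each group mean, and `skewness_kurtosis` returns moment-based sample skewness and excess kurtosis (0 for a normal distribution).

```python
from statistics import mean


def anova_f_ratio(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_w(groups):
    """Levene's statistic (mean-centred): ANOVA F on |x - group mean|.

    Its p-value comes from an F distribution with k - 1 and N - k degrees of freedom.
    """
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return anova_f_ratio(deviations)


def skewness_kurtosis(values):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
```

In SciPy, `scipy.stats.levene(*groups, center='mean')` returns the same statistic together with a p-value (its default, `center='median'`, is the Brown-Forsythe variant), and `scipy.stats.skew` and `scipy.stats.kurtosis` return the same moment estimates by default. Statistical packages often report bias-adjusted skewness and kurtosis, so values may differ slightly in small samples.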

The main purpose of ANOVA is to examine whether there are significant differences between the group means. A significant ANOVA result (p < .05) therefore indicates that at least one group mean differs significantly from the others. It does not, however, show which groups differ from each other, and with three or more groups there are several possible pairings. To identify them, a post-hoc (multiple comparison) test is employed: Tukey’s test when the homogeneity of variances assumption holds, and the Games-Howell test when it is violated (Field, 2013; Pallant, 2016). In this thesis, a one-way ANOVA is used to compare the mean distribution of FGRs and AASRs across the different SED classifications, with FGRs and AASRs as the dependent variables and SED classification as the independent variable.
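The F ratio described above, and the Welch alternative used when Levene’s test is significant, can be written out as follows. This is a hedged sketch in plain Python with hypothetical function names; each function returns the statistic with its degrees of freedom, and the Welch version assumes every group has at least two observations and non-zero variance.

```python
from statistics import mean, variance


def classic_anova_f(groups):
    """Classical one-way ANOVA: systematic (between) over unsystematic (within) variance."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = k - 1, n - k
    return (ss_between / df1) / (ss_within / df2), df1, df2


def welch_anova_f(groups):
    """Welch's F, which does not assume equal group variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights n_i / s_i^2 (sample variance with n - 1 denominator)
    w = [n_i / variance(g) for n_i, g in zip(sizes, groups)]
    w_sum = sum(w)
    weighted_mean = sum(w_i * m_i for w_i, m_i in zip(w, means)) / w_sum
    a = sum(w_i * (m_i - weighted_mean) ** 2 for w_i, m_i in zip(w, means)) / (k - 1)
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, sizes))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)
```

A p-value can then be read from the F distribution, for example with `scipy.stats.f.sf(f, df1, df2)`. For two groups, Welch’s F equals the square of Welch’s t statistic, which is a convenient check. The post-hoc tests need the studentized range distribution and are not reproduced here; SciPy provides `scipy.stats.tukey_hsd` for Tukey’s test.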

#### 3.7.3.2 Binary logistic regression (BLR)

Regression analysis was used in this thesis to identify the best-fitting model for describing the relationship between retail presence (dependent variable) and SED (independent variable).

A careful review of the available datasets showed, as expected, that the retail outlet datasets contained both LSOAs with and LSOAs without retail presence (FGRs, gambling and financial outlets). As a result, the data failed to meet the normality assumption of linear regression, i.e. that the residuals should be approximately normally distributed (Pallant, 2016).

To address this, a different regression model, binary logistic regression (BLR), can be applied. BLR is used to analyse data in which the outcome (dependent) variable is dichotomous or binary (Warner, 2008), i.e. it has only two possible outcomes (such as ‘yes’ or ‘no’, or ‘male’ or ‘female’), while the predictor (independent) variables can be continuous, categorical or dummy variables. The outlet datasets are therefore categorised into two outcomes: LSOAs with no retail presence are recoded as ‘absent’ (0) and those with retail presence as ‘present’ (1). This helps to uncover the effect of neighbourhood deprivation on the presence or absence of FGRs and AASRs. A major strength of BLR is that the strict distributional assumptions of parametric models do not apply, which makes it a robust method (Hair et al., 2018).
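The recoding and model fit can be sketched as follows, assuming for illustration a single continuous deprivation score per LSOA as the predictor (a hypothetical simplification). `recode_presence` maps outlet counts to 0 (‘absent’) or 1 (‘present’), and `fit_logistic` estimates logit(p) = b0 + b1·x by maximum likelihood using Newton-Raphson. All names are illustrative; in practice a statistical package would be used.

```python
import math


def recode_presence(outlet_counts):
    """Recode outlet counts per LSOA: 0 -> 0 ('absent'), 1 or more -> 1 ('present')."""
    return [1 if count > 0 else 0 for count in outlet_counts]


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson maximum likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score: gradient of the log-likelihood with respect to (b0, b1)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix [[i00, i01], [i01, i11]] with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 ** 2
        # Newton step: add (information matrix)^-1 @ score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1
```

For example, `fit_logistic(scores, recode_presence(counts))` with hypothetical lists `scores` and `counts` returns the two coefficients; exp(b1) is the odds ratio, i.e. the multiplicative change in the odds of an LSOA having at least one outlet per one-unit increase in the score. If the predictor perfectly separates ‘present’ from ‘absent’ LSOAs, no finite estimate exists and the iterations diverge.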

To use BLR, the data must meet the following assumptions:

a) The outcome variable is dichotomous (Wright, 1995; Hair et al., 2018). As the name implies, the responses of the dependent variable need to be binary (e.g. yes or no, present or absent, male or female).

b) There are one or more predictor (independent) variables, which can be continuous, ordinal or nominal (Hair et al., 2018).

c) Outcome variable measurements must be statistically independent of one another (Wright, 1995). That is, the measurements should not originate from a repeated process, such as the same unit being measured more than once.

d) The model must include all relevant predictors (Wright, 1995).

e) The categories of the outcome variable must be mutually exclusive (Wright, 1995), so that each case belongs to only one group, not both. In this research, for instance, each LSOA must belong to just one group, i.e. ‘present’ or ‘absent’.

The validity of BLR is strongly affected by sample size (Pallant, 2016; Hair et al., 2018), so the method requires large samples. BLR is also susceptible to multicollinearity. To assess this, the correlation matrix of the predictors is examined, and correlations of .80 or above are taken to signal multicollinearity. Collinearity statistics (tolerance values and the variance inflation factor, VIF) are also employed.
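The correlation-matrix screen can be sketched in plain Python. The function names and the predictor names used in any example are hypothetical; `flag_high_correlations` returns every pair of predictors whose absolute Pearson correlation reaches the .80 threshold stated above.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_high_correlations(predictors, threshold=0.80):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold.

    `predictors` maps a predictor name to its values, one per LSOA.
    """
    flagged = []
    for a, b in combinations(predictors, 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged
```

Each flagged pair is a candidate for the further tolerance and VIF checks described next.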

Tolerance statistics – Tolerance gives a direct measure of multicollinearity. It quantifies how much of the variability in a selected predictor is not explained by the other predictors in the model (Hair et al., 2018); formally, it equals 1 − R², where R² comes from regressing that predictor on all the other predictors. The minimum tolerance value adopted in this thesis is 0.2 (Menard, 1995), so predictors with a tolerance below 0.2 are treated as multicollinear.

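Variance inflation factor (VIF) – The VIF is an indicator of the strength of the linear relationship between one predictor variable and the other predictor variables (Field, 2014). It is the reciprocal of tolerance (VIF = 1/tolerance), so the tolerance threshold of 0.2 corresponds to a VIF of 5.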
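Tolerance and VIF can be computed directly from their definitions: regress each predictor on all the others, then take tolerance = 1 − R² and VIF = 1/tolerance. The sketch below does this in plain Python with a small normal-equations solver. The names are illustrative, and it assumes no predictor is an exact linear combination of the others (otherwise the system is singular and the division fails).

```python
def r_squared(y, xs):
    """R^2 from an OLS regression of y on the columns in xs, with an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    coef = [aug[i][p] / aug[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(coef, row)) for row in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def collinearity_stats(predictors):
    """Map each predictor name to (tolerance, VIF), with tolerance = 1 - R_j^2."""
    stats = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        tolerance = 1 - r_squared(col, others)
        stats[name] = (tolerance, 1 / tolerance)
    return stats
```

Under the threshold adopted above, any predictor with a tolerance below 0.2 (VIF above 5) would be flagged. With only two predictors, each tolerance reduces to 1 − r², where r is their Pearson correlation.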

Variables that show multicollinearity are not used together in the same model. Rather, they are interchanged, with each entered in a separate model (Wang, 1996).