
Chapter 5 Methods

5.2 Exploratory spatial data analysis (ESDA)

5.2.1 Spatial interactions

There are two key spatial interactions to consider in spatial data: spatial dependence and spatial heterogeneity (Anselin, 2010).

5.2.1.1 Spatial dependence

Spatial dependence arises when observations at one location depend on observations at other locations. Tobler’s first law of geography places spatial dependence at the core of spatial analysis (Miller, 2004). Dependence may arise from the spatial units of the data themselves, for example postcodes, counties and districts, or from the spatial dimension of socio-economic or regional activities (LeSage, 1999). For example, administrative boundaries may not accurately reflect the nature of the underlying economic activities, or local labour or economic markets may cut across spatial boundaries, causing spatial dependence (ibid.). It is a concept of closeness, implying that distance matters. Theoretically, spatial dependence is stronger the closer the observations are to one another and weakens with increasing distance between them – this is called distance decay (Kelejian and Piras, 2017). The term spatial dependence is often used interchangeably with spatial autocorrelation, which this research will apply, and it requires further specification (see the following sub-sections).

There are two forms of spatial dependence (Anselin and Rey, 2010). The first is spatial dependence in the error term. This is a special form of non-spherical disturbance, which does not bias ordinary least squares (OLS) estimates but affects their variance and hence their efficiency. It is usually considered a nuisance that needs to be eliminated. The second is spatial dependence in the variable of interest. This is accounted for by including a spatial autoregressive term, i.e. a weighted sum of the values of a variable at other locations; this is called a spatial lag operator or spatially lagged variable, which can be included for the dependent variable, the explanatory variables and the error term (discussed further in section 5.3). Under spatial dependence, the standard assumption of independent observations in traditional regression analysis is violated, and OLS is no longer a consistent estimator when the dependence is in the variable of interest (Anselin and Griffith, 1988). Thus, alternative methods of analysis are required, i.e. ESDA and spatial econometrics.

5.2.1.2 Spatial weights matrix

To explore the spatial interactions of data, it is necessary to quantify the spatial aspects of the data. Spatial dependence is usually accounted for by a ‘spatial weights matrix’, W (LeSage, 1999). This is a square matrix that defines the neighbourhood of each spatial unit; it specifies which of the other locations affect the value at the current or focal location. The spatial weights matrix describes the closeness between two spatial units in terms of a contiguity or distance measure (Getis and Aldstadt, 2004). Spatial units that are considered neighbours are assumed to interact in a meaningful way. This could be in the form of externalities, spillovers, proximity issues, imitation policies, similarity of markets or sharing of resources, etc. (Kelejian and Piras, 2017). Usually, W is specified to be row-standardised (or row-normalised) so that each row sums to unity, which means that the weights produce a weighted average of the neighbouring values. By convention, the diagonal elements of W are set to zero, so that a spatial unit is not viewed as its own neighbour.
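As a minimal sketch of row-standardisation, the snippet below takes a small binary weights matrix and rescales each row to sum to one. The four-unit layout and the function name `row_standardise` are hypothetical and chosen only for illustration; they are not the matrix used in this research.

```python
import numpy as np

def row_standardise(w):
    """Return a copy of a non-negative weights matrix whose non-empty rows sum to 1."""
    w = np.asarray(w, dtype=float).copy()
    np.fill_diagonal(w, 0.0)  # a spatial unit is not its own neighbour
    row_sums = w.sum(axis=1, keepdims=True)
    # Rows without neighbours ("islands") are left as zeros instead of dividing by zero.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

# Hypothetical example: four spatial units arranged in a line (1-2-3-4).
w_binary = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
w_std = row_standardise(w_binary)
print(w_std)
```

After standardisation, the two interior units give each of their two neighbours a weight of 0.5, while the two end units give their single neighbour a weight of 1.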

There are two main types of spatial weights matrices: contiguity-based and distance-based (Getis and Aldstadt, 2004). A contiguity matrix is specified by the position in space of one spatial unit relative to another. It is a binary matrix composed of the values 0 and 1 that captures the notion of connectivity between spatial units. There are different ways of defining contiguity relationships between spatial units (LeSage, 1999) – see Figure 5.1:

• Rook contiguity is when a spatial unit shares a common side with other spatial units;
• Bishop contiguity is when a spatial unit shares a common vertex with other spatial units; and,
• Queen contiguity is when a spatial unit shares a common side or vertex with other spatial units.

Figure 5.1 Contiguity-based spatial weights matrices

These are first-order contiguity matrices; their double-rook, -bishop and -queen counterparts are second-order contiguity matrices, which also include the neighbours of each neighbour. Contiguity-based matrices are useful when dealing with regular square grids, where the spatial structure can be easily summarised in mathematical terms. However, when spatial units vary in shape, such simple weights matrices can lack precision in describing the spatial structure (LeSage, 1999; Getis and Aldstadt, 2004). Nevertheless, this research will use a contiguity-based spatial weights matrix (see section 6.3).
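The contiguity rules can be illustrated on a regular square grid, where rook neighbours share a side, bishop neighbours share only a vertex, and queen neighbours share either. The following is a minimal sketch under that assumption; the function `grid_contiguity` and the 3 x 3 grid are hypothetical and are not the spatial units used in this research.

```python
import numpy as np

def grid_contiguity(n_rows, n_cols, rule="rook"):
    """Binary first-order contiguity matrix for cells of a regular grid."""
    offsets = {
        "rook": [(-1, 0), (1, 0), (0, -1), (0, 1)],      # shared side
        "bishop": [(-1, -1), (-1, 1), (1, -1), (1, 1)],  # shared vertex only
    }
    offsets["queen"] = offsets["rook"] + offsets["bishop"]  # side or vertex
    n = n_rows * n_cols
    w = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c  # cells are numbered row by row
            for dr, dc in offsets[rule]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i, rr * n_cols + cc] = 1
    return w

w_rook = grid_contiguity(3, 3, "rook")
w_bishop = grid_contiguity(3, 3, "bishop")
w_queen = grid_contiguity(3, 3, "queen")

centre = 4  # middle cell of the 3 x 3 grid
print(w_rook[centre].sum(), w_bishop[centre].sum(), w_queen[centre].sum())  # 4 4 8
```

On this grid the centre cell has four rook neighbours, four bishop neighbours and eight queen neighbours, whereas a corner cell has two, one and three respectively.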

Alternatively, there are distance-based matrices. These define neighbours based on distance, which is usually computed as the distance between the centroids of spatial units (LeSage, 1999). Examples include:

• Distance bandwidths as the nth nearest neighbour distance – two spatial units are neighbours when a spatial unit falls within a critical distance band from another;
• Euclidean distance, Great Circle distance, etc. – different measures of distance;
• Inverse distance (distance decay) – the inverse of the distance matrix, which implies that the further the distance, the weaker the impact between two spatial units; and,
• K-nearest neighbours (can be based on inverse distance) – neighbours are defined by a fixed number of neighbours, k.
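The distance-based definitions listed above can be sketched as follows, assuming Euclidean distances between centroids given in planar coordinates. The centroid coordinates, the distance threshold and all function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of centroids."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_band_weights(d, threshold):
    """Binary weights: units are neighbours if they lie within the distance band."""
    w = (d <= threshold).astype(float)
    np.fill_diagonal(w, 0.0)
    return w

def inverse_distance_weights(d):
    """Weights decay as 1/distance; zero distances (including the diagonal) get weight 0."""
    w = np.zeros_like(d)
    positive = d > 0
    w[positive] = 1.0 / d[positive]
    return w

def knn_weights(d, k):
    """Binary weights: each unit's neighbours are its k nearest other units."""
    n = d.shape[0]
    d_no_self = d.copy()
    np.fill_diagonal(d_no_self, np.inf)  # exclude the unit itself
    nearest = np.argsort(d_no_self, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], nearest] = 1.0
    return w

# Hypothetical centroids of four spatial units; the last one is remote.
centroids = [(0, 0), (1, 0), (0, 2), (5, 5)]
d = pairwise_distances(centroids)
w_band = distance_band_weights(d, threshold=2.0)
w_inv = inverse_distance_weights(d)
w_knn = knn_weights(d, k=2)
```

In this example the remote unit has no neighbours under the distance band but always has two under k-nearest neighbours. The k-nearest neighbours matrix is also not symmetric: the remote unit counts the third unit as a neighbour, but not the other way round.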

Distance-based matrices are more flexible in defining neighbours than contiguity-based matrices. In the context of T&H, distance-based matrices are more relevant because the data tend to be points, as research is concerned with T&H firms, attractions or points of interest (e.g. Yang and Wong, 2012; Eugenio-Martin, Cazorla-Artiles and Gonzalez-Martel, 2019). However, in this research, spatial aggregation is necessary due to statistical disclosure control, and thus the points are aggregated into polygons (or shapes), i.e. spatial units (refer to section 5.6.1) – this could favour the use of contiguity-based matrices. Although distance-based matrices are more flexible, they do suffer from limitations; for example, the choice of the type of distance to be used (e.g. Euclidean or Great Circle distance) can be arbitrary. Moreover, distance does not necessarily have to be defined as a physical distance between points or nodes; there is also economic distance between regions, such as technological proximity or differences in absorptive capacity (e.g. Parent and LeSage, 2008; Harris and Kravtsova, 2009). This research will use distance-based matrices to check the robustness of the results to the choice of spatial weights matrix (see section 6.3).

Once the spatial weights matrix is specified, the spatial lag operator can be defined. For a chosen observation, it takes account of spatial dependence via the spatial weights matrix and produces a spatially weighted average of the neighbouring observations, which can also show the spatial dependence of the chosen observation between spatial units. These are referred to as spatial spillovers (Harris and Kravtsova, 2009). It is important to acknowledge the sensitivity of spatial model estimations and inferences to the spatial weights matrix (Arbia and Fingleton, 2008). Harris and Kravtsova (2009) have argued that an incorrect specification of the spatial weights matrix can lead to incorrect conclusions and inferences in spatial modelling. However, LeSage and Pace (2014) have challenged what they describe as a myth in spatial econometrics, arguing that there is little theoretical basis for the claim that spatial estimates are sensitive to the specification of the spatial weights matrix. The authors argued that past studies created this myth through incorrect interpretations of model coefficients and mis-specified models, the effects of which were attributed to the choice of spatial weights matrix. They found empirically that if the estimates and inferences are based on the true partial derivatives of the model, then the myth can be rejected. The interpretation and the true partial derivatives are discussed further in section 5.5.1 in the context of the model specification of this research. Above all, it is important for researchers to be aware of this matter when examining and interpreting spatial models.
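The spatial lag described at the start of this discussion is simply the product of the row-standardised weights matrix and the vector of observations. The snippet below is a minimal sketch with a hypothetical four-unit weights matrix and made-up values of a variable y; it is not the data of this research.

```python
import numpy as np

# Hypothetical row-standardised weights for four units arranged in a line (1-2-3-4).
w = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])

# Made-up values of the variable of interest at each unit.
y = np.array([10.0, 20.0, 30.0, 40.0])

# Spatial lag: for each unit, the weighted average of its neighbours' values.
spatial_lag = w @ y
print(spatial_lag)  # [20. 20. 30. 30.]
```

For the second unit, for example, the lag is the average of its two neighbours' values, 0.5 × 10 + 0.5 × 30 = 20.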

5.2.1.3 Spatial heterogeneity

Spatial heterogeneity refers to the variation in relationships over space, thus exploring the spatial structure (Anselin, 2003). This implies that every observation, or relationship between observations, in space may differ from the others; examples include unstable economic variables across space and the co-existence of diverse spatial patterns (LeSage, 1999). Spatial positions can be quantified and there are different ways to model variation over space, e.g. spatial heteroscedasticity (non-constant error variances) and spatially varying model coefficients. Statistics such as the local indicators of spatial association or the G statistics can be used to test for spatial heterogeneity (Getis and Ord, 1992; Anselin, 1995). A locally weighted regression, such as a geographically weighted regression model, can also be used, which produces local estimates for every point in space. In this research, a local model (section 5.5.2) will be estimated to observe the spatial relationship between spatial clustering and agglomeration economies and T&H labour productivity, and its variation across space.

It is important to acknowledge the existence of the inverse problem of separating spatial heterogeneity from spatial dependence (Anselin, 2010). The cross-sectional dimension of spatial data allows the identification of spatial patterns and clusters but does not provide sufficient information to explain the underlying process that causes such patterns. In practice, there is further complexity as each form of misspecification may suggest the other form in diagnostic tests; for instance, tests against heteroscedasticity have power against spatial autocorrelation and vice versa (Anselin and Griffith, 1988).

To detect these spatial interactions, ESDA is conducted and this provides measures of both global and local spatial autocorrelation, which are essential to establish spatial patterns and inequalities.