Spatial Data Representation, Analysis and Geographical Information Systems

4. Methodology for the Spatial Analysis of Intra-Urban Structure and Transport

4.2 Spatial Data Representation, Analysis and Geographical Information Systems

Before looking at specific urban datasets and methods of analysis, we first consider fundamental issues of spatial data representation. These relate to choices of data models and scale, which in turn influence the types of spatial analysis that can be performed. Recent improvements in spatial data sources have increased data resolutions and brought greater flexibility in representation.

These developments are an essential advance in allowing the intra-metropolitan scale of analysis undertaken in this research. The development of GIS technology has moved spatial data representation beyond paper-based maps to the rapid processing and analysis of spatial data, as discussed in Section 4.2.5. Map-based analysis can be complemented with statistical methods (Section 4.2.6).

4.2.1 Spatial Data Models

The development of information and knowledge relies on processes of reduction and abstraction to manage complexity. Scientific knowledge is developed and tested through models, which are abstractions of reality that mediate between theory and the real world (Morgan and Morrison, 1999). Modelling was discussed earlier in relation to systems theory, and further connected meanings of modelling include the definition of entities (ontologies) and the structure of data representation (data models). In the context of geographical disciplines, spatial representation and data modelling are central to the field.

Urban spatial data models can be usefully divided into iconic and symbolic models (Batty, 2001a). Iconic models represent geometric features that correspond to real world physical objects, such as are found in topographic mapping. Symbolic models, on the other hand, represent abstract spatial features, such as social and economic attributes, as is common in thematic mapping. The majority of urban geographical analysis is based on symbolic representations, including land use transportation models which use zonal-based flows and interactions. The forms of spatial representation are connected to issues of scale, levels of detail and computational overheads. There are trade-offs in terms of the functionality and simplicity of different representations.

Following the digital revolution and the development of Geographical Information Systems, two core digital spatial data models have been developed based on cartographic traditions: the vector and raster models. The vector spatial data model uses point, line and polygon structures (in mathematical terms, geometric primitives; Raper, 2000) to represent spatially discrete entities, commonly known as features, with linked aspatial attribute information. The connected attributes are stored within a relational database, the creation of which is itself a representational and data modelling process of entity creation and relation definition. The second fundamental data model structure is the raster model, which employs a regular grid tessellation of values (Raper, 2000). Vector models are used for iconic built-environment representations, where buildings are modelled as discrete objects, and for socio-economic zonal data, where zones are the discrete objects. Raster data models are used for continuous data such as elevation and remotely sensed imagery. The choice of data model has a number of important implications for the range of analytical processes that can be undertaken (Goodchild, 2005).
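The distinction between the two data models can be made concrete with a minimal sketch. The `Feature` class, the `rasterise` helper and the single three-storey building below are hypothetical illustrations rather than structures taken from any particular GIS package or from the datasets used in this research. The vector feature stores a polygon with linked attributes, and the raster is a regular grid whose cells take the attribute value wherever the cell centre falls inside the polygon.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Vector model: one geometric primitive plus linked aspatial attributes."""
    geometry_type: str                      # "point", "line" or "polygon"
    coordinates: list                       # ordered (x, y) vertices
    attributes: dict = field(default_factory=dict)


def polygon_area(coords):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, coords):
    """Ray-casting test: count polygon edges crossed by a ray heading in +x."""
    inside = False
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        if (y1 > y) != (y2 > y):            # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(feature, n_rows, n_cols, cell_size, value_key, nodata=0):
    """Raster model: a regular grid of values sampled at cell centres.

    Row 0 is the row of cells nearest y = 0 (origin at the lower left).
    """
    grid = []
    for row in range(n_rows):
        cy = (row + 0.5) * cell_size
        grid_row = []
        for col in range(n_cols):
            cx = (col + 0.5) * cell_size
            hit = point_in_polygon(cx, cy, feature.coordinates)
            grid_row.append(feature.attributes[value_key] if hit else nodata)
        grid.append(grid_row)
    return grid


# Hypothetical building footprint with one attribute.
building = Feature("polygon", [(0, 0), (2, 0), (2, 2), (0, 2)], {"floors": 3})
footprint = polygon_area(building.coordinates)
grid = rasterise(building, n_rows=3, n_cols=3, cell_size=1.0, value_key="floors")
```

The vector form keeps the exact footprint and supports geometric measures such as area, while the raster form trades that precision for a regular structure that is simple to overlay and combine with other layers.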

4.2.2 Scale in Geographical Analysis

Scale is a central concern of geographical research, both for theoretical and technical reasons. In theoretical terms scale dependence is an inherent feature of complex systems such as cities. Consequently studies must be carried out at the appropriate scale of analysis relating to the phenomena of interest (Openshaw, 1996), and ideally the interactions between processes at different scales should be understood. This typically requires analysis and testing at several scales to consider inter-scale relationships (Fotheringham and Rogerson, 1993). This research focuses on a meso-scale urban analysis to provide a city-region focus and complement the existing body of sustainability research at micro and macro scales (see Section 3.3).

The ability to perform analysis at any particular scale is dependent on the data and level of detail (Longley et al., 2005). Scale describes the scope of the data representation, in terms of which features will be included and excluded, and the detail of those features that are included. The translation of real world entities into geometrical features inherently involves abstraction, guided by the chosen scale. For iconic spatial data, the complex geometry of real world objects must be simplified through processes of generalisation. Symbolic data representations are similarly affected by data manageability issues, with the additional factor of privacy considerations for socio-economic data. Consequently aggregate zonal data is the most common output format for socio-economic spatial data (Section 4.3).

There is a long established association between extent and level of detail in spatial data which expresses a fundamental trade-off in geographical research (Talen, 2003). Studies that cover a large spatial extent generally compromise their ability to include fine-scale features and processes. Conversely studies that focus on fine-scale processes face significant methodological and computational challenges in 'scaling up' such research to cover large geographical extents.

This balance is significant both for spatial analysis, where large high-detail datasets increase methodological complexity and computational demands, and for visualisation, where there is a limited information density that can be legibly visualised on a page or screen (Skupin, 2000). The technical aspects of the scale trade-off in geographical research are increasingly being overcome, as innovations in fine-scale spatial data are opening up new possibilities for empirical urban spatial analysis. These new datasets are sufficiently intensive to analyse detailed form and function relationships and also sufficiently extensive to enable patterns to be generalised across entire city-regions (Batty, 2007a), and thus underlie the intra-metropolitan analysis of this research. These technical advances do not however solve the methodological complexity in combining intensive and extensive studies, and many analytical and visualisation challenges remain, as discussed in the following sub-sections.

4.2.3 Zonal Systems and Aggregation

Zones are the basic analysis units in much urban geographical research. The choice of zonal system or zonation has a series of consequences for the scale of processes that are represented by the data, and for the computational demands of analysing that data. Highly disaggregate analysis is able to capture micro-level processes, but leads to increased computational demands and can be problematic for visual legibility and privacy. In city-wide studies, highly disaggregate visualisation and analysis can be cumbersome, with millions of units for analysis. Therefore aggregation methods are an important tool for generalising patterns and simplifying analysis.

A key reason why zonal systems must be scrutinised is the very common source of error known as the Modifiable Areal Unit Problem (MAUP) (Openshaw, 1984), which describes how changes in the spatial boundaries of a zonal system can alter the aggregate statistical properties of that system. The gerrymandering of political boundaries to influence election results is a classic illustration of this phenomenon. There is a second related aspect of the MAUP described as the scale effect, where the results of spatial statistical analysis change depending on the level of resolution, as a direct consequence of the scale dependence of geographical phenomena. In socio-economic contexts scale dependence is connected to the ecological fallacy, where it is statistically invalid to assume that aggregate properties of a zone apply to an individual within that zone. The MAUP affects all zonal data and is exacerbated by the fact that zonal boundaries are often arbitrary or fixed for reasons which are incidental to the purpose of study (Openshaw, 1996). Detailed spatial data is a means of minimising MAUP effects as discussed below.
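Both aspects of the MAUP can be illustrated with a small numerical sketch. The twelve household values and the three zonations below are invented purely for illustration. The same unit-level data is summarised under two zonations with different boundaries (the zoning effect) and under a coarser two-zone system (the scale effect).

```python
def zone_means(values, zone_ids):
    """Mean of the unit-level values that fall within each zone."""
    totals, counts = {}, {}
    for value, zone in zip(values, zone_ids):
        totals[zone] = totals.get(zone, 0) + value
        counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical households along a street: 1 = owns a car, 0 = does not.
car_owner = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1]

# Zoning effect: the same households under two sets of boundaries.
zoning_a = [0] * 4 + [1] * 4 + [2] * 4
zoning_b = [0] * 2 + [1] * 4 + [2] * 4 + [3] * 2

# Scale effect: a coarser zonation with only two zones.
zoning_c = [0] * 6 + [1] * 6

means_a = zone_means(car_owner, zoning_a)   # zone means 0.75, 0.25, 0.5
means_b = zone_means(car_owner, zoning_b)   # zone means 1.0, 0.5, 0.25, 0.5
means_c = zone_means(car_owner, zoning_c)   # zone means 2/3 and 1/3
overall = sum(car_owner) / len(car_owner)   # 0.5 whatever the zonation
```

The overall ownership rate is fixed at 0.5, yet the zonal figures range from 0.25 to 0.75 under the first zonation and from 0.25 to 1.0 under the second. A zone mean of 0.75 also says nothing certain about any single household within that zone, which is the ecological fallacy in miniature.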

Example zonations in urban geographical analysis are illustrated in Figure 4.1.

Socio-economic zonations are very common in urban data, produced for administrative purposes. It is also possible to create zonal systems from built-environment features, such as street blocks. These are relevant to local urban planning tasks, but are more problematic to apply at higher level geographies. Finally, abstract zonal systems without reference to any spatial features can be advantageous for statistical analysis, as can the equal area properties of regular grids. Note that socio-economic zones generally sacrifice areal regularity in favour of the regularity of population variables between zones. Essentially the choice of zonation should follow from the desired scale of analysis and the spatial correspondence with the phenomenon of interest.

[Figure 4.1 (image not reproduced). Columns: Micro-Scale, Meso-Scale, Macro-Scale. Grid zonations: Fine Scale Grid, Meso-Scale Grid, Macro-Scale. Travel data: individual travel diary survey; zonal interaction matrices at varied socio-economic zone scales.]

Figure 4.1: Aggregation Methods for Varied Scales and Zonations of Urban Spatial Analysis.

Processes of aggregation involve transforming spatial data between zonations. Transformations from detailed disaggregate spatial data to coarser resolution zone systems are the most straightforward and statistically reliable to perform. Consequently data at fine spatial scales is advantageous for scale flexibility. Another common task is to perform zonal transformations between data at similar resolutions. While spatial analysis techniques exist for such processes, MAUP errors will be introduced to a greater or lesser degree. Disaggregating from coarser resolutions to finer resolutions is statistically highly unreliable and is the basis of the ecological fallacy. There are techniques from modelling to address this problem by simulating populations using micro-survey data, but in standard spatial analysis, disaggregation transformations should be avoided.
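The asymmetry between aggregation and disaggregation can be sketched as follows. The zone names, counts and lookup table are hypothetical, and the uniform-spread rule in `disaggregate_uniform` is only one possible assumption, chosen here because it is the simplest. Aggregation through a fine-to-coarse lookup is exact, whereas reversing it requires an assumption that the data itself cannot confirm.

```python
def aggregate(fine_counts, lookup):
    """Sum fine-zone counts into coarse zones via a fine -> coarse lookup."""
    coarse = {}
    for fine_id, count in fine_counts.items():
        coarse_id = lookup[fine_id]
        coarse[coarse_id] = coarse.get(coarse_id, 0) + count
    return coarse


def disaggregate_uniform(coarse_counts, lookup):
    """Spread each coarse count equally across its fine zones.

    ASSUMPTION: the population is uniformly distributed within each coarse
    zone. Nothing in the coarse data supports this, which is why the
    estimates can be badly wrong.
    """
    members = {}
    for fine_id, coarse_id in lookup.items():
        members.setdefault(coarse_id, []).append(fine_id)
    return {
        fine_id: coarse_counts[coarse_id] / len(fine_ids)
        for coarse_id, fine_ids in members.items()
        for fine_id in fine_ids
    }


# Hypothetical residents per fine zone, nested within coarse zones A and B.
fine = {"a1": 120, "a2": 30, "b1": 50, "b2": 50}
lookup = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

coarse = aggregate(fine, lookup)                    # exact: A = 150, B = 100
estimate = disaggregate_uniform(coarse, lookup)     # 75 each in A, 50 each in B
errors = {zone: estimate[zone] - fine[zone] for zone in fine}
```

The estimates still sum to the correct coarse totals, but zone a1 is under-estimated by 45 residents and a2 over-estimated by the same amount. Only zone B, which happens to be uniform, is recovered correctly.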

4.2.4 Mapping and Visualisation

In the context of this research we employ thematic mapping techniques, that is, visualisation methods that portray spatial variation, patterns and inter-relationships amongst spatial variables (Raper, 2000). Basic thematic maps display the spatial distribution of a single variable, and the visualisation challenge in urban research is often how multiple variables and relationships can be legibly visualised. One approach is to mathematically combine spatial variables into single composite variables (as discussed in the next section). An alternative visualisation technique is three-dimensional mapping, where the extra dimension provides a means of combining multiple data layers and expanding the information content of the map. By extruding features in the third dimension, volume can be used as an intuitive means of displaying magnitude. Three-dimensional visualisation methods are used in Chapter 5 of this research to map urban density and function. Another important urban visualisation challenge is the mapping of flows, where each data item has an origin, destination, magnitude and potentially further properties. In Chapter 6 a series of techniques are employed in mapping journey-to-work data to summarise complex travel distributions.

Design decisions in thematic mapping affect the prominence of features, and influence how the map is 'read' by audiences (Monmonier, 1991). For scientific applications, the concern with mapping techniques is that design decisions can influence map interpretation and be used as a rhetorical device. A particularly important design decision is classification. Variables with a large number of values are typically grouped into classes of similar value to simplify visual interpretation. Two algorithms used to determine the numerical intervals between classes are illustrated in Figures 4.2 and 4.3. In the example the Jenks Natural Breaks algorithm (Figure 4.2) emphasises differences in the middle range of the distribution, whilst the Equal Interval algorithm (Figure 4.3) focuses on the extreme values. The classification legend must therefore be made clear. It is beneficial to combine mapping with tables and statistical analysis to provide measures of spatial pattern independent of cartographic design.

Figures 4.2 & 4.3: London Population 2001 using (left) Jenks Natural Breaks and (right) Equal Interval classification algorithms. Data source: Census 2001 (ONS, 2010a).
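The sensitivity of a map to the classification method can be sketched with a handful of invented values (not the London census data shown in the figures). The `natural_breaks` function below is an exhaustive search that minimises the within-class sum of squared deviations, which is the criterion Jenks Natural Breaks optimises. It is a brute-force stand-in suited only to very small samples, not the algorithm as implemented in GIS software.

```python
from bisect import bisect_left
from itertools import combinations


def equal_interval_breaks(values, k):
    """Upper bounds of the first k - 1 classes when the range is split evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def sum_sq_dev(group):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def natural_breaks(values, k):
    """Try every split of the sorted values into k contiguous classes and
    keep the one with the smallest total within-class squared deviation."""
    data = sorted(values)
    best_cost, best_cuts = None, None
    for cuts in combinations(range(1, len(data)), k - 1):
        bounds = (0,) + cuts + (len(data),)
        cost = sum(sum_sq_dev(data[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_cuts = cost, cuts
    return [data[c - 1] for c in best_cuts]


def class_counts(values, breaks):
    """Number of values in each class (a value equal to a break joins the lower class)."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts


# Hypothetical skewed variable, e.g. population per zone in thousands.
population = [1, 2, 3, 10, 11, 12, 30, 31, 100]

ei_breaks = equal_interval_breaks(population, 4)    # [25.75, 50.5, 75.25]
nb_breaks = natural_breaks(population, 4)           # [3, 12, 31]
ei_counts = class_counts(population, ei_breaks)     # [6, 2, 0, 1]
nb_counts = class_counts(population, nb_breaks)     # [3, 3, 2, 1]
```

With these values the equal interval scheme places six of the nine zones in the lowest class and leaves one class empty, so the map is dominated by the single extreme value. The natural breaks scheme separates the low and middle groups instead.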

4.2.5 Geographical Information Systems and Spatial Analysis

Geographical Information System (GIS) technologies provide a range of functionality relevant to this research, including the ability to handle very large datasets, to integrate multiple data layers into composite indicators, and to combine varied forms of spatial analysis including topographic, topological and attribute based functions. GIS technology has revolutionised how geographical information is stored, analysed, and visualised (Longley et al., 2005). Increasingly flexible and interactive means of using geographical information have evolved. We focus the discussion here on the use of GIS for urban research and planning. The core of GIS software is an integration of spatial database functionality; visualisation and cartographic design functionality; and spatial analysis functionality. The synergies between these tasks underlie the success of GIS as a software platform.

The core analysis functionality within GIS is based on manipulating and combining spatial data layers, with spatial location as the key means of integration. These processes of cartographic modelling involve overlaying layers, and performing arithmetic and logical functions either on individual layers or in combination (Tomlin, 1990). Spatial analysis can also be based on geometric properties, in terms of lengths, areas and distances between discrete features, and topological relationships between features. Finally, aspatial database operations are a useful complement to spatial analysis, for the querying of the properties and classifications of the spatial features. Thus GIS functionality involves combining locational, geometrical, topological and attribute based analysis. This range of spatial analysis functionality is useful for integrated urban analysis, with built-environment data relating to geometrical analysis, socio-economic data to the attributes of spatial zones, and accessibility analysis to topological relationships.
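The overlay operations of cartographic modelling can be sketched on two small raster layers. The layer names, values and the threshold of 50 are hypothetical, and `local_op` is an illustrative helper rather than a function from any GIS package. Because both layers share the same grid, spatial location (the cell position) is what links them.

```python
def local_op(layer_a, layer_b, func):
    """Apply func cell by cell to two raster layers on an identical grid."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must share the same grid dimensions")
    return [
        [func(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Hypothetical 2 x 2 layers: residents and jobs per grid cell.
residents = [[120, 80], [10, 0]]
jobs = [[30, 200], [5, 0]]

# Arithmetic overlay: total activity (residents plus jobs) per cell.
activity = local_op(residents, jobs, lambda r, j: r + j)

# Logical overlay: cells with at least 50 residents AND at least 50 jobs.
mixed_use = local_op(residents, jobs, lambda r, j: r >= 50 and j >= 50)
```

Here `activity` is `[[150, 280], [15, 0]]`, and only the top-right cell passes the logical test. The same pattern extends to composite indicators built from several layers.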

While a range of GIS spatial analysis functionality is available in mainstream software, this does not typically include the more advanced spatial statistics and spatial modelling tasks that are common in fields such as environmental and land use transport modelling. Some researchers have criticised the view that GIS technology is equivalent to spatial analysis, arguing that the power of the graphical medium creates a pseudo-realism which is not necessarily matched by the explanatory power of the spatial models (Longley and Batty, 1996). While GIS technology has much to offer spatial analysis activities in terms of data storage and visualisation tasks, researchers have sought to define the disciplines of geographical information science (GISc) (Raper, 2000) and geocomputation (Longley et al., 1998) independently from GIS.

4.2.6 Statistics and Spatial Analysis

Statistical techniques can provide a more rigorous complement to the visualisation and GIS analysis methods described above. The two main contexts for statistical methods in spatial analysis are as a descriptive exploratory tool, where calculations are used to identify patterns and hotspots to guide analysis, and as an inferential tool, where statistical methods are used to test the significance of research hypotheses.

Descriptive statistics include many common measures of distribution, such as mean and modal statistics, and measures of variance and deviation. Inferential statistics draw conclusions about a wider population, expressed in terms of probability, from statistical samples. This includes significance testing of distributions for clustering, and correlation analysis for testing for relationships between variables, amongst many other techniques. Measures of statistical association between variables such as regression are of fundamental importance in scientific research for hypothesis testing. It must be borne in mind that regression and correlation measures can demonstrate statistical association but cannot prove causality, as discussed previously in Section 3.3 in the context of built-environment and travel pattern research. For this research thesis, statistics are used in the analysis of urban structure, principally measures of urban centrality and of function as discussed in Section 4.6, and in regression analysis to test relationships between urban form and travel patterns, as presented in Chapter 6.

Spatial statistics incorporate spatial location considerations into descriptive and inferential statistical measures. The spatial association between variables is fundamental to geographical enquiry, as succinctly expressed in Tobler's first 'law' of geography: “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970). While spatial association is at the core of geographical analysis, in statistical terms it can invalidate a basic assumption of inferential statistics: the independence of the sample data, as nearby samples are likely to be correlated. This is referred to as spatial autocorrelation. Various means of measuring spatial autocorrelation have been developed, as discussed further in Section 4.6.
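One widely used measure of spatial autocorrelation is Moran's I. It is sketched here for a regular grid with binary rook-contiguity weights, so that cells sharing an edge are neighbours. This is an illustrative choice of measure and weighting scheme, and is not necessarily the formulation adopted in Section 4.6. The two 4 x 4 grids are invented test patterns. For values x_i with mean m and weights w_ij, the statistic is I = (n / sum of w_ij) * (sum over i, j of w_ij (x_i - m)(x_j - m)) / (sum over i of (x_i - m)^2).

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using binary rook-contiguity weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")

    cross = 0.0        # sum of w_ij * (x_i - mean) * (x_j - mean)
    weight_sum = 0     # sum of w_ij (each neighbour pair counted in both directions)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (cross / denom)


# Hypothetical patterns: similar values adjacent versus alternating values.
clustered = [[0, 0, 1, 1] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

i_clustered = morans_i(clustered)        # positive: near things are alike
i_checker = morans_i(checkerboard)       # negative: neighbours always differ
```

Values well above zero indicate that neighbouring observations are similar, which is the situation in which the independence assumption of standard inferential tests is most at risk.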

4.2.7 Summary

Forms of spatial representation are connected to issues of scale, levels of detail and computational overheads. There is a long established association between extent and level of detail in geographical analysis, with studies either covering a large spatial extent at a low level of detail, or a small spatial extent at a high level of detail. In light of scale dependence and MAUP issues, it is necessary to perform spatial analysis at the appropriate scale(s) for the process in question.

This research focuses on a meso-scale urban analysis to provide a city-region focus and complement the existing body of sustainability research at micro and macro scales.

There is a broad range of techniques for the study of spatial relationships, including mapping, GIS based analysis, and spatial statistics. Mapping is a ubiquitous method of information exploration, communication and analysis and is best employed in combination with statistical methods. The development of GIS technology has brought a profound revolution in how geographical information is stored, managed, edited, analysed and presented. Increasingly powerful tools to manipulate large spatial datasets have evolved, along with flexible and interactive means of using geographical information.

4.3 Urban Geographical Data: Measuring the

In document Polycentricity and Sustainable Urban Form: An Intra-Urban Study of Accessibility, Employment and Travel Sustainability for the Strategic Planning of the London Region (Page 139-148)