
BOX 3.2 Raster data compaction techniques

Figure 3.13 Raster data compaction techniques: (a) run length encoding; (b) block encoding; (c) chain encoding




1 Run length encoding. This technique reduces data volume on a row by row basis. It stores a single value where there are a number of cells of a given type in a group, rather than storing a value for each individual cell. Figure 3.13a shows a run length encoded version of the forest in Happy Valley. The first line in the file represents the dimensions of the matrix (10 × 10) and number of entities present (1). In the second and subsequent lines of the file, the first number in the pair (either 1 or 0 in this example) indicates the presence or absence of the forest. The second number indicates the number of cells referenced by the first. Therefore, the first pair of numbers at the start of the second line tells us that no entity is present in the first 10 cells of the first row of the image.
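The row by row scheme can be sketched in a few lines of Python. This is a minimal illustration rather than the file format shown in Figure 3.13a: the function names and the small 4 × 6 raster are assumptions made for the example, and each pair is stored as (value, run length) to follow the order described in the text.

```python
from itertools import groupby


def rle_encode_row(row):
    """Collapse one raster row into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode_row(pairs):
    """Expand (value, run_length) pairs back into a row of cells."""
    return [value for value, count in pairs for _ in range(count)]


def rle_encode(raster, n_entities=1):
    """Return a header (rows, columns, entities) and the encoded rows."""
    header = (len(raster), len(raster[0]), n_entities)
    return header, [rle_encode_row(row) for row in raster]


# Hypothetical raster: 1 = forest present, 0 = forest absent.
raster = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

header, encoded = rle_encode(raster)
decoded = [rle_decode_row(pairs) for pairs in encoded]
```

Rows with long uniform runs compress well (the first row becomes a single pair), whereas a row that alternates between values could need more pairs than it has cells.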

2 Block coding. This approach extends the run length encoding idea to two dimensions by using a series of square blocks to store data. Figure 3.13b shows how the simple raster map of the Happy Valley forest has been subdivided into a series of hierarchical square blocks. Ten data blocks are required to store data about the forest image. These are seven unit cells, two four-cell squares and one nine-cell square. Co-ordinates are required to locate the blocks in the raster matrix. In the example, the top left-hand cell in a block is used as the locational reference for the block.
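One way to produce such a set of blocks is a simple greedy scan, sketched below in Python. This is an assumption made for illustration and not necessarily the procedure used to build Figure 3.13b: the raster, the function names and the zero-based (row, column) referencing are all hypothetical. Each block is recorded by its top left-hand cell and its side length.

```python
def block_encode(raster):
    """Greedy block coding: returns (row, col, side) for each square block."""
    n_rows, n_cols = len(raster), len(raster[0])
    covered = [[False] * n_cols for _ in range(n_rows)]
    blocks = []

    def fits(r, c, side):
        # A square fits if it stays inside the raster and every cell in it
        # belongs to the entity and is not already part of another block.
        if r + side > n_rows or c + side > n_cols:
            return False
        return all(
            raster[i][j] == 1 and not covered[i][j]
            for i in range(r, r + side)
            for j in range(c, c + side)
        )

    for r in range(n_rows):
        for c in range(n_cols):
            if raster[r][c] != 1 or covered[r][c]:
                continue
            side = 1
            while fits(r, c, side + 1):
                side += 1
            for i in range(r, r + side):
                for j in range(c, c + side):
                    covered[i][j] = True
            blocks.append((r, c, side))
    return blocks


def block_decode(blocks, n_rows, n_cols):
    """Rebuild the raster from the list of blocks."""
    raster = [[0] * n_cols for _ in range(n_rows)]
    for r, c, side in blocks:
        for i in range(r, r + side):
            for j in range(c, c + side):
                raster[i][j] = 1
    return raster


# Hypothetical 4 x 4 raster: 1 = forest present.
raster = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]

blocks = block_encode(raster)
```

Here twelve forest cells are stored as four blocks: one nine-cell square and three unit cells. A greedy scan does not guarantee the smallest possible number of blocks.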

3 Chain coding. The chain coding method of data reduction works by defining the boundary of the entity. The boundary is defined as a sequence of unit cells starting from and returning to a given origin.

The direction of travel around the boundary is usually given using a numbering system (for example 0 = North, 1 = East, 2 = South and 3 = West). Figure 3.13c shows how the boundary cells for the Happy Valley forest would be coded using this method. Here, the directions are given letters (N, S, E and W) to avoid misunderstanding. The first line in the file structure tells us that the chain coding started at cell 4,3 and there is only one chain. On the second line, the letter in each sequence represents the direction and the number gives the number of cells lying in this direction.
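Decoding a chain amounts to walking from the origin one cell at a time. The Python sketch below illustrates this under stated assumptions: cells are referenced as (column, row) with rows increasing southwards, and the chain itself is a hypothetical rectangle, not the Happy Valley forest boundary from Figure 3.13c. Only the starting cell 4,3 is taken from the text.

```python
# Step taken in (column, row) terms for each direction letter.
# Assumption: row numbers increase towards the south.
STEPS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}


def trace_chain(start, chain):
    """Return every cell visited when following (direction, count) pairs."""
    x, y = start
    path = [(x, y)]
    for direction, count in chain:
        dx, dy = STEPS[direction]
        for _ in range(count):
            x, y = x + dx, y + dy
            path.append((x, y))
    return path


# Hypothetical chain: 3 cells east, 2 south, 3 west, 2 north.
chain = [("E", 3), ("S", 2), ("W", 3), ("N", 2)]
path = trace_chain((4, 3), chain)

# A valid boundary chain starts from and returns to the same origin.
is_closed = path[0] == path[-1]
```

Ten steps are described by only four direction and count pairs, which is where the data reduction comes from.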

4 Quadtrees. One of the advantages of the raster data model is that each cell can be subdivided into smaller cells of the same shape and orientation (Peuquet, 1990). This unique feature of the raster data model has produced a range of innovative data storage and data reduction methods that are based on regularly subdividing space. The most popular of these is the area or region quadtree. The area quadtree works on the principle of recursively subdividing the cells in a raster image into quads (or quarters). The subdivision process continues until each cell in the image can be classed as having the spatial entity either present or absent within the bounds of its geographical domain. The number of subdivisions required to represent an entity will be a trade-off between the complexity of the feature and the dimensions of the smallest grid cell. The quadtree principle is illustrated in Figure 3.14.
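The recursive subdivision can be sketched in Python as follows. This is a minimal illustration, assuming a square raster whose side is a power of two; the nested-list representation, the quadrant order (NW, NE, SW, SE) and the sample raster are choices made for the example and are not taken from Figure 3.14.

```python
def build_quadtree(raster, row=0, col=0, size=None):
    """Return 0 or 1 for a uniform quad, else a list of four sub-quads."""
    if size is None:
        size = len(raster)
    values = {
        raster[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    }
    if len(values) == 1:
        # Entity wholly present or wholly absent: stop subdividing.
        return values.pop()
    half = size // 2
    return [
        build_quadtree(raster, row, col, half),                # NW
        build_quadtree(raster, row, col + half, half),         # NE
        build_quadtree(raster, row + half, col, half),         # SW
        build_quadtree(raster, row + half, col + half, half),  # SE
    ]


def count_leaves(node):
    """Count the quads that are actually stored."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


# Hypothetical 4 x 4 raster: 1 = entity present.
raster = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

tree = build_quadtree(raster)
n_leaves = count_leaves(tree)
```

Only the south-west quarter is mixed, so only that quarter is subdivided again; the 16 cells are stored as 7 quads.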


Figure 3.14 The quadtree

Figure 3.15a shows such a vector data structure for the Happy Valley car park. Note how a closed ring of co-ordinate pairs defines the boundary of the polygon.

The limitations of simple vector data structures start to emerge when more complex spatial entities are considered. For example, consider the Happy Valley car park divided into different parking zones (Figure 3.15b). The car park consists of a number of adjacent polygons. If the simple data structure, illustrated in Figure 3.15a, were used to capture this entity then the boundary line shared between adjacent polygons would be stored twice. This may not appear too much of a problem in the case of this example, but consider the implications for a map of the 50 states in the USA. The amount of duplicate data would be considerable. This method can be improved by adjacent polygons sharing common co-ordinate pairs (points). To do this all points in the data structure must be numbered sequentially and contain an explicit reference which records which points are associated with which polygon. This is known as a point dictionary (Burrough, 1986). The data structure in Figure 3.15b shows how such an approach has been used to store data for the different zones in the Happy Valley car park.
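A point dictionary can be sketched in Python with two lookup tables. The co-ordinates, point numbers and zone names below are hypothetical and are not the values from Figure 3.15b; the sketch only shows how each co-ordinate pair is stored once and then referenced by number.

```python
# Each point is stored once, keyed by its sequential number.
points = {
    1: (0, 0),
    2: (10, 0),
    3: (20, 0),
    4: (20, 10),
    5: (10, 10),
    6: (0, 10),
}

# Each polygon lists the numbers of the points on its boundary, in order.
polygons = {
    "zone_A": [1, 2, 5, 6],
    "zone_B": [2, 3, 4, 5],
}


def ring(polygon_id):
    """Look up the co-ordinates of a polygon and close the ring."""
    coords = [points[number] for number in polygons[polygon_id]]
    return coords + [coords[0]]


def shared_points(first, second):
    """Point numbers referenced by both polygons."""
    return sorted(set(polygons[first]) & set(polygons[second]))


ring_a = ring("zone_A")
common = shared_points("zone_A", "zone_B")
```

Points 2 and 5 lie on the boundary between the two zones but their co-ordinates appear only once in `points`. The structure still records nothing about which polygons are neighbours; that has to be worked out by comparing the point lists.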

The Happy Valley road network illustrates a slightly different problem. The simple vector data structure illustrated in Figure 3.15a could be used to graphically reproduce the network without any duplication of data. However, it would not contain any information about the linkage between lines. Linkages would be implied only when the lines are displayed on the computer screen. In the same way, a series of polygons created using either the simple data structure described in Figure 3.15a or a point dictionary approach (Figure 3.15b) may appear connected on the screen when in fact the computer sees them as discrete entities, unaware of the presence of neighbouring polygons.

A further problem of area features is the island or hole situation. In Figure 3.16 a diamond-shaped area has been placed in the centre of the Happy Valley car park to represent the information kiosk. This feature is contained wholly within the polygons classified as the car park. Whilst a simple vector file structure would recreate the image of the car park, it would not be able to inform the computer that the island polygon was ‘contained’ within the larger car park polygon.

For the representation of line networks, and adjacent and island polygons, a set of instructions is required which informs the computer where one polygon, or line, is with respect to its neighbours. Topological data structures contain this information. There are numerous ways of providing topological structure in a form that the computer can understand. The examples below have been selected to illustrate the basic principles underpinning topological data structures rather than describe the structures found in any one GIS environment.

Figure 3.15 Data structures in the vector world: (a) simple data structure; (b) point dictionary

Topology is concerned with connectivity between entities and not their physical shape. A useful way to help understand this idea is to visualize individual lines as pieces of spaghetti. Consider how you would create a model of the Happy Valley car park (Figure 3.16) using strands of spaghetti on a dinner plate. It is most likely that you would lay the various strands so that they overlapped. This would give the appearance of connectivity, at least whilst the spaghetti remained on the plate. If you dropped the plate it would be difficult to rebuild the image from the spaghetti that fell on the floor. Now imagine that the pieces of spaghetti are made of string rather than pasta and can be tied together to keep them in place. Now when the plate is dropped, the connectivity is maintained and it would be easier to reconstruct the model. No matter how you bend, stretch and twist the string, unless you break or cut one of the pieces, you cannot destroy the topology. It is this situation that computer programmers are striving to mirror when they create topological data structures for the storage of vector data. The challenge is to maintain topology with the minimum of data to minimize data volumes and processing requirements.

Figure 3.16 Topological structuring of complex areas

A point is the simplest spatial entity that can be represented in the vector world with topology. All a point requires for it to be topologically correct is a pointer, or geographical reference, which locates it with respect to other spatial entities. In order for a line entity to have topology it must consist of an ordered set of points (known as an arc, segment or chain) with defined start and end points (nodes). Knowledge of the start and end points gives line direction. For the creation of topologically correct area entities, data about the points and lines used in their construction, and how these connect to define the boundary in relation to adjacent areas, are required.

There is a considerable range of topological data structures in use by GIS. All the structures available try to ensure that:

• no node or line segment is duplicated;

• line segments and nodes can be referenced to more than one polygon;

• all polygons have unique identifiers; and

• island and hole polygons can be adequately represented.

Figure 3.16 shows one possible topological data structure for the vector representation of the Happy Valley car park. The creation of this structure for complex area features is carried out in a series of stages. Burrough (1986) identifies these stages as identifying a boundary network of arcs (the envelope polygon), checking polygons for closure and linking arcs into polygons. The area of polygons can then be calculated and unique identification numbers attached. This identifier would allow non-spatial information to be linked to a specific polygon.
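An arc-based structure of this general kind can be sketched in Python. The sketch is illustrative only and does not reproduce the tables in Figure 3.16: the arc and node numbers, the polygon names (`A`, `B`, a kiosk island `K` and the outside area `OUT`) and the helper functions are all assumptions. Each arc records its start node, end node and the polygon lying to its left and right when travelling from start to end.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    arc_id: int
    start_node: int
    end_node: int
    left_poly: str
    right_poly: str


# Hypothetical layout: zones A and B share arc 3; island K sits inside A
# and is bounded by a single closed arc (start node equals end node).
arcs = [
    Arc(1, start_node=1, end_node=2, left_poly="OUT", right_poly="A"),
    Arc(2, start_node=1, end_node=2, left_poly="B", right_poly="OUT"),
    Arc(3, start_node=1, end_node=2, left_poly="A", right_poly="B"),
    Arc(4, start_node=3, end_node=3, left_poly="A", right_poly="K"),
]


def adjacency(arc_list):
    """Derive neighbouring polygons from the left/right attributes."""
    neighbours = defaultdict(set)
    for arc in arc_list:
        if arc.left_poly != arc.right_poly:
            neighbours[arc.left_poly].add(arc.right_poly)
            neighbours[arc.right_poly].add(arc.left_poly)
    return neighbours


def arcs_of(arc_list, polygon_id):
    """Identifiers of the arcs that bound a polygon."""
    return [
        arc.arc_id
        for arc in arc_list
        if polygon_id in (arc.left_poly, arc.right_poly)
    ]


neighbours = adjacency(arcs)
boundary_of_a = arcs_of(arcs, "A")
```

The shared boundary (arc 3) is stored once yet referenced by both zones, and the island is explicit in the data: `K` has `A` as its only neighbour, and arc 4 appears in the list of arcs bounding `A`.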

The UK Ordnance Survey’s MasterMap is an example of a topologically structured data set in which each feature has a unique ID. It was developed in response to an increasing demand for spatially referenced topographic information, and for better structured data for use in GIS.

MasterMap is an object-oriented data set (see Chapter 4). It provides a series of data layers, each layer containing features representing real-world objects such as buildings, roads and postboxes. Each feature has a unique 16 digit identifier called a Topographic Identifier or TOID®. An example allocated to a building might be 0001100028007854. This TOID® remains stable throughout the life of a feature, so if a change is made, say the building is extended, the TOID® stays the same.

There are over 430 million features in the MasterMap database and 5000 updates are made every day. The data includes a layer with all 26 million postal addresses in England, Wales and Scotland referenced to less than 1 m. The MasterMap data have been widely adopted and used successfully in a range of projects.

The special features of MasterMap are:

• Layers of data provide a seamless topographic database for the UK at scales of 1:1250 and 1:2500.

• The data are made up of uniquely referenced features created as points, lines or polygons.

• Each feature has a unique 16 digit topographic identifier called a TOID®.

• MasterMap data can be supplied in a topologically structured format.