Spatial Data Structures - Handling Data in R

2.4 Handling Data in R

2.4.3 Spatial Data Structures

There are a number of spatial data structures in contributed R packages, but among the most useful for our purposes are those in the sp package. We will present a highly simplified description of these data structures. A very complete description is given in Chapter 2 of Bivand et al. (2008). The two most common data models in GIS theory are the vector model and the raster model (Lo and Yeung, 2007). Familiarity with these models will be assumed. If you are not familiar with them, you should review an introductory GIS book. Several are listed in Section 2.8. Within the vector model, the three most common data types are point data, line data, and polygon data. Although line data are very useful in some applications, we will have no call in this text to use them, so we will focus on point and polygon data. The only point data structure used in this book is the SpatialPointsDataFrame, so we begin with that.

All sp data structures are "new style" R classes (Venables and Ripley, 2000, p. 99), which are called S4 classes. The "old style" classes are called S3 classes. The data frame, for example, is an S3 class. An object in an S4 class is composed of slots, which function somewhat like the components of a list as described in Section 2.4.1. Each slot contains some piece of data describing a part of the total data structure. The easiest way to introduce the SpatialPointsDataFrame is to construct one. We will begin with the data frame data.Set1.obs that we just saw in Section 2.4.1. It contains data associated with yellow-billed cuckoo observations in Data Set 1. If you have set up your directory structure as described in Section 2.1.2, this file would be located at C:\rdata\SDA\data\set1\obspts.csv. If you have executed the statement setwd("c:\\rdata\\SDA\\data"), then to load the data into R you would execute the following statement, taken from the file Appendix\B.1 Set 1 input.r.

> data.Set1.obs <- read.csv("set1\\obspts.csv", header = TRUE)

Now that the data frame is loaded, let us look again at its class and structure.

> class(data.Set1.obs)

[1] "data.frame"

> str(data.Set1.obs)

'data.frame': 21 obs. of 5 variables:
 $ ID      : int 1 2 3 4 5 6 7 8 9 10 …
 $ PresAbs : int 1 0 0 1 1 0 0 0 0 0 …
 $ Abund   : int 16 0 0 2 2 0 0 0 0 0 …
 $ Easting : num 577355 578239 578056 579635 581761 …
 $ Northing: num 4418523 4418423 4417806 4414752 4415052 …

This is converted into a SpatialPointsDataFrame using the function coordinates() of the sp package. This function establishes the spatial coordinates of the data and does the class conversion. If you have not installed sp on your computer, you need to do so now. If you missed the instructions the first time, go back to the end of Section 2.1.1 to learn how to do this. Once the package is installed on your computer, you load it into R for that session by using the function library().

> library(sp)

> coordinates(data.Set1.obs) <- c("Easting", "Northing")

Experienced programmers may find the second statement a bit odd in that an assignment is made to a function rather than a function assigning a value to a variable. This is an example of a replacement function, which is described in The R Language Definition under Manuals in the R Project website, http://www.r-project.org/. If you do not see what the issue is, or if this is too much information, do not worry about it.
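For readers who do want to see the mechanism, the following is a minimal sketch of a replacement function in base R. The function second<-() is hypothetical and is defined here only for illustration; it is not part of sp. The sketch assumes nothing beyond base R.

```r
# Hypothetical replacement function: its name ends in "<-" and its
# last argument must be called `value`.
"second<-" <- function(x, value) {
  x[2] <- value
  x   # a replacement function returns the modified object
}

v <- c(10, 20, 30)
second(v) <- 99   # R evaluates this as: v <- `second<-`(v, value = 99)
print(v)

# Built-in replacement functions such as names<- work the same way
names(v) <- c("a", "b", "c")
print(names(v))
```

The statement coordinates(data.Set1.obs) <- c("Easting", "Northing") follows the same pattern: the object on the left is passed to a function named coordinates<-, and the result is assigned back to the same name.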

Let us see what has happened to the object data.Set1.obs.

> str(data.Set1.obs)

Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data :'data.frame': 21 obs. of 3 variables:
  .. ..$ ID : int [1:21] 1 2 3 4 5 6 7 8 9 10 …
  .. ..$ PresAbs: int [1:21] 1 0 0 1 1 0 0 0 0 0 …
  .. ..$ Abund : int [1:21] 16 0 0 2 2 0 0 0 0 0 …
  ..@ coords.nrs : int [1:2] 4 5
  ..@ coords : num [1:21, 1:2] 577355 578239 578056 579635 581761 …
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "Easting" "Northing"
  ..@ bbox : num [1:2, 1:2] 577355 4398881 588102 4418523
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "Easting" "Northing"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
  .. .. ..@ projargs: chr NA

The object has been converted into a SpatialPointsDataFrame. It has five slots (i.e., places where data are stored), data, coords.nrs, coords, bbox, and proj4string. The slot data contains the attribute data associated with each data record. The slot coords.nrs contains the column numbers of the coordinates in the original data set. The slot coords contains the coordinates of each data record. The slot bbox contains the information about the bounding box, the rectangular region in geographic space that contains all the data locations. This is useful for drawing maps involving the data. The slot proj4string is used to house information about the map projection. Since none has been specified yet, this slot holds the symbol NA. Projections are handled in the package sp by using the PROJ.4 Cartographic Projections library originally written by Gerald Evenden (http://trac.osgeo.org/proj/). To establish the projection or coordinate system, we use the replacement function proj4string().

> proj4string(data.Set1.obs) <- CRS("+proj=utm +zone=10 +ellps=WGS84")

The function CRS() provides the coordinate reference system. The projection is specified in the argument, and therefore must be known to the investigator.

Many of the slots of a SpatialPointsDataFrame such as data.Set1.obs can be accessed via an extractor function. Sometimes these have the same form as the corresponding replacement function. For example, the function coordinates() is used to extract the coords slot.

> coordinates(data.Set1.obs)
       Easting Northing
 [1,] 577355.0  4418523
 [2,] 578239.5  4418423
* * * DELETED * * *
[21,] 588102.4  4398881

If no specific extractor function is available, there is a general extractor function slot(). The slots themselves are objects that are members of some class.

> class(slot(data.Set1.obs, "data"))
[1] "data.frame"

To display the first 10 elements of the Abund data field, we can type

> slot(data.Set1.obs, "data")$Abund[1:10]
[1] 16 0 0 2 2 0 0 0 0 0

Since slot(data.Set1.obs, "data") is a data frame, the $ operator may be appended to it to specify a data field, and then square brackets may be appended to that to access particular data records. A shorthand expression of the function slot() as used earlier is the @ operator, as in the following:

> data.Set1.obs@data$Abund[1:10]
[1] 16 0 0 2 2 0 0 0 0 0

In many cases it is possible to work with a SpatialPointsDataFrame as if it were an ordinary data frame. For example, the aforementioned statement can be replaced with this one.

> data.Set1.obs$Abund[1:10]

[1] 16 0 0 2 2 0 0 0 0 0

If this works, it is the best procedure to use. Occasionally it does not work, however, and if it does not, then we will use either the function slot() or occasionally, if it makes for more legible code, the @ operator.
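The relationship between slots, the function slot(), and the @ operator can be tried without any spatial data. The following is a minimal sketch using a hypothetical toy S4 class called ToyPoints, defined here only for illustration. It is much simpler than the real sp classes, and only the methods package that ships with R is assumed.

```r
library(methods)  # usually attached already; loaded explicitly for Rscript

# Hypothetical toy class with two slots, loosely mimicking an sp object
setClass("ToyPoints",
         slots = c(coords = "matrix", data = "data.frame"))

pts <- new("ToyPoints",
           coords = cbind(Easting  = c(577355, 578239),
                          Northing = c(4418523, 4418423)),
           data = data.frame(Abund = c(16, 0)))

slotNames(pts)                     # names of the slots
via.slot <- slot(pts, "data")$Abund  # general extractor function
via.at   <- pts@data$Abund           # shorthand with the @ operator
identical(via.slot, via.at)        # both forms reach the same data
```

Unlike a SpatialPointsDataFrame, this toy class has no method for the $ operator, so pts$Abund would fail. The sp classes define such methods, which is why the data frame style of access usually works with them.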

The polygon is a more complex entity than the point, and the data structures associated with polygons are accordingly more complex. We will be dealing with two structures, the SpatialPolygons object and the SpatialPolygonsDataFrame object. A SpatialPolygonsDataFrame is a SpatialPolygons object with attribute data attached. To introduce these, we will create a SpatialPolygonsDataFrame describing the boundary of Field 2 of Data Set 4. This is the field shown in Figures 1.2 and 1.3.

The creation of a polygon describing the boundary of an area to be sampled requires a set of geographic coordinates describing that boundary. These can be obtained by walking around the site with a global positioning system (GPS) or, possibly, by obtaining coordinates from a georeferenced remotely sensed image, either a privately obtained image or a publicly available one from a database such as Google Earth or Terraserver. If the boundary is not defined very precisely, one can use the function locator() (see Section 7.3) to obtain the coordinates quite quickly from a map or image. In the case of Field 2, the coordinates were originally obtained using a GPS and modified slightly for use in this book. They are set to an even multiple of five and chosen so that the length of the field in the east–west direction is twice that in the north–south direction.

> N <- 4267873
> S <- 4267483
> E <- 592860
> W <- 592080
> N - S
[1] 390
> E - W
[1] 780

The following procedure, as well as that presented in Section 5.4.5, is based on the one given by Roger Bivand in the R-sig-geo mailing list and is used by permission. We first put the boundary values into a matrix coords.mat that holds the coordinates describing the polygon. By default, matrices in R are created by columns, so that in this case the first five members of the sequence form the first column and the second five members form the second column. Because the region is rectangular, the coordinates of its corners are specified by four numbers. In the creation of a polygon with n vertices, a total of n + 1 coordinate pairs are assigned, with the first coordinate pair being repeated as the last. Thus, in this case, five coordinate pairs are specified.

> print(coords.mat <- matrix(c(W,E,E,W,W,N,N,S,S,N),
+    ncol = 2))
       [,1]    [,2]
[1,] 592080 4267873
[2,] 592860 4267873
[3,] 592860 4267483
[4,] 592080 4267483
[5,] 592080 4267873
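The column-wise filling and the repeated first vertex can be checked directly. The following sketch uses only base R; the names vals, m.col, m.row, and ring.closed are introduced here for illustration and do not appear in the book's code.

```r
W <- 592080; E <- 592860; S <- 4267483; N <- 4267873
vals <- c(W, E, E, W, W, N, N, S, S, N)

m.col <- matrix(vals, ncol = 2)                # default: filled by columns
m.row <- matrix(vals, ncol = 2, byrow = TRUE)  # filled by rows instead

m.col[, 1]   # the first five values form the first column (eastings)
m.col[, 2]   # the second five values form the second column (northings)
m.row[1, ]   # with byrow = TRUE the first row would be (W, E), not a coordinate pair

# A closed ring repeats its first coordinate pair as its last
ring.closed <- all(m.col[1, ] == m.col[nrow(m.col), ])
ring.closed
```

Filling by rows would mix eastings and northings within a column, which is why the order of the values in c() matters here.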

After creating the coordinate matrix, the function Polygons() from the maptools package (Lewin-Koh and Bivand, 2011) is used to create a vector bdry.vec, each of whose elements is a list (see Section 2.4.1). Since there is only one polygon, the list has only one element, as signified by the argument length = 1. First we create an empty list. Once again, after you have installed the maptools package on your computer (Section 2.1), you must load it for the current session.

> library(maptools)

> bdry.vec <- vector(mode = "list", length = 1)

The next step is to place in this component the polygon that forms the field boundary.

> bdry.vec[[1]] <- Polygons(list(Polygon(coords.mat)), ID = "1")

The function Polygon() creates a polygon whose coordinates are those of coords.mat; the function list() is applied to the value of this function to create a list of Polygon objects; and the function Polygons() is applied to this list to create the first and only component of bdry.vec. This is the required format for the first argument of the function SpatialPolygons().

Once again we use proj4string to establish the map projection. Note that the name of the function proj4string() is used as an argument name, and that within the function call, the role of the assignment operator is played by an equals sign rather than a <-.

> bdry.sp <- SpatialPolygons(bdry.vec,
+    proj4string = CRS("+proj=utm +zone=10 +ellps=WGS84"))

The SpatialPolygons object bdry.sp can be augmented with attribute data to create a SpatialPolygonsDataFrame object bdry.spdf.

> bdry.spdf <- SpatialPolygonsDataFrame(bdry.sp,
+    data = data.frame(ID = "1", row.names = "1"))

Field 1 of Data Set 4 is a bit more complex in that it is not a rectangle. In Exercise 2.11, you are asked to create boundary files describing this field as it was farmed in 1995–1997.

Let us look at the structure of the two objects we just created. First we look at the SpatialPolygons object bdry.sp.

> str(bdry.sp)

Formal class 'SpatialPolygons' [package "sp"] with 4 slots
  ..@ polygons :List of 1
  .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
  .. .. .. ..@ Polygons :List of 1
  .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. .. .. .. .. .. ..@ labpt : num [1:2] 592470 4267678
  .. .. .. .. .. .. ..@ area : num 304200
  .. .. .. .. .. .. ..@ hole : logi FALSE
  .. .. .. .. .. .. ..@ ringDir: int 1
  .. .. .. .. .. .. ..@ coords : num [1:5, 1:2] 592080 592860 592860 592080 592080 …
  .. .. .. ..@ plotOrder: int 1
  .. .. .. ..@ labpt : num [1:2] 592470 4267678
  .. .. .. ..@ ID : chr "1"
  .. .. .. ..@ area : num 304200
  ..@ plotOrder : int 1
  ..@ bbox : num [1:2, 1:2] 592080 4267483 592860 4267873
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
  .. .. ..@ projargs: chr "+proj=utm +zone=10 +ellps=WGS84"

A SpatialPolygons object has four slots. Two of these, bbox and proj4string, are identical to the slots of the same name in the SpatialPointsDataFrame object. The slot plotOrder gives the order in which each of the polygons is plotted. The slot polygons is a list of Polygons objects, each of which in turn contains a list of Polygon objects. Each Polygon object contains information that describes the geometry of the polygon it represents. For details about this information, see Chapter 2 of Bivand et al. (2008).
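Two of the geometric values in the output above, area and labpt, can be checked by hand from the coordinate matrix. The following sketch uses the standard shoelace formula for the area of a simple polygon and, because this particular polygon is a rectangle, takes the midpoint of the bounding box as its center. This is an illustrative check in base R under those assumptions, not a description of how sp computes these slots.

```r
# Closed ring for Field 2, copied from the coordinate matrix shown earlier
coords <- matrix(c(592080, 592860, 592860, 592080, 592080,
                   4267873, 4267873, 4267483, 4267483, 4267873), ncol = 2)
x <- coords[, 1]
y <- coords[, 2]

# Shoelace formula: half the sum of cross products of successive vertices
i <- 1:(nrow(coords) - 1)
signed.area <- sum(x[i] * y[i + 1] - x[i + 1] * y[i]) / 2
area <- abs(signed.area)   # the sign only reflects the vertex order
area

# For a rectangle, the center is the midpoint of the bounding box
centre <- c(mean(range(x)), mean(range(y)))
centre
```

The area agrees with the product of the side lengths computed earlier, 390 times 780, and both results agree with the values stored in the area and labpt slots shown in the str() output.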

The SpatialPolygonsDataFrame object bdry.spdf simply adds attribute data.

> str(bdry.spdf)

Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
  ..@ data :'data.frame': 1 obs. of 1 variable:
  .. ..$ ID: Factor w/ 1 level "1": 1
  ..@ polygons :List of 1
  .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
  .. .. .. ..@ Polygons :List of 1

* * * DELETED * * *

We will use functions in the maptools package for input and output of sp objects. These objects are stored on disk in ESRI shapefile format (ESRI, 1998). This is a commonly used, Open GIS format. The format makes use of multiple files, of which the only ones used by functions in maptools are those with extensions shp, shx, and dbf. Thus, for example, projection information is not communicated between R and the disk. Therefore, we must use the function proj4string() to establish the projection every time we read a shapefile. If the working directory is specified, to save the polygon bdry.spdf as a shapefile, type the following:

> writePolyShape(bdry.spdf, "created\\Set42bdry")

As this book is written, putting a "." in the file name (i.e., Set4.2bdry) will not work correctly. The input function for SpatialPolygonsDataFrame objects is readShapePoly(). For example, the land use data used to create the map of the study area in Figure 1.4 is contained in the shapefile landcover.shp in the Set1 subdirectory of the data directory (Appendix B.1). If the working directory has been set to the data directory, then to input the shapefile one types the following:

> library(maptools)

> data.Set1.cover <- readShapePoly("set1\\landcover.shp")

You do not have to include the extension .shp if you do not want to. The names of the two functions are reversed ("read Shape Poly" and "write Poly Shape"), and follow the order of R file and shapefile, that is, one writes polygons to a shapefile and reads a shapefile to polygons. Similar functions exist for SpatialLinesDataFrame and SpatialPointsDataFrame objects.

On many occasions we will have to convert from one coordinate system to another. Generally our conversions will be between UTM coordinates and WGS84 longitude and latitude coordinates (Lo and Yeung, 2007, p. 40). To convert the data in bdry.spdf from UTM to WGS84 longitude and latitude, one must create a new object in the transformed projection. This is done using the function spTransform() from the rgdal package (Keitt et al., 2011) to create a new SpatialPolygonsDataFrame with longitude and latitude coordinates.

> library(rgdal)

> bdry.wgs <- spTransform(bdry.spdf, CRS("+proj=longlat +datum=WGS84"))

In document Spatial.data.Analysis.in.Ecology.and.Agriculture.using.R.by.Richard.E..Plant (Page 49-55)