Tutorial: Managing Ocean Data with R

Using data from the Ocean Networks Canada observatory at Cambridge Bay, Nunavut

 

By: Allan Roberts, Ocean Networks Canada

Version: August 15, 2013

Purpose. This tutorial has two main goals:

(i) To introduce the user to some R functions that are useful for data management.

(ii) To introduce some of the oceanographic data types collected by the Ocean Networks Canada ocean bottom platform at Cambridge Bay, Nunavut. (See Table 1.)

Note: This tutorial assumes that the user has R installed on a computer and is familiar with entering basic instructions on the R command line. R is a free software application for statistics, graphics, and programming. Users requiring more background are referred to the manual An Introduction to R, which is available on the R Project website. For an example of customizing a data plot, refer to the tutorial “Plot Ocean Data with R.” To search for Ocean Networks Canada data, go to Data & Tools.

Instructions. The statements in the R script provided below can be typed, or copied and pasted, onto the R command line one at a time, in sequence. Alternatively, multiple statements can be pasted as a block. A semicolon marks the end of each statement. Section 2 of the script loads one month of data into R. It is recommended that all of Section 2 be pasted onto the R command line as a unit, and that the other instructions be entered on the command line one at a time.

R Script

###############################################################
#Section 1: Creating Vectors and Sequences
###############################################################

#The following commands allow the user to create vectors.
c(3,4); #A vector with two elements. "c" is the concatenation function.
rep(2,5); #A 5-element vector with "2" repeated 5 times.
1:100; #A sequence from 1 to 100, by steps of 1.
seq(1, 100); #An alternative notation for a sequence from 1 to 100 by steps of 1.
seq(0, 100, 2); #A sequence from 0 to 100, by steps of 2.

#Sample 5 pseudorandom numbers from 1 to 100, without replacement:
sample(1:100, 5, replace = FALSE);
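Because sample() draws pseudorandom numbers, repeated calls generally return different values. The following is a minimal sketch, assuming a repeatable draw is wanted: calling set.seed() first fixes the state of the random number generator. The seed value 42 and the object names draw1, draw2 and evens are illustrative choices and are not part of the original script.

```r
# Fix the random number generator state so that the draw can be repeated.
# The seed value (42) is an arbitrary, illustrative choice.
set.seed(42);
draw1 <- sample(1:100, 5, replace = FALSE);

# Resetting the same seed reproduces exactly the same draw.
set.seed(42);
draw2 <- sample(1:100, 5, replace = FALSE);
identical(draw1, draw2); # TRUE

# seq() can also take a target length instead of a step size.
# This gives the same values as seq(0, 100, 2).
evens <- seq(0, 100, length.out = 51);
```

Without the calls to set.seed(), the two draws would usually differ.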


#####################################################################################
#Section 2: Building a data frame. Cut and paste as a block onto the R command line.
#####################################################################################

year = rep(2012,31); #A vector with 31 entries representing the year.
month = rep(12,31); #A vector representing the month ("12" for Dec.).
day.of.month = 1:31; #A sequence from 1 to 31 for the day of month.
day.of.year = 336:366; #A sequence from 336 to 366 for the day of year.

#A vector of daily mean ice draft values (in metres):

ice.draft = c(0.620,0.626,0.629,0.634,0.642,0.652,0.662,0.670,0.682,0.696,0.704, 0.709,0.718,0.723,0.727,0.735,0.740,0.747,0.751,0.758,0.758,0.764,0.762,0.765,0.768, 0.773,0.778,0.784,0.793,0.801,0.806);

#The following creates a vector of daily mean water temperature values:

water.temperature = c(-0.975,-0.946,-0.925,-0.899,-0.902,-0.883,-0.857,-0.929,-1.230, -1.392,-1.360,-1.341,-1.313,-1.285,-1.289,-1.311,-1.302,-1.311,-1.327,-1.331,-1.345, -1.354,-1.354,-1.375,-1.386,-1.374,-1.406,-1.420,-1.429,-1.441,-1.449);

#The following creates a vector of daily mean salinity values:

salinity = c(27.781,27.813,27.871,27.931,27.961,27.989,28.021,28.035,28.045,28.066, 28.093,28.116,28.138,28.155,28.174,28.184,28.194,28.212,28.217,28.227,28.242,28.250, 28.255,28.269,28.286,28.295,28.316,28.335,28.350,28.368,28.378);

#The following creates a vector of daily mean chlorophyll concentration values:

chlorophyll.concentration = c(0.0845, 0.0778, 0.0726, 0.0667, 0.0643, 0.0601, 0.0593, 0.0577, 0.0604, 0.0576, 0.0518, 0.0490, 0.0485, 0.0476, 0.0472, 0.0448, 0.0449, 0.0438, 0.0419, 0.0422, 0.0427, 0.0412, 0.0397, 0.0400, 0.0401, 0.0405, 0.0402, 0.0410, 0.0402, 0.0404, 0.0396);

#The following creates a vector of daily mean oxygen concentration values:

oxygen.concentration = c(7.776,7.745,7.737,7.724,7.732,7.715,7.631,7.664,7.819,7.926, 7.925,7.901,7.885,7.858,7.836,7.817,7.805,7.785,7.770,7.754,7.743,7.765,7.758,7.741, 7.721,7.706,7.694,7.684,7.673,7.655,7.665);

#The function "data.frame" creates a data frame out of the above defined vectors:

data = data.frame(year, month, day.of.month, day.of.year, ice.draft, water.temperature, salinity, chlorophyll.concentration, oxygen.concentration);
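As an alternative to typing each column as a separate vector, a whitespace-separated table can be read directly into a data frame. The sketch below assumes that the first three rows of Table 1 (with a subset of its columns) are supplied as a text string; the object names table.text and data.small are illustrative and do not appear in the original script.

```r
# The first three rows of Table 1 (seven of the nine columns),
# written as whitespace-separated text with a header line.
table.text <- "year month day.of.month day.of.year ice.draft water.temperature salinity
2012 12 1 336 0.620 -0.975 27.781
2012 12 2 337 0.626 -0.946 27.813
2012 12 3 338 0.629 -0.925 27.871";

# read.table() parses the text; header = TRUE takes the column names
# from the first line.
data.small <- read.table(text = table.text, header = TRUE);
data.small; # Print the resulting data frame.
```

The same call with a file name in place of the text argument reads a table saved on disk.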

####################################################
#Section 3: Getting information about a data frame.
####################################################

class(data); #Tells us that "data" is a data frame.
is.data.frame(data); #Asks whether "data" is a data frame.
names(data); #The column names.
head(data); #Prints the first few lines of "data", with column names.
tail(data); #Prints the last few lines of "data", with column names.
nrow(data); #Tells us the number of rows.
ncol(data); #Tells us the number of columns.
dim(data); #The number of rows and columns in "data".
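Two further base R functions, str() and summary(), are often used alongside the ones above. The following sketch assumes a three-row subset of the tutorial's data (values taken from Table 1); the object names data.small and col.classes are illustrative.

```r
# A three-row subset of the tutorial's data (values from Table 1).
data.small <- data.frame(
  day.of.month      = 1:3,
  water.temperature = c(-0.975, -0.946, -0.925),
  salinity          = c(27.781, 27.813, 27.871)
);

str(data.small);     # Compact display: dimensions, column types, first values.
summary(data.small); # Minimum, quartiles, median, mean and maximum per column.

# The class of each column, returned as a named character vector.
col.classes <- sapply(data.small, class);
col.classes;
```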


##############################################
#Section 4: Plotting data.
##############################################

#This section provides three different ways of plotting the temperature data
#versus the day of month; the difference lies in how we access the variable names
#for the data frame.

with(data, plot(water.temperature ~ day.of.month, col="red", type="l") );

#The columns of a data frame can be referenced using the "$" symbol. Example:
plot(data$water.temperature ~ data$day.of.month, type="l", col="red");

#The function "attach" makes the variable names in the data frame accessible.
#Using the function "attach" can be convenient; however, it can also cause confusion,
#especially if other data frames are using the same variable names. It is a good idea
#to use "detach" after you are done using the variable names for a data frame. Example:
attach(data);
plot(water.temperature~day.of.month, type="l", col="red");
detach(data);

#############################
#Section 5: Indexing columns.
#############################

names(data)[8]; #The name of the 8th column.
data[ ,8]; #The data values in the 8th column.
data$salinity; #The salinity data values.
data$water.temperature; #The water temperature data values.
head(data[ ,4:7]); #The top of columns 4 through 7.
data[ ,c(2,5)]; #View columns 2 and 5.

############################
#Section 6: Indexing rows.
############################

data[3, ]; #View the third row.
data[5:10, ]; #View the 5th to the 10th row.
data[seq(1,31,2), ]; #View every second row.
data[sample(31,31), ]; #View the rows in random order.
data[c(2,5), ]; #View rows 2 and 5.

#The row with the highest temperature:
data[which(data$water.temperature == max(data$water.temperature)), ];

#The rows ordered by temperature:
data[order(data$water.temperature),]; #Lowest to highest.
data[rev(order(data$water.temperature)),]; #Highest to lowest.
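Rows can also be selected with a logical condition, and base R offers shortcuts for the maximum and for descending order. The sketch below assumes a four-row subset of the data (days 7 to 10 of Table 1); the object names data.small, cold.days, warmest and sorted.desc are illustrative.

```r
# Days 7 to 10 of December 2012 (values from Table 1).
data.small <- data.frame(
  day.of.month      = 7:10,
  water.temperature = c(-0.857, -0.929, -1.230, -1.392)
);

# Rows where the daily mean water temperature is below -1 degree C.
cold.days <- data.small[data.small$water.temperature < -1, ];

# which.max() returns the index of the first maximum value only.
warmest <- data.small[which.max(data.small$water.temperature), ];

# order() sorts from lowest to highest by default;
# decreasing = TRUE reverses this.
sorted.desc <- data.small[order(data.small$water.temperature, decreasing = TRUE), ];
```

Note that which.max() returns a single row even when several rows share the maximum, whereas the comparison with max() used in the script returns all of them.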


######################################
#Section 7: Modifying the data frame.
######################################

data <- data[,-c(1,2)]; #Delete columns 1 and 2 (year and month).
names(data); #Check the variable names.

#Change the column names:
names(data)[1] = c("Column A");
names(data)[2] = c("Column B");
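Deleting and renaming columns by position depends on the current column order, so the same statement can affect different columns if it is run twice. The following sketch, which is an alternative and not the original script's method, refers to columns by name instead. It assumes a three-row subset of the data; the names data.small, keep and ice.draft.m are illustrative.

```r
# A three-row subset of the tutorial's data (values from Table 1).
data.small <- data.frame(
  year         = rep(2012, 3),
  month        = rep(12, 3),
  day.of.month = 1:3,
  ice.draft    = c(0.620, 0.626, 0.629)
);

# Keep every column whose name is not "year" or "month".
keep <- !(names(data.small) %in% c("year", "month"));
data.small <- data.small[ , keep];

# Rename a column by matching its current name rather than its position.
names(data.small)[names(data.small) == "ice.draft"] <- "ice.draft.m";
names(data.small); # Check the variable names.
```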

##############################################################
#Section 8: Apply a function to the columns of the data frame.
##############################################################

#"MARGIN=2" indicates that the function is applied to the columns.
apply(data, MARGIN=2, FUN=mean); #The function "mean" is applied to the columns.
apply(data, MARGIN=2, FUN=max); #The function "max" is applied to the columns.
apply(data, MARGIN=2, FUN=min); #The function "min" is applied to the columns.
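For a data frame whose columns are all numeric, sapply() and colMeans() give the same column summaries as apply() with MARGIN=2. The sketch below assumes a three-row subset of the data (values from Table 1); the object names data.small, col.means, col.means2 and col.ranges are illustrative.

```r
# A three-row subset of the tutorial's data (values from Table 1).
data.small <- data.frame(
  water.temperature = c(-0.975, -0.946, -0.925),
  salinity          = c(27.781, 27.813, 27.871)
);

# sapply() applies a function to each column of a data frame.
col.means <- sapply(data.small, mean);

# colMeans() is a built-in shortcut for the column means.
col.means2 <- colMeans(data.small);

# range() returns the minimum and maximum, so the result is a matrix
# with one column per variable: row 1 is the minimum, row 2 the maximum.
col.ranges <- sapply(data.small, range);
```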

###########################
#END of R Script.
###########################

Data Source and Documentation

Data Source: Ocean Networks Canada Arctic Observatory at Cambridge Bay, Nunavut.

Data accessed April 2013 at: http://www.neptunecanada.com/data-collaboration/

Measurements (units): ice thickness (m); water temperature (°C); salinity (psu); chlorophyll concentration (µg/l); oxygen concentration (ml/l).

Station: Cambridge Bay Dock, Latitude: 69.11386 N; Longitude 105.06058 W; Depth 6.0 m.

For this tutorial, measurements have each been rounded to a reduced number of decimal places.

For details about instrumentation and data collection, see the NEPTUNE Canada website: http://www.neptunecanada.com/data-collaboration/.

References

R Core Team, 2012. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: http://www.R-project.org/.

R Core Team. An Introduction to R. Available at: http://cran.r-project.org/doc/manuals/R-intro.html.


Arctic Observatory, Cambridge Bay, Nunavut

Daily Averages (UTC) for December 2012

Year  Month  Day of Month  Day of Year  Ice Thickness [m]  Water Temperature [C]  Salinity [psu]  Chlorophyll Concentration [µg/l]  Oxygen Concentration [ml/l]
2012  12     1             336          0.620              -0.975                 27.781          0.0845                            7.776
2012  12     2             337          0.626              -0.946                 27.813          0.0778                            7.745
2012  12     3             338          0.629              -0.925                 27.871          0.0726                            7.737
2012  12     4             339          0.634              -0.899                 27.931          0.0667                            7.724
2012  12     5             340          0.642              -0.902                 27.961          0.0643                            7.732
2012  12     6             341          0.652              -0.883                 27.989          0.0601                            7.715
2012  12     7             342          0.662              -0.857                 28.021          0.0593                            7.631
2012  12     8             343          0.670              -0.929                 28.035          0.0577                            7.664
2012  12     9             344          0.682              -1.230                 28.045          0.0604                            7.819
2012  12     10            345          0.696              -1.392                 28.066          0.0576                            7.926
2012  12     11            346          0.704              -1.360                 28.093          0.0518                            7.925
2012  12     12            347          0.709              -1.341                 28.116          0.0490                            7.901
2012  12     13            348          0.718              -1.313                 28.138          0.0485                            7.885
2012  12     14            349          0.723              -1.285                 28.155          0.0476                            7.858
2012  12     15            350          0.727              -1.289                 28.174          0.0472                            7.836
2012  12     16            351          0.735              -1.311                 28.184          0.0448                            7.817
2012  12     17            352          0.740              -1.302                 28.194          0.0449                            7.805
2012  12     18            353          0.747              -1.311                 28.212          0.0438                            7.785
2012  12     19            354          0.751              -1.327                 28.217          0.0419                            7.770
2012  12     20            355          0.758              -1.331                 28.227          0.0422                            7.754
2012  12     21            356          0.758              -1.345                 28.242          0.0427                            7.743
2012  12     22            357          0.764              -1.354                 28.250          0.0412                            7.765
2012  12     23            358          0.762              -1.354                 28.255          0.0397                            7.758
2012  12     24            359          0.765              -1.375                 28.269          0.0400                            7.741
2012  12     25            360          0.768              -1.386                 28.286          0.0401                            7.721
2012  12     26            361          0.773              -1.374                 28.295          0.0405                            7.706
2012  12     27            362          0.778              -1.406                 28.316          0.0402                            7.694
2012  12     28            363          0.784              -1.420                 28.335          0.0410                            7.684
2012  12     29            364          0.793              -1.429                 28.350          0.0402                            7.673
2012  12     30            365          0.801              -1.441                 28.368          0.0404                            7.655
2012  12     31            366          0.806              -1.449                 28.378          0.0396                            7.665

Table 1. Daily averages for December 2012, from the Arctic Observatory, Cambridge Bay, Nunavut. Data Source: NEPTUNE Canada, http://www.neptunecanada.com/data-collaboration/. (Accessed January 2013.) Daily averages are relative to time in UTC. For this tutorial, measurements have each been rounded to a reduced number of decimal places. These data are intended for instructional purposes only; for research purposes, raw data are available from the Ocean Networks Canada website. The daily averages in this table can be loaded into the application R by cutting and pasting the instructions in Section 2 onto the R command line.
