• No results found

Introduction and usefull hints for the R software

N/A
N/A
Protected

Academic year: 2021

Share "Introduction and usefull hints for the R software"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction and usefull hints for the

R software

What is R

• Statistical software and programming language • Freely available (inluding source code)

• Started as a free re-implementation of the S-plus programming language

• Websites

R-CRAN = Comprehensive R Archive Network http://cran.r-project.org/

The R homepage

http://www.r-project.org/

Belgian mirror for installation(KU Leuven) http://www.freestatistics.org/cran/ – R-wiki

http://rwiki.sciviews.org/

c

Maarten Jansen STAT-F-413 — R p.1

GUI and Rcmdr

• InWindows, R opens by default as aGraphical User Interface (the basic R-Gui)

• Further refined interaction with the user is possible through the

R-commander, which is an R-package that needs to be installed

• Other GUI’s for R include

– RStudio – Rattle – Red-R – JGR – ...

• We first discuss the installation of R-packages

R-packages

• R-packages add advanced routines/software to basic R • See

http://cran.r-project.org/web/packages/

for a list of packages

• Some of these packages come with documentation (i.e., an online book), which is listed on this page

http://cran.r-project.org/other-docs.html

• Packages have to bedownloaded and installed(once)

– Using theR-GUI: Click the button packages, this starts a

dialogue and lists all packages availaible

– Alternatively, type

install.packages("mypackage",dependencies = TRUE)

(2)

– Whatever method, you choose, while installing a package, R

needs access to the internet to download it, so you must work on-line

• Packages have to beloadedin every R session again, using the command

library(mypackage)

No quotes this time

• Example: download, install and load the R-commander

– Download and install (once):

install.packages("Rcmdr",dependencies = TRUE) – Load (every session again):

library(Rcmdr)

c

Maarten Jansen STAT-F-413 — R p.4

Browsing through folders and loading data (I)

• Loading statistical routines: packages • Loading data:

datamatrix = read.table(file.choose())

opens a window to browse for a file; then creates a matrix from that file

• Alternatively:

getwd()

tells you the working directory.

setwd("e:/Rfiles")

sets the working directory toe/Rfiles(mind the quotes) Then load the file

datamatrix = read.table("mesquite.txt")

c

Maarten Jansen STAT-F-413 — R p.5

Browsing through folders and loading data (II)

• Changing the working directory is also possible through the menu of the basic RGui:

File > Change dir... >

• All at once:

datamatrix = read.table("e:/Rfiles/mesquite.txt")

• In the Rcmdr:

(1) Click Data, Import data (not: Load data set), from text file...; (2) Enter name for data set, tick off “Variable names in file” (3) Browse through your folders.

A text file as input data file

• An input file typically contains a matrix of values (see example ofmesquite.txton slide 8). Once successfully loaded, this matrix is accessible throughout the session.

• The columns of the matrix are given default names"V1","V2", etc., unless the data file provides a header with the names as in the example ofcatdataexemple.txt(see slide 9) The names in the header are stored into the current session if the option header is activated:

datamatrix = read.table("catdataexemple.txt",header=TRUE)

• In Rcmdr: proceed as on slide 6, but tick the “Variable names in file”

(3)

The filemesquite.txtlooks like this:

2.50 2.3 1.70 1.40 723.0 5.20 4.0 3.00 2.50 4052.0 2.00 1.6 1.70 1.40 345.0 1.60 1.6 1.60 1.30 330.9 1.40 1.0 1.40 1.10 163.5 3.20 1.9 1.90 1.50 1160.0 1.90 1.8 1.10 .80 386.6 2.40 2.4 1.60 1.10 693.5 2.50 1.8 2.00 1.30 674.4 2.10 1.5 1.25 .85 217.5 2.40 2.2 2.00 1.50 771.3 2.40 1.7 1.30 1.20 341.7 1.90 1.2 1.45 1.15 125.7 2.70 2.5 2.20 1.50 462.5 1.30 1.1 .70 .70 64.5 2.90 2.7 1.90 1.90 850.6 2.10 1.0 1.80 1.50 226.0 4.10 3.8 2.00 1.50 1745.1 2.80 2.5 2.20 1.50 908.0 1.27 1.0 .92 .62 213.5

c

Maarten Jansen STAT-F-413 — R p.8

The filecatdataexemple.txtlooks like this:

category X 1 23 1 24 1 32 1 25 1 21 1 19 2 32 2 31 2 35 2 28 2 31 3 31 3 24 3 42 c

Maarten Jansen STAT-F-413 — R p.9

Creating data

make a vector by

x = c(2,3,4)

make the matrixA= 

  

1 2 3 4 5 6 7 8 9

  

by

A = matrix(c(1,2,3,4,5,6,7,8,9),ncol=3,byrow=TRUE)

or

A = matrix(c(1,2,3,4,5,6,7,8,9),ncol=3,byrow=1)

Matrix-vector product:

A%*%x

Displaying data

Letxbe a matrix of data in the work space, then

x

displays the whole matrix

x[1,]

displays the first row

x[2,]

displays the second row

x[,3]

(4)

Displaying large vectors

The display could look like

[1] 72 66 64 66 40 74 50 0 70 96 92 74 80 60 [19] 30 70 88 84 90 80 94 70 76 66 82 92 75 76 [37] 76 76 68 72 64 84 92 110 64 66 56 70 66

meaning: the first element is72, then the second is66, etc. The 19th element is30, then the 20th is70, etc.

The 37th element is76, then the 38th is again76, etc.

c

Maarten Jansen STAT-F-413 — R p.12

Naming and renaming columns

Suppose we have uploaded the data matrix into

mesquite = read.table("e:/Rfiles/mesquite.txt")

then

names(mesquite)

displays the names of the columns of matrixmesquite names(mesquite)=c("x1","x2","x3","x4","y")

renames the columns

to give direct access to these variablesx1,x2,x3,x4,yuse

attach(mesquite)

allows to compute, for instance

mean(x1)

c

Maarten Jansen STAT-F-413 — R p.13

Regression and ANOVA

Perform linear fitlm:

mesquite = read.table("mesquite.txt") names(mesquite)=c("x1","x2","x3","x4","y") attach(mesquite)

fit1=lm(log(y)˜log(x1)+log(x2)+log(x3)+log(x4)) fit1

anova(fit1)

Anova must be performed on a fit, it cannot be performed on raw data.

In case there are several active data matrices in the session, with possible shared names for the variables, one might prefer not to use

attach. In that case, we can do the analysis with

fit1 = lm(log(y)˜log(x1)+log(x2)+log(x3)+log(x4),data=mesqui

Categorical data analysis

datamatrix = read.table("catdataexemple.txt",header=TRUE)

the optionheader=TRUEpreserves the names in the txt-file (see before)

fitC = lm(X˜factor(category),data=datamatrix) anova(fitC)

(5)

Setting up simulations

Generating random values...

mu = 10 stdev = 3 n_simul = 100 samplesize = 40 set.seed(0)

X = matrix(rnorm(n_simul*samplesize,mean=mu,sd=stdev),nrow = Xbar = rep(NA,n_simul)

The variableXbaris initialised as a vector of valuesNA

(Not-a-number). If further calculations fail in replacing this values, this is easily detected (taking zeros as initial value may hide failed calculations if the user thinks 0 is the outcome)

Next, we compute sample means for all rows ofX, using a user-defined function

c

Maarten Jansen STAT-F-413 — R p.16

User defined functions

Now define a function for confidence intervals

confint = function(X,alfa=0.05,side="2"){ Xbar = mean(X)

S2 = mean(Xˆ2)-Xbarˆ2 n = length(X)

CIwidth = ifelse(side=="L"|side=="R",

sqrt(S2)*qt(1-alfa,n-1)/sqrt(n),sqrt(S2)*qt(1-alfa/2,n-1 lowerbound = ifelse(side=="L"|side=="2",Xbar-CIwidth,-Inf) upperbound = ifelse(side=="R"|side=="2",Xbar+CIwidth,Inf) return(list(lowerbound,upperbound,Xbar,S2))

}

Note the use of default values (alfa=0.05,side="2") in the definition of the function

c

Maarten Jansen STAT-F-413 — R p.17

Using user defined functions

A user defined function can be use througout the current session, for instance in

X = rnorm(100,mean=mu,sd=stdev) ci = confint(X)

ci = confint(X,alfa=0.01)

Non-default values can be passed in the right order or explicitly by using the variable name followed by an equal sign and a value. For use on all rows of a matrix, useapply:

X = matrix(rnorm(200,mean=mu,sd=stdev),nrow=2) ci = apply(X,1,confint)

Some graphical commands

• plot(x1)

plot(x1,type="l")(for plots with polylines)

• Two plots on the same window: suppose we want to plot two rows of the random matrix on page 16.

plot(X[,1],type="l",col="red") lines(X[,2],col="blue")

Define your own colors:

lines(X[,3],col=rgb(0.23,0.156,0.28))

• Export your graph to encapsulated postscript (eps). In R, you have to redo all the plot commands after initializing

postscript("example1.eps",horizontal = FALSE, onefile =

and conclude by

(6)

The commandpostscriptmakes the current plot on screen

inactive.

The commanddev.off()closes the current active device (postscript, pdf or window on screen)

c

Maarten Jansen STAT-F-413 — R p.20

Exporting to eps or pdf

So, in order to make an eps of the plot on slide 19:

postscript("example1.eps",horizontal = FALSE, onefile = FALSE) plot(X[,1],type="l",col="red")

lines(X[,2],col="blue")

lines(X[,3],col=rgb(0.23,0.156,0.28)) dev.off()

Export to pdf:

pdf("example1.pdf",horizontal = FALSE, onefile = FALSE) plot(X[,1],type="l",col="red")

lines(X[,2],col="blue")

lines(X[,3],col=rgb(0.23,0.156,0.28)) dev.off()

c

Maarten Jansen STAT-F-413 — R p.21

Graphics for exploratory data analysis

• boxplot(x1)

for a single boxplot; recall that single boxplots are of little practical use; try to compare several variables:

boxplot(x1,x2)

On a matrix: boxplot(mesquite)creates boxplots for all columns, and it labels each plot with the columns’ names It is possible to apply a transformation to all elements of a matrix:

boxplot(log(mesquite))

• Normal probability plot (= qq plot against theoretical normal distribution):

qqnorm(x1)

• QQ-plot (two empirical distributions): qqplot(x1,x2) • Histograms:hist(x1)

(use?histfor info on parameters, such as number of bins etc.)

Getting help

Besides the numerous documentation on the internet, getting help in R proceeds by

help(<your command>)

or, equivalently,

?<your command>

Tab completion helps in finding options: type part of a command + Tab

(7)

Quitting R quit()

or, equivalently,

q()

(Don’t forget the brackets) When using R, think about

• (brackets) • ”quotes”

• Question marks ? for help • Tab completion

c

References

Related documents

[r]

[r]

university reform claims that strategic manage- ment has been strengthened in the universities, while the role of university per- sonnel has remained weak. Two major strategy

Retirement plan participants can get guidance and signals about strategies for using their funds in retirement from their benefit plan architecture and communications, from general

In this review, the research carried out using various ion-exchange resin-like adsorbents including modified clays, lignocellulosic biomasses, chitosan and its derivatives, microbial

While in Table 3 we present a pooled specification, to increase the chances for the added variables to exert a significant impact, in unreported regressions we repeat the

The output characteristic (Fig. 6-1) for a water-gated PBTTT film is close to ideal, with very little hysteresis and a low threshold between 0V and 0.1V. The responses of PBTTT

In undertaking research for this Report, the authors developed two surveys, a General Prison Population Survey that was administered to nearly 30% of all women prisoners (246