6. Discrete Random Variables and Probability Models
6.11 Appendix: R Software
The R software system is a powerful tool for handling probability distributions and data concerning random variables. The following short notes describe basic features of R; further information and links to other resources are available on the course web page. Unix and Windows versions of R are available on Math Faculty undergraduate servers, and free copies can be downloaded from the web.
You should make yourself familiar with R, since some problems (and most applications of probability) require computations or graphics that are not feasible by hand.
Some R Basics
R is a statistical software system that has excellent numerical, graphical and statistical capabilities. There are Unix and Windows versions. These notes are a very brief introduction to a few of the features of R. Web resources have much more information. Links can be found on the Stat 230 web page. You can also download a Unix or Windows version of R for free.
1. PRELIMINARIES
R is invoked on Math Unix machines by typing R. The R prompt is >. R objects include variables, functions, vectors, arrays, lists and other items. To see online documentation about something, use the help function. For example, to see documentation on the function mean(), type help(mean). When you do not know the exact name of a function, help.search() searches the documentation by keyword; for example, help.search("variance").
The assignment symbol is <- ; for example,
x <- 15
assigns the value 15 to the variable x.
To quit an R session, type q().
2. VECTORS
Vectors can consist of numbers or other symbols; we will consider only numbers here. Vectors are defined using c(): for example,
x<- c(1,3,5,7,9)
defines a vector of length 5 with the elements given. Vectors and other classes of objects possess certain attributes. For example, typing
length(x)
will give the length of the vector x. Vectors are a convenient way to store values of a function (e.g. a probability function or a c.d.f.) or values of a random variable that have been recorded in some experiment or process.
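As a small sketch of storing a probability function and a c.d.f. in vectors, consider the number of heads in two tosses of a fair coin. The example and the names x.vals, f and cdf are chosen here for illustration; cumsum() is the built-in running-total function.

```r
# Possible values of X = number of heads in two tosses of a fair coin
# (an illustrative example, not taken from the notes)
x.vals <- c(0, 1, 2)

# Probability function f(x) = P(X = x), one entry per possible value
f <- c(0.25, 0.50, 0.25)

length(f)   # 3, the same length as x.vals
sum(f)      # 1, since the probabilities sum to one

# c.d.f. F(x) = P(X <= x), obtained as the running total of f
cdf <- cumsum(f)
cdf         # 0.25 0.75 1.00
```

Keeping the values and the probabilities in two vectors of the same length means that f[i] is always the probability of x.vals[i].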
3. ARITHMETIC
The following R commands and responses should explain arithmetic operations.
> 7+3
[1] 10
> 7*3
[1] 21
> 7/3
[1] 2.333333
> 2^3
[1] 8
4. SOME FUNCTIONS
Functions of many types exist in R. Many operate on vectors in a transparent way, as do arithmetic operations. (For example, if x and y are vectors then x+y adds the vectors element-wise; x and y should therefore be the same length, since otherwise R recycles the elements of the shorter vector.) Some examples, with comments, follow.
> x<- c(1,3,5,7,9) # Define a vector x
> x # Display x
[1] 1 3 5 7 9
> y<- seq(1,2,.25) #A useful function for defining a vector whose elements are an arithmetic progression
> y
[1] 1.00 1.25 1.50 1.75 2.00
> y[2] # Display the second element of vector y
[1] 1.25
> y[c(2,3)] # Display the vector consisting of the second and third elements of vector y.
[1] 1.25 1.50
> mean(x) # Computes the mean of the elements of vector x
[1] 5
> summary(x) # A useful function which summarizes features of a vector x
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       3       5       5       7       9
> var(x) # Computes the (sample) variance of the elements of x
[1] 10
> exp(1) # The exponential function
[1] 2.718282
> exp(y)
[1] 2.718282 3.490343 4.481689 5.754603 7.389056
> round(exp(y),2) # round(y,n) rounds the elements of vector y to n decimals
[1] 2.72 3.49 4.48 5.75 7.39
> x+2*y
[1] 3.0 5.5 8.0 10.5 13.0
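The element-wise behaviour of arithmetic can be checked with a few more lines. The sketch below reuses the vectors x and y from the examples above. The last part shows what R does when the lengths differ: it recycles the shorter vector. The vectors a and b are hypothetical and used only for this illustration.

```r
x <- c(1, 3, 5, 7, 9)
y <- seq(1, 2, 0.25)

x * y          # element-wise product: 1.00 3.75 7.50 12.25 18.00
x + 10         # a single number is added to every element: 11 13 15 17 19
sum(x * y)     # 42.5

# Unequal lengths: the shorter vector is recycled.
a <- c(1, 2, 3, 4)
b <- c(10, 20)
a + b          # 11 22 13 24, i.e. b is used as 10 20 10 20
# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning.
```

Recycling is convenient for adding a constant, but it can hide mistakes when two vectors were meant to have the same length, so checking length() first is a reasonable habit.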
5. GRAPHS
To open a graphics window in Unix, type x11(). Note that in R, a graphics window opens automatically when a graphical function is used.
There are various plotting and graphical functions. Two useful ones are
plot(x,y) # Gives a scatterplot of x versus y; thus x and y must be vectors of the same length.
hist(x) # Creates a frequency histogram based on the values in the vector x. To get a relative frequency histogram (areas of rectangles sum to one) use hist(x,prob=T).
Graphs can be tailored with respect to axis labels, titles, numbers of plots to a page etc. Type help(plot), help(hist) or help(par) for some information.
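A minimal sketch of a relative frequency histogram follows. The data vector is made up for illustration, and the axis label and title are arbitrary choices. hist() also returns the bin boundaries and bar heights, which makes it possible to confirm that the rectangle areas sum to one.

```r
# Hypothetical data, used only to illustrate the call
x <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 8)

# Relative frequency histogram with a custom axis label and title
h <- hist(x, prob=TRUE, xlab="x", main="Relative frequency histogram")

# Area of each rectangle = height (density) * width of the bin
areas <- h$density * diff(h$breaks)
sum(areas)   # 1
```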
To save/print a graph in R using UNIX, first generate the graph using a graphing function like plot(), and then type:
dev.print(device,file="filename")
where device is the device you would like to save the graph to (e.g. postscript or pdf) and filename is the name of the file that you would like the graph saved to. To look at a list of the different graphics devices you can save to, type help(Devices).
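dev.print() copies a graph that is already displayed in a graphics window, so it needs an interactive session. As an alternative sketch (a different technique from the one described above), a file device such as pdf() can be opened directly, the graph drawn into it, and the device closed with dev.off(). The file name myplot.pdf and the use of tempdir() are assumptions made for this example.

```r
x <- c(1, 3, 5, 7, 9)
y <- seq(1, 2, 0.25)

# Hypothetical output location: a file in R's temporary directory
out.file <- file.path(tempdir(), "myplot.pdf")

pdf(file=out.file)      # open a PDF file device
plot(x, y)              # the graph is drawn into the file, not on screen
dev.off()               # close the device so the file is written

file.exists(out.file)   # TRUE if the graph was saved

# In an interactive session with the graph on screen, the call described
# in the text would instead look like:
# dev.print(pdf, file="myplot.pdf")
```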
To save/print a graph in R using Windows, you can do one of two things.
a) You can go to the File menu and save the graph using one of several formats (e.g. postscript, jpeg). It can then be printed. You may also copy the graph to the clipboard using one of the formats and then paste it into an editor, such as MS Word. The graph can also be printed directly to a printer using this option.
b) You can right click on the graph. This gives you a choice of copying the graph and then pasting it into an editor, such as MS Word, or saving the graph as a metafile or bitmap. You may also print directly to a printer using this option.
6. DISTRIBUTIONS
There are functions which compute values of probability or probability density functions, cumulative distribution functions, and quantiles for various distributions. It is also possible to generate (pseudo) random samples from these distributions. Some examples follow for Binomial and Poisson distributions. For other distribution information, type help(rhyper), help(rnbinom) and so on. Note that R does not have any function specifically designed to generate random samples from a discrete uniform distribution (although there is one for a continuous uniform distribution). To generate n random samples from a discrete UNIF(a,b), use sample(a:b,n,replace=T).
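As a sketch of the sample() call for a discrete uniform distribution, the lines below simulate 1000 rolls of a fair die, i.e. a discrete UNIF(1,6). The seed value and the name rolls are arbitrary choices for this illustration; set.seed() is used only so that a run can be repeated.

```r
set.seed(230)   # arbitrary seed, so the same values are generated each run

# 1000 random values from a discrete UNIF(1,6)
rolls <- sample(1:6, 1000, replace=TRUE)

# Relative frequency of each face; each should be close to 1/6
table(rolls) / length(rolls)
```

Without replace=TRUE, sample() draws without replacement and cannot return more values than there are in a:b.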
> y<- rbinom(10,100,0.25) # Generate 10 random values from the Binomial distribution Bi(100,0.25).
                          # The values are stored in the vector y.
> y # Display the values
[1] 24 24 26 18 29 29 33 28 28 28
> pbinom(3,10,0.5) # Compute P(Y<=3) for a Bi(10,0.5) random variable.
[1] 0.171875
> qbinom(.95,10,0.5) # Find the .95 quantile (95th percentile) for Bi(10,0.5).
[1] 8
> z<- rpois(10,10) # Generate 10 random values from the Poisson distribution Poisson(10).
                   # The values are stored in the vector z.
> z # Display the values
[1] 6 5 12 10 9 7 9 12 5 9
> ppois(3,10) # Compute P(Y<=3) for a Poisson(10) random variable.
[1] 0.01033605
> qpois(.95,10) # Find the .95 quantile (95th percentile) for Poisson(10).
[1] 15
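The functions with prefixes d, p and q are related, and the relationship can be checked directly. The sketch below uses dbinom() and dpois(), which give values of the probability function, to reproduce the c.d.f. values above and to show why the .95 quantile of Bi(10,0.5) is 8.

```r
# The c.d.f. is a sum of probability function values:
sum(dbinom(0:3, 10, 0.5))     # 0.171875, the same as pbinom(3,10,0.5)
sum(dpois(0:3, 10))           # the same as ppois(3,10)

# A single probability is a difference of c.d.f. values:
ppois(3, 10) - ppois(2, 10)   # P(Y = 3), the same as dpois(3,10)

# The quantile is the smallest x whose c.d.f. value is at least .95:
pbinom(7, 10, 0.5)            # 0.9453125, still below .95
pbinom(8, 10, 0.5)            # 0.9892578, at least .95
qbinom(.95, 10, 0.5)          # 8
```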
To illustrate how to plot the probability function for a random variable, a Bi(10,0.5) random variable is used.
# Assign all possible values of the random variable, X ~ Bi(10,0.5)
x <- seq(0,10,by=1)
# Determine the value of the probability function for possible values of X
x.pf <- dbinom(x,10,0.5)
# Plot the probability function
barplot(x.pf,xlab="X",ylab="Probability Function",
names.arg=c("0","1","2","3","4","5","6","7","8","9","10"))