

In Figure 6.3 a density estimate is plotted for each simulation. Observe how the densities squeeze in and become approximately bell shaped, as expected, even for n=10. Because the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$, if n goes up four times (from 25 to 100, for example), the standard deviation gets cut in half. Comparing the density estimates for n=25 and n=100, we can see that the n=100 graph has about half the spread.
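The halving of the spread can be checked numerically. The following is a minimal sketch, assuming a Uniform(0,1) parent population, 2,000 simulated means per sample size, and an arbitrary seed; none of these choices is taken from the simulation behind Figure 6.3.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
m = 2000                    # number of simulated sample means (an assumption)

res25 = c()
for (i in 1:m) res25[i] = mean(runif(25))    # sample means for n = 25

res100 = c()
for (i in 1:m) res100[i] = mean(runif(100))  # sample means for n = 100

# Ratio of the two simulated standard deviations;
# sigma/sqrt(n) predicts sqrt(100/25) = 2
ratio = sd(res25) / sd(res100)
ratio
```

The ratio should come out close to 2, with the exact value varying from run to run.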

In this example the for loop takes the shortened form

for(i in values) a_single_command

If there is just a single command, then no braces are necessary. This is convenient when we use the up arrow to edit previous command lines.
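As a small illustration of the two forms, the sketch below adds up the same values once without braces and once with them. The variable names (values, total_short, total_braced) are made up for this example.

```r
values = c(10, 25, 100)

# Shortened form: a single command, so no braces are needed
total_short = 0
for (i in values) total_short = total_short + i

# Equivalent braced form
total_braced = 0
for (i in values) {
  total_braced = total_braced + i
}

total_short == total_braced   # both loops compute the same sum
```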

In the problems, you are asked to simulate for a variety of parent populations to verify that the more skewed the data is, the larger n must be for the normal distribution to approximate the sampling distribution of $\bar{x}$.

6.4 Defining a function

In the previous examples, we have found a single sample of $\bar{x}$ using a command like

> mean(runif(n))

This generates n i.i.d. samples from the uniform distribution and computes their sample mean. It is often convenient to define functions to perform tasks that require more than one step. Functions can simplify our typing, organize our thoughts, and save our work for reuse. This section covers some of the basics of functions—their basic structure and the passing of arguments. More details are available in Appendix E.

A basic function in R is conceptually similar to a mathematical function. In R, a function has a name (usually), a rule (the body of the function), a way of defining the inputs (the arguments to a function), and an output (the last command evaluated).

Functions in R are created with the function() keyword. For example, we define a function to find the mean of a sample of size 10 from the Exponential(1) distribution as follows:

> f = function() {
+   mean(rexp(10))
+ }

To use this function, we type the name and parentheses

> f()
[1] 0.7301

This function is named f. The keyword function() creates a function and assigns it to f. The body of the function is enclosed in braces: {}. The return value is the last line evaluated. In this case, only one line is evaluated—the one finding the mean(). (As with for loops, in this case the braces are optional.) In the next example we will discuss how to input arguments into a function.

If we define a function to find a single observation from the sampling distribution, then our simulation can be done with generic commands such as these:

> res = c()

> for(i in 1:500) res[i] = f()
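Put together as a script, the simulation might look like the sketch below. The seed is arbitrary, and the replicate() call at the end is shown only as an alternative to the for loop; it is not the approach used in the text.

```r
set.seed(10)                      # arbitrary seed, only for reproducibility

# Mean of a sample of size 10 from the Exponential(1) distribution
f = function() mean(rexp(10))

res = c()
for (i in 1:500) res[i] = f()     # 500 observations from the sampling distribution

length(res)
mean(res)                         # should be near 1, the mean of Exponential(1)

# Alternative: replicate() evaluates f() 500 times and collects the results
res2 = replicate(500, f())
```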

6.4.1 Editing a function

An advantage of using functions to do our work is that they can be edited. The entire function needn’t be retyped when changes are desired. Basic editing can be done with either the fix() function or the edit() function. For example, the command fix(f) will open an editor (in Windows this defaults to notepad) to the definition of your function f. You make the desired changes to your function then exit the editor. The changes are assigned to f which can be used as desired.

The edit() function works similarly, but you must assign its return value, as in

> f = edit(f)

6.4.2 Function arguments

A function usually has a different answer depending on the value of its arguments. Passing arguments to R functions is quite flexible. We can do this by name or position. As well, as writers of functions, we can create reasonable defaults.

Let’s look at our function f, which finds the mean of ten exponentials. If we edit its definition to be

f=function(n=10){ mean(rexp(n)) }

then we can pass in the size of the sample, n, as a parameter. We can call this function in several ways: f(), f(10), and f(n=10) are all the same and use n=10. This command uses n=100: f(100). The first argument to f is named n and is given a default value of 10 by the n=10 specification in the definition. Calling f by f() uses the default values. Calling f by f(100) uses the position of the argument to assign the values inside the function. In this case, the 100 is assigned to the only argument, n. When we call f with f(n=100) we use a named argument. With this style there is no doubt what value n is being set to.
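One way to convince ourselves that the three calling styles are equivalent is to reset the random seed before each call, so that identical arguments give identical results. This is only a sketch: the seed value and the variable names a, b, c10, and d are arbitrary.

```r
f = function(n = 10) { mean(rexp(n)) }

set.seed(3); a   = f()         # default value, n = 10
set.seed(3); b   = f(10)       # n matched by position
set.seed(3); c10 = f(n = 10)   # n matched by name
set.seed(3); d   = f(100)      # a different sample size

identical(a, b)     # same seed and same n give the same mean
identical(b, c10)
identical(a, d)     # n = 100 uses more draws, so the mean differs
```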

With f defined, simulating 200 samples of $\bar{x}$ for n=50 can be done as follows:

> res = c()

> for(i in 1:200) res[i] = f(n = 50)

Better still, we might want to pass in a parameter to the exponential. The rate of the exponential is 1 over its mean. So changing f to

f = function(n = 10, rate = 1) { mean(rexp(n, rate = rate)) }

sets the first argument of f to n with a default of 10 and the second to rate with a default of 1. This allows us to change the size and rate as in f(50,2), which would take 50 $X_i$'s, each with rate 2 or mean 1/2. Alternately, we could do f(rate=1/2), which would use the default of 10 for n and use the value of 1/2 for rate. (Note that f(1/2) will not do this, as the 1/2 would match the position for n and not that of rate.)
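The sketch below checks that f(50, 2) produces sample means near 1/2, and uses a hypothetical helper, show_args(), with the same argument list as f, to make the matching of a positional 1/2 visible. The helper, the seed, and the choice of 1,000 repetitions are assumptions made for illustration.

```r
f = function(n = 10, rate = 1) { mean(rexp(n, rate = rate)) }

set.seed(5)                            # arbitrary seed
res = c()
for (i in 1:1000) res[i] = f(50, 2)    # n = 50, rate = 2 by position
mean(res)                              # should be near 1/rate = 1/2

# Hypothetical helper with the same arguments as f; it simply
# returns the values its arguments were matched to
show_args = function(n = 10, rate = 1) c(n = n, rate = rate)

show_args(rate = 1/2)   # n keeps its default of 10, rate is 1/2
show_args(1/2)          # the 1/2 is matched to n, rate keeps its default of 1
```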

The arguments of a function are returned by the args() command. This can help you sort out the available arguments and their names as a quick alternative to the more informative help page. When consulting the help pages of R’s built-in functions, the ... argument appears frequently. This argument allows the function writer to pass along arbitrarily named arguments to function calls inside the body of a function.
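A short sketch of both ideas follows. The function g is hypothetical: it forwards whatever extra named arguments it receives to mean(), so that, for example, the trim= argument of mean() can be supplied through g.

```r
f = function(n = 10, rate = 1) { mean(rexp(n, rate = rate)) }

args(f)              # displays the argument list of f
names(formals(f))    # just the argument names: "n" "rate"

# Hypothetical function: anything passed through ... goes on to mean()
g = function(n = 10, ...) {
  mean(rexp(n), ...)
}

g(20)                # ordinary mean of 20 exponentials
g(20, trim = 0.1)    # trim = 0.1 is handed to mean(), giving a trimmed mean
```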

6.4.3 The function body

The function body is a block of commands enclosed in braces. As mentioned, the braces are optional if there is a single command. The return value for a function is the last command executed. The function return() will force the return of a function, with its argument becoming the return value.
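For instance, return() can be used to leave a function early. The function safe_mean() below is a hypothetical example, and returning NA for a sample size below 1 is a design choice made only for this sketch.

```r
# Hypothetical function: mean of n exponentials, or NA if n is too small
safe_mean = function(n = 10) {
  if (n < 1) return(NA)   # forces an early return; NA becomes the return value
  mean(rexp(n))           # otherwise the last command evaluated is returned
}

safe_mean(0)    # returns NA without calling rexp()
safe_mean(5)    # returns the mean of 5 exponentials
```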

Some commands executed during a function behave differently from when they are executed at the command line—in particular, printing and assignment.

During interactive usage, typing the name of an R object causes it to be “printed.” This shows the contents in a nice way, and varies by the type of object. For example, factors and data vectors print differently. Inside a function, nothing is printed unless you ask it to be.* The function print() will display an object as though it were typed on the command line. The function cat() can be used to concatenate values together. Unlike print(), the cat() function will not print a new line character, nor the element numbers, such as [1]. A new line can be printed by including "\n" in the cat() command. When a function is called, the return value will print unless it is assigned to some object. If you don’t want this, such as when producing a graphic, the function invisible() will suppress the printing.

Assignment inside a function block is a little different. Within a block, assignment to a variable masks any variable outside the block. This example defines x to be 5 outside the block, but then assigns x to be 6 inside the block. When x is printed inside the block the value of 6 shows; however, x has not changed once outside the block.

> x = 5
> f = function() {
+   x = 6
+   x
+ }
> f()
[1] 6
> x
[1] 5

If you really want to force x to change inside the block, the global assignment operator <<- can be used, as can the function assign(). Consult the help pages ?"<<-" and ?assign for more detail.
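A minimal sketch of both approaches is given below. The function g and the variable after_f are made up for illustration, and the use of envir = parent.frame() with assign() is one possible choice (it targets the environment the function was called from), not the only one.

```r
x = 5

f = function() {
  x <<- 6      # changes the x defined outside the function
  x
}
f()
after_f = x    # x is now 6 outside the function as well

# assign() with an explicit environment: parent.frame() is the environment
# g() was called from, which is the workspace at the command line
g = function() assign("x", 7, envir = parent.frame())
g()
x              # x is now 7
```

Changing outside variables in this way can make functions harder to reason about, so it is usually reserved for cases where returning a value will not do.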

In the example above, the value of x used inside the block is the one assigned inside the block. If none had been assigned, R would have looked for a definition outside the block. For example:

> x = 5
> f = function() print(x)
> f()
[1] 5
> rm(x)
> f()
Error: Object “x” not found
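Returning to printing inside a function, the sketch below uses cat() to display a message and invisible() to suppress the automatic printing of the return value. The function show_mean() and the wording of its message are hypothetical.

```r
# Hypothetical function: reports a sample mean with cat(), returns it invisibly
show_mean = function(n = 10) {
  xbar = mean(rexp(n))
  # cat() prints no [1] prefix and needs an explicit "\n" to end the line
  cat("The mean of", n, "exponentials is", xbar, "\n")
  invisible(xbar)    # returned, but not printed when the call is not assigned
}

show_mean(5)          # only the cat() message is shown
res = show_mean(5)    # the value is still available for assignment
print(res)            # print() shows it as it would appear at the command line
```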

* In Windows you may need to call flush.console() to get the output. See the FAQ for details.

6.5 Investigating distributions

■ Example 6.1: The sample median

The sample median, M, is a measurement of central tendency like the sample mean. Does it, too, have an approximately normal distribution? How does the sampling distribution of M reflect the parent distribution of the sample? Will M converge to some parameter of the parent distribution as $\bar{x}$ converges to µ?
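A first look at these questions can be had with the same simulation pattern used for the mean. The sketch below assumes an Exponential(1) parent population, samples of size 25, 500 repetitions, and an arbitrary seed; these are illustrative choices only. For this parent the population median is log(2), which gives a natural value to compare against.

```r
set.seed(20)                    # arbitrary seed, only for reproducibility

# 500 sample medians, each from 25 Exponential(1) observations
res = c()
for (i in 1:500) res[i] = median(rexp(25))

mean(res)    # center of the simulated sampling distribution of M
log(2)       # population median of the Exponential(1) distribution
sd(res)      # spread of the simulated sample medians

# plot(density(res)) would draw a density estimate of the sample medians
```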

Figure 6.4 Density estimates for