
sdlog=1

The following three distributions are used to describe sampling distributions: the t-distribution, the F-distribution, and the chi-squared distribution (sometimes written χ², using the Greek letter chi).

The family names in R are t, f, and chisq. Their parameters are called “degrees of freedom” and, when these distributions are used as sampling distributions, are related to the sample size. For the t and chi-squared distributions, the degrees-of-freedom argument is df=. For the F-distribution, two degrees of freedom are specified, so the arguments are df1= and df2=.
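Each of these families also has d, p, q, and r functions that take the same degrees-of-freedom arguments. Because a q-function inverts the matching p-function, quantiles such as those in the example that follows can be checked by passing them back to pt(), pf(), and pchisq(): the area between l and r should be 0.95. A minimal sketch using the same degrees of freedom (the variable names are illustrative only):

```r
# Quantiles bracketing the middle 95% (same df as the example in the text)
lr_t     <- qt(c(.025, .975), df = 10)
lr_f     <- qf(c(.025, .975), df1 = 10, df2 = 5)
lr_chisq <- qchisq(c(.025, .975), df = 10)

# The p-functions return cumulative area, so diff() gives the area between l and r
areas <- c(t     = diff(pt(lr_t, df = 10)),
           F     = diff(pf(lr_f, df1 = 10, df2 = 5)),
           chisq = diff(pchisq(lr_chisq, df = 10)))
areas    # each entry is 0.95, up to floating-point rounding
```

Only the t-distribution is symmetric about 0, which is why its two quantiles differ only in sign while the F and chi-squared limits do not.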

For example, the values l and r that bracket the middle 95% of the area of each distribution can be found as follows:

> qt(c(.025,.975), df=10)           # 10 degrees of freedom
[1] -2.228  2.228
> qf(c(.025,.975), df1=10, df2=5)   # 10 and 5 degrees of freedom
[1] 0.2361 6.6192
> qchisq(c(.025,.975), df=10)       # 10 degrees of freedom
[1]  3.247 20.483

5.2.5 Problems

5.8 A die is rolled five times. What is the probability of three or more rolls of four?

5.9 Suppose a decent bowler can get a strike with probability p=.3. What is the chance he gets 12 strikes in a row?

5.10 A fair coin is tossed 100,000 times. The number of heads is recorded. What is the probability that there are between 49,800 and 50,200 heads?

5.11 Suppose that, on average, a baseball player gets a hit once every three times she bats. What is the probability that she gets four hits in four at bats?

5.12 Use the binomial distribution to decide which is more likely: rolling two dice twenty-four times and getting at least one double six, or rolling one die four times and getting at least one six?

5.13 A sample of 100 people is drawn from a population of 600,000. If it is known that 40% of the population has a specific attribute, what is the probability that 35 or fewer in the sample have that attribute?

5.14 If Z is Normal(0, 1), find the following:

1. P(Z ≤ 2.2)
2. P(−1 < Z ≤ 2)
3. P(Z > 2.5)
4. b such that P(−b < Z < b) = 0.90

5.15 Suppose that the population of adult, male black bears has weights that are approximately distributed as Normal(350,75). What is the probability that a randomly observed male bear weighs more than 450 pounds?

5.16 The maximum score on the math ACT test is 36. If the average score for all high school seniors who took the exam was 20.6 with a standard deviation of 5.5, what percent received the passing mark of 22 or better? If 1,000,000 students took the test, how many more would be expected to fail if the passing mark were moved to 23 or better? Assume a normal distribution of scores.

5.17 A study found that foot lengths for Japanese women are normally distributed with mean 24.9 centimeters and standard deviation 1.05 centimeters. For this population, find the probability that a randomly chosen foot is less than 26 centimeters long. What is the 95th percentile?

5.18 Assume that the average finger length for females is 3.20 inches, with a standard deviation of 0.35 inches, and that the distribution of lengths is normal. If a glove manufacturer makes a glove that fits fingers with lengths between 3.5 and 4 inches, what percent of the population will the glove fit?

5.19 The term “six sigma” refers to an attempt to reduce errors to the point that the chance of their happening is less than the area more than six standard deviations from the mean. What is this area if the distribution is normal?

5.20 Cereal is sold by weight, not volume. This introduces variability in the volume due to settling. As such, the height to which a cereal box is filled is random. If the heights for a certain type of cereal and box have a Normal(12, 0.5) distribution in units of inches, what is the chance that a randomly chosen cereal box has a cereal height of 10.7 inches or less?

5.21 For the fheight variable in the father.son (UsingR) data set, compute what percent of the data is within 1, 2, and 3 standard deviations of the mean. Compare to the percentages 68%, 95%, and 99.7%.

5.22 Find the quintiles of the standard normal distribution.

5.23 For a Uniform(0, 1) random variable, the mean and variance are 1/2 and 1/12. Find the area within 1, 2, and 3 standard deviations of the mean and compare to 68%, 95%, and 99.7%. Do the same for the Exponential(1/5) distribution, which has mean and standard deviation 5.

5.24 A q-q plot is an excellent way to investigate whether a distribution is approximately normal. For the symmetric distributions Uniform(0, 1), Normal(0, 1) and t with 3 degrees of freedom, take a random sample of size 100 and plot a quantile-normal plot using qqnorm(). Compare the three and comment on the curve of the plot as it relates to the tail length. (The uniform is short-tailed; the t-distribution with 3 degrees of freedom is long-tailed.)
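The sampling and plotting steps described in 5.24 might be set up as in the following sketch. The seed, the side-by-side layout via par(mfrow=), and the qqline() reference lines are choices made here for convenience, not part of the problem; commenting on the curvature is left to the reader.

```r
# Draw a sample of size 100 from each symmetric distribution
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 100
samples <- list(
  "Uniform(0, 1)" = runif(n),
  "Normal(0, 1)"  = rnorm(n),
  "t, 3 df"       = rt(n, df = 3)
)

# Show the three normal quantile plots side by side
old <- par(mfrow = c(1, 3))      # three panels in one row; save old setting
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])          # reference line through the quartiles
}
par(old)                         # restore the previous layout
```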

5.25 For the t-distribution, we can see that as the degrees of freedom get large the density approaches the normal. To investigate, plot the standard normal density with the command

> curve(dnorm(x), -4, 4)

and add densities for the t-distribution with k=5,10,25,50, and 100 degrees of freedom. These can be added as follows:

> k=5; curve(dt(x,df=k), lty=k, add=TRUE)
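Rather than repeating that line for each k, a loop can add all five curves. In this sketch the line types are 2 through 6 (one per curve) instead of lty=k, an assumption made so that every curve gets a distinct standard dash pattern. The last lines are an illustrative numeric check, not part of the problem: they measure the largest gap between each t density and the normal density on a grid.

```r
ks <- c(5, 10, 25, 50, 100)

# Standard normal density as the reference curve
curve(dnorm(x), -4, 4, ylab = "density")

# Overlay the t densities, one line type per degrees-of-freedom value
for (i in seq_along(ks)) {
  curve(dt(x, df = ks[i]), lty = i + 1, add = TRUE)
}

# Largest absolute difference from the normal density on a grid of points
x <- seq(-4, 4, by = 0.01)
max_gap <- sapply(ks, function(k) max(abs(dt(x, df = k) - dnorm(x))))
names(max_gap) <- ks
max_gap    # shrinks as the degrees of freedom grow
```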

5.26 The mean of a chi-squared random variable with k degrees of freedom is k. Can you guess the variance? Plot the density of the chi-squared distribution for k=2, 8, 18, 32, 50, and 72, and then try to guess. The first plot can be done with curve(), as in

> curve(dchisq(x,df=2), 0, 100)

Subsequent ones can be added with

> k=8; curve(dchisq(x,df=k), add=TRUE)
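All six densities can be drawn with a loop, as sketched below. As an illustrative check of the stated mean (it does not reveal the variance), the sketch also approximates the integral of x·f(x) with a Riemann sum; the grid from 0 to 400 with step 0.01 is an assumption, chosen so that the upper tail beyond 400 is negligible even for k=72.

```r
ks <- c(2, 8, 18, 32, 50, 72)

# First density sets the plot region; the rest are added on top
curve(dchisq(x, df = ks[1]), 0, 100, ylab = "density")
for (k in ks[-1]) {
  curve(dchisq(x, df = k), add = TRUE)
}

# Riemann-sum approximation of the mean, i.e. the integral of x * f(x)
h <- 0.01
x <- seq(0, 400, by = h)
means <- sapply(ks, function(k) sum(x * dchisq(x, df = k)) * h)
round(means, 3)    # should agree with ks
```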

5.3 The central limit theorem

It was remarked earlier that for an i.i.d. sample from a population, the sample mean X̄ has expected value µ and standard deviation σ/√n, where µ and σ are the population parameters. In this section we see that for large enough n, the sampling distribution of X̄ is normal or approximately normal.
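The statement about the mean and standard deviation of X̄ does not require a normal parent. A quick simulation sketch compares simulated sample means with µ and σ/√n; the Exponential(1/5) parent (mean 5, standard deviation 5), the sample size n=25, the 10,000 replications, and the seed are all illustrative assumptions.

```r
set.seed(42)                  # arbitrary seed, for reproducibility
mu <- 5; sigma <- 5           # Exponential(1/5) parent: mean 5, sd 5
n <- 25

# 10,000 sample means, each from an i.i.d. sample of size n
xbar <- replicate(10000, mean(rexp(n, rate = 1/5)))

# Simulated mean and sd of xbar next to the theoretical sigma/sqrt(n) = 1
c(mean = mean(xbar), sd = sd(xbar), theory_sd = sigma / sqrt(n))
```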

5.3.1 Normal parent population

When the sample X1, X2, …, Xn is drawn from a Normal(µ, σ) population, the distribution of X̄ is precisely the normal distribution, Normal(µ, σ/√n). Figure 5.8 draws densities for the population and for the sampling distribution of X̄ with n=5 and 25 when µ=0 and σ=1.

> n=25; curve(dnorm(x,mean=0,sd=1/sqrt(n)), -3, 3,
+   xlab="x", ylab="Densities of sample mean", bty="l")
> n=5; curve(dnorm(x,mean=0,sd=1/sqrt(n)), add=TRUE)
> n=1; curve(dnorm(x,mean=0,sd=1/sqrt(n)), add=TRUE)

The center stays the same, but as n gets bigger, the spread of X̄ gets smaller and smaller. If the sample size goes up by a factor of 4, the standard deviation σ/√n is cut in half, and the density concentrates on the mean. That is, with greater and greater probability, the random value of X̄ is close to the mean, µ, of the parent population. This phenomenon of the sample average concentrating on the mean is known as the law of large numbers.
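Both effects can be computed directly with pnorm(). The sketch below assumes a Normal(0, 1) parent and takes |X̄ − µ| < 0.1 as an arbitrary, illustrative meaning of "close"; the sample sizes grow by a factor of 4 at each step.

```r
mu <- 0; sigma <- 1
n <- 4^(0:5)                          # 1, 4, 16, 64, 256, 1024

# Standard deviation of the sample mean: halves each time n is multiplied by 4
sd_xbar <- sigma / sqrt(n)

# P(mu - 0.1 < xbar < mu + 0.1) when xbar is Normal(mu, sigma/sqrt(n))
p_close <- pnorm(mu + 0.1, mean = mu, sd = sd_xbar) -
           pnorm(mu - 0.1, mean = mu, sd = sd_xbar)

data.frame(n, sd_xbar, p_close = round(p_close, 3))
```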

For example, if adult male heights are normally distributed with mean 70.2 inches and standard deviation 2.89 inches, the average height of 25 randomly