Confidence Intervals - OReilly Statistics in a Nutshell A Desktop Quick Reference Aug 2008 pdf

When we calculate a single statistic, such as the mean, to describe a sample, that is referred to as calculating a point estimate because the number represents a single point on the number line. The sample mean is a point estimate, and is a useful statistic as the best estimate of the population mean. However, we know that the sample mean is only an estimate and that if we drew a different sample, the mean of the sample would probably be different. We don’t expect that every possible sample we could draw will have the same sample mean. It is reasonable to ask how much the point estimate is likely to vary by chance if we had chosen a different sample, and in many professional fields it has become common practice to report both point estimates and interval estimates. A point estimate is a single number, while an interval estimate is a range or interval of numbers.

The most common interval estimate is the confidence interval, which is the interval between two values that represent the upper and lower confidence limits or confidence bounds for a statistic. The formula used to calculate the confidence interval depends on the statistic being used and will be included in the relevant chapters: this section is meant to convey the concept of the confidence interval.

The confidence interval is calculated using a predetermined significance level, often called α (the Greek letter alpha), which is most often set at 0.05, as discussed above. The confidence coefficient is calculated as (1 – α) or, as a percentage, 100(1 – α)%. Thus if α = 0.05, the confidence coefficient is 0.95 or 95%. The latter usage is more common; for instance, professional journals often require that you report the 95% confidence interval for your statistics.
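As a rough illustration of how α, the confidence coefficient, and the confidence limits relate, the following Python sketch computes an interval for a sample mean. The sample values and the function name `mean_confidence_interval` are invented for this example, and the normal-approximation (z-based) formula is an assumption made here for simplicity; the formula appropriate to each statistic belongs to the relevant chapters, and for small samples a t-based interval would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, alpha=0.05):
    """Return (lower, upper) normal-approximation confidence limits for the mean."""
    n = len(sample)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Critical value that leaves alpha/2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point_estimate - z * standard_error, point_estimate + z * standard_error


alpha = 0.05
confidence_coefficient = 1 - alpha  # 0.95, i.e. 95%

# Hypothetical IQ scores, invented for this example (their mean is 100).
scores = [92, 105, 98, 110, 101, 95, 99, 104, 97, 99]
lower, upper = mean_confidence_interval(scores, alpha)
print(f"{confidence_coefficient:.0%} CI for the mean: ({lower:.1f}, {upper:.1f})")
```

Choosing a smaller α (say 0.01) raises the confidence coefficient to 99% and widens the interval, since a larger critical value is needed to leave less probability in the tails.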

Confidence intervals are based on the notion that if a study were repeated an infinite number of times, each time drawing a different sample from the same population and constructing a confidence interval based on that sample, x% of the time the confidence interval would contain the unknown parameter value that the study seeks to estimate. For instance, if our statistic is the mean and we are using 95% confidence intervals, over an infinite number of repetitions of the study, 95% of the time the confidence interval constructed from the study would contain the mean of the population. For this reason, the confidence interval is sometimes described as presenting a plausible range of values for the mean.

The confidence interval conveys important information about the precision of the point estimate. For instance, suppose we have two samples of students and in both cases the mean IQ score is 100. In one case, however, the 95% confidence interval is (95, 105), while in the other case the 95% confidence interval is (80, 120). Because the former confidence interval is much narrower than the latter, the estimate of the mean is more precise for the first sample.
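The repeated-sampling idea can be approximated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with a mean of 100 and a standard deviation of 15, the standard deviation is treated as known, a finite number of repetitions stands in for the infinite number in the definition, and the function name `coverage_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage_rate(true_mean=100.0, sigma=15.0, n=30, repetitions=2000,
                  alpha=0.05, seed=1):
    """Fraction of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)  # sigma treated as known, for simplicity
    hits = 0
    for _ in range(repetitions):
        # Draw a fresh sample and build an interval around its mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repetitions


rate = coverage_rate()
print(f"Intervals containing the true mean: {rate:.1%}")
```

With these settings the printed proportion should land close to 95%, though not exactly on it, because the number of repetitions is finite.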

p-values

It is a fact of life when working with inferential statistics that we are always trying to estimate something that we can’t measure directly. For instance, we don’t have the ability to collect data from every hypertensive adult in the world, but we can select a sample of hypertensive adults, design an experiment involving them, and analyze the data we thus collect. Because we understand that sampling error is always a possibility in studies based on samples, we want to know how likely it is that results like those obtained from our sample could have arisen by chance alone. If we had the means to draw repeated samples from the population and repeat the experiment, how likely is it that we would obtain similar results most of the time?

A p-value expresses the probability of obtaining results at least as extreme as those obtained in a sample if chance alone were at work, that is, if the null hypothesis were true. The phrase “at least as extreme” is necessary because most statistical tests involve comparing the test statistic to some hypothetical distribution (such as the normal distribution, as illustrated below) where scores closer to the center of the distribution are most common and scores become less likely as they are further from the center of the distribution.

This may be clearer by considering a simple illustration. Suppose we are engaged in an experiment flipping a coin that we believe to be fair, i.e., a coin for which heads or tails are equally likely outcomes for any single flip. We can express this formally as P(H) = P(T) = 0.5. We will call each flip a trial. Our expectation is that we will get 5 heads on 10 trials, although we know that on any particular set of 10 trials we may get a different number of heads. So we flip the coin 10 times and 8 times it comes up heads. We want to know the p-value of this result, i.e., how likely is it that a coin with a probability of 0.5 for heads on any single trial would produce a result as extreme as 8 heads in 10 trials?

Using a binomial table, computer software, or the binomial formula, we find that the probability of this exact result (8 heads in 10 trials) is 0.0439, meaning that less than 5% of the time would we expect to get exactly 8 heads in 10 flips with a fair coin. The probability for 9 heads in 10 trials is 0.0098, and for 10 heads in 10 trials is 0.0010. This demonstrates that as results move further away from the expected result of 5 heads in 10 trials, they become less likely.
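These probabilities can be reproduced with the binomial formula, P(k heads) = C(n, k) p^k (1 – p)^(n – k). The short Python sketch below is one way to do so; the function name `binomial_pmf` is chosen here for illustration and is not taken from the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Fair coin, 10 flips: probability of exactly 8, 9, and 10 heads.
for heads in (8, 9, 10):
    print(f"P({heads} heads in 10 flips) = {binomial_pmf(heads, 10, 0.5):.4f}")
# P(8 heads in 10 flips) = 0.0439
# P(9 heads in 10 flips) = 0.0098
# P(10 heads in 10 flips) = 0.0010
```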

If we are evaluating whether the coin truly is fair, results that are far from our expectation give us evidence that it in fact is not fair. For this reason, we usually calculate the probability not just of the result we obtained in our experiment, but of results at least as extreme as those we obtained. In this case, the probability of getting 8, 9, or 10 heads in 10 flips of a fair coin is 0.0439 + 0.0098 + 0.0010, or 0.0547. This is the p-value for the result of at least 8 heads in 10 trials using a coin where P(heads) = 0.5.
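The "at least as extreme" calculation amounts to summing the upper tail of the binomial distribution. A minimal sketch, assuming a one-sided test in the direction of more heads (which is what the sum of 8, 9, and 10 heads corresponds to), with `upper_tail_p_value` as a hypothetical helper name:

```python
from math import comb


def upper_tail_p_value(observed, n, p):
    """P(X >= observed) for X ~ Binomial(n, p): a one-sided p-value."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(observed, n + 1))


# At least 8 heads in 10 flips of a fair coin.
p_value = upper_tail_p_value(8, 10, 0.5)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0547
```

A two-sided version would also count results equally far below the expected 5 heads (0, 1, or 2 heads), which doubles the value for a fair coin.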

p-values are commonly reported for most research results involving statistical calculations, in part because intuition is a poor guide to how unusual a particular result is. For instance, many people might think it is unusual to get 8 or more heads on 10 trials using a fair coin. In this case, the p-value for such a result is 0.0547. This result does not allow us to reject the null hypothesis that the coin is fair, i.e., P(heads) = 0.5, using the standard rule of thumb that a p-value must be less than 0.05 for results to be considered significant.

In document OReilly Statistics in a Nutshell A Desktop Quick Reference Aug 2008 pdf (Page 168-170)