Statistical inference is the process of forming judgments about a population based on a sample from the population. In this chapter we describe populations and samples using the language of probability.
5.1 Populations
In order to make statistical inferences based on data we need a probability model for the data. Consider a univariate data set. A single data point is just one of a possible range of values. This range of values will be called the population. We use the term random variable to mean a random number drawn from a population. A data point will be a realization of some random variable. We make a distinction between whether or not we have observed or realized a random variable. Once observed, the value of the random variable is known. Prior to being observed, it is full of potential: it can be any value in the population it comes from. In most cases, not all values or ranges of values of a population are equally likely, so to fully describe a random variable prior to observing it, we need to indicate the probability that the random variable is some value or in a range of values. We refer to a description of the range and the probabilities as the distribution of a random variable.
By probability we mean some number between 0 and 1 that describes the likelihood of our random variable having some value. Our intuition for probabilities may come from a physical understanding of how the numbers are generated. For example, when tossing a fair coin we would think that the probability of heads would be one-half. Similarly, when a die is rolled the probability of rolling any given face would be one-sixth. These are both examples in which all outcomes are equally likely and finite in number. In this case, the probability of some event, a collection of outcomes, is the number of outcomes in the event divided by the total number of outcomes. In particular, this says the probability of any event is between 0 and 1.
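The counting rule for equally likely outcomes can be sketched in R. The event used here, rolling an even number, is an illustrative choice and not one taken from the text.

```r
# Sample space for one roll of a fair six-sided die
outcomes <- 1:6

# Event: the roll is even (an illustrative choice)
event <- outcomes[outcomes %% 2 == 0]

# With equally likely outcomes, the probability of an event is
# (number of outcomes in the event) / (total number of outcomes)
p_even <- length(event) / length(outcomes)
p_even
```

Because an event can contain at most all of the outcomes and at least none of them, this ratio always falls between 0 and 1.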
For situations where our intuition comes about by performing the same action over and over again, our idea of the probability of some event comes from a proportion of times that event occurs. For example, the batting average of a baseball player is a running proportion of a player’s success at bat. Over the course of a season, we expect this number to get closer to the probability that an official at bat will be a success. This is an example in which a long-term frequency is used to give a probability.
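A long-term frequency can be illustrated by simulation. The sketch below assumes a fair coin rather than a batter, and the seed and the number of tosses are arbitrary choices made only so that the run is reproducible.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 10000    # number of simulated tosses (an arbitrary choice)

# Simulate n tosses of a fair coin
tosses <- sample(c("H", "T"), size = n, replace = TRUE)

# Running proportion of heads after 1, 2, ..., n tosses
running_prop <- cumsum(tosses == "H") / seq_len(n)

# The proportion after 10, 100, 1000, and n tosses
running_prop[c(10, 100, 1000, n)]
```

The early proportions can wander quite far from one-half, while the later ones should settle near it.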
For other populations, the probabilities are simply assigned or postulated, and our model is accurate only insofar as it matches the reality of the data collected. We indicate probabilities using P() and random variables with letters such as X. For example, P(X≤5) would mean the probability that the random variable X is less than or equal to 5.
5.1.1 Discrete random variables
Numeric data can be discrete or continuous. As such, our model for data comes in the same two flavors.
Let X be a discrete random variable. The range of X is the set of all k where P(X=k)>0. The distribution of X is a specification of these probabilities. Distributions are not arbitrary, as for each k in the range, P(X=k)>0 and P(X=k)≤1. Furthermore, as X has some value, the probabilities summed over all k in the range satisfy ∑_k P(X=k)=1.
Here are a few examples for which the distribution can be calculated.
■ Example 5.1: Number of heads in two coin tosses If a coin is tossed two times we can keep track of the outcome as a pair. (H, T), for example, denotes “heads” then “tails.” The set {(H, H), (H, T), (T, H), (T, T)} contains all possible outcomes. If X is the number of heads, then X is either 0, 1, or 2. Intuitively, we know that for a fair coin all the outcomes have the same probability, so P(X=0)=1/4, P(X=1)=1/2, and P(X=2)=1/4.
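The distribution in Example 5.1 can be checked by listing the four outcomes and counting heads. This is a minimal sketch; the variable names are ours.

```r
# All ordered outcomes of two tosses: (H,H), (T,H), (H,T), (T,T)
tosses <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                      stringsAsFactors = FALSE)

# X = number of heads in each outcome
heads <- rowSums(tosses == "H")

# Each outcome is equally likely, so P(X = k) is the share of outcomes with k heads
dist <- table(heads) / nrow(tosses)
dist
```

The resulting table has entries 0.25, 0.50, and 0.25 for k = 0, 1, and 2, which sum to 1 as a distribution must.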
■ Example 5.2: Picking balls from a bag Imagine a bag with N balls, of which R are red and N−R are green. We pick a ball, note its color, replace the ball, and repeat. Let X be the number of red balls in the two picks. As in the previous example, X is 0, 1, or 2. The probability that X=2 is intuitively (R/N)·(R/N), as R/N is the probability of picking a red ball on any one pick. The probability that X=0 is ((N−R)/N)^2 by the same reasoning, and as all probabilities add to 1, P(X=1)=2(R/N)((N−R)/N). This specifies the distribution of X. The binomial distribution describes the result of selecting n balls, not two.
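The three probabilities in Example 5.2 can be compared with R's binomial density function, dbinom(), for two picks. The values N = 10 and R = 3 below are hypothetical and chosen only for illustration.

```r
N <- 10   # total number of balls (hypothetical value)
R <- 3    # number of red balls (hypothetical value)
p <- R / N   # probability of red on any one pick

# P(X=0), P(X=1), P(X=2) from the formulas in the example
by_hand <- c((1 - p)^2, 2 * p * (1 - p), p^2)

# The same probabilities from the binomial distribution with two picks
via_dbinom <- dbinom(0:2, size = 2, prob = p)

all.equal(by_hand, via_dbinom)   # do the two calculations agree?
sum(by_hand)                     # the probabilities should add to 1
```

Changing size to another value of n gives the distribution for n picks rather than two.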
The intuition that leads us to multiply two probabilities together is due to the two events being independent. Two events are independent if knowledge that one occurs doesn’t change the probability of the other occurring. Two events are disjoint if they can’t both occur for a given outcome. Probabilities add with disjoint events.
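Both rules can be checked on the two-toss outcomes of a fair coin. In this sketch the events A, B, C, and D and the helper prob() are our own illustrative choices.

```r
# The four equally likely outcomes of two tosses
outcomes <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                        stringsAsFactors = FALSE)

# Probability of an event = proportion of outcomes in it (equally likely case)
prob <- function(event) mean(event)

# Independent events: heads on the first toss, heads on the second toss
A <- outcomes$first == "H"
B <- outcomes$second == "H"
c(prob(A & B), prob(A) * prob(B))   # probabilities multiply

# Disjoint events: no heads at all, exactly two heads
heads <- rowSums(outcomes == "H")
C <- heads == 0
D <- heads == 2
c(prob(C | D), prob(C) + prob(D))   # probabilities add
```

Note that C and D are disjoint but not independent: knowing that C occurred makes D impossible.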
■ Example 5.3: Specifying a distribution We can specify the distribution of a discrete random variable by first specifying the range of values and then assigning to each k a number p_k=P(X=k) such that ∑_k p_k=1 and p_k≥0. To visualize a physical model where this can be realized, imagine making a pie chart with areas proportional to p_k, placing a spinner in the middle, and spinning. The ending position determines the value of the random variable.
Figure 5.1 shows a spike plot of a distribution and a spinner model to realize values of X. A spike plot shows the probabilities for each value in the range of X as spikes, emphasizing the discreteness of the distribution. The spike plot is made with the following commands:
> k = 0:4
> p = c(1,2,3,2,1)/9
> plot(k, p, type="h", xlab="k",
+   ylab="probability", ylim=c(0,max(p)))
> points(k, p, pch=16, cex=2) # add the balls to top of spike
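The spinner model can also be imitated in R. The sketch below assumes that sample() with its prob= argument is an acceptable stand-in for the physical spinner; the seed and the number of spins are arbitrary.

```r
set.seed(2)   # arbitrary seed, for reproducibility only

k <- 0:4                      # range of X, as in the spike plot
p <- c(1, 2, 3, 2, 1) / 9     # P(X = k), as in the spike plot

# Each "spin" returns one value of k, chosen with probability p
spins <- sample(k, size = 9000, replace = TRUE, prob = p)

# Observed proportions of each value, to compare with p
observed <- as.numeric(table(factor(spins, levels = k))) / length(spins)
round(rbind(p, observed), 3)
```

With many spins the observed proportions should be close to the specified probabilities, which ties this example back to the long-term frequency view of probability.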