
Data distributions I

3.3 Probability distributions

In the previous section we considered the probability that a particular outcome or event will occur when a die is rolled. A single roll of a die has only six possible outcomes. What do we do if our data consist of many events or, as we usually prefer to call them, values? Is it possible to assign a probability to each value observed, and is it sensible to do this? For illustrative purposes we consider a computer-based ‘experiment’ which consists of generating one thousand random numbers⁶ with values between 0 and 1. Table 3.1 shows the first hundred random numbers.

In contrast to the situation in which a die is rolled, we do not know the probability that a particular value appearing in table 3.1 will occur.⁷ Is it possible to establish the probability of obtaining a random number, x, between, say, 0.045 and 0.621, using the data that appear in table 3.1? A good starting point is to view the values in picture form to establish whether any values appear to occur more frequently than others. The histogram shown in figure 3.1 indicates the frequency of occurrence of the 1000 random numbers.

4 P(A and B) is often written as P(AB).

5 Adler and Roessler (1972) discuss other rules of probability.

6 Random numbers between 0 and 1 can be generated using the RAN function found on many calculators, such as those manufactured by CASIO. The RAND() function in Excel can also be used to generate random numbers.

7 Unless we know the details of the algorithm that produced the random numbers.

The histogram provides evidence to support the following statements.

• All values of x lie in the range 0 to 1.
• The number of values (i.e. the frequency) appearing in any given interval, say from 0.0 to 0.1, is about the same as that appearing in any other interval of the same width.

Table 3.1. One hundred random numbers in the interval 0 to 1.

0.632 0.328 0.696 0.166 0.665 0.157 0.010 0.391 0.454 0.396
0.322 0.454 0.087 0.540 0.603 0.138 0.021 0.203 0.272 0.763
0.055 0.095 0.410 0.422 0.109 0.713 0.834 0.029 0.577 0.984
0.575 0.932 0.772 0.043 0.464 0.112 0.234 0.062 0.657 0.839
0.600 0.894 0.421 0.186 0.213 0.676 0.504 0.028 0.916 0.809
0.798 0.841 0.927 0.335 0.505 0.549 0.352 0.430 0.984 0.853
0.803 0.302 0.389 0.814 0.175 0.309 0.607 0.198 0.569 0.177
0.711 0.445 0.279 0.091 0.469 0.572 0.719 0.901 0.993 0.034
0.571 0.277 0.345 0.119 0.688 0.512 0.437 0.141 0.903 0.453
0.048 0.597 0.532 0.864 0.936 0.040 0.553 0.129 0.077 0.706

[Figure 3.1. Histogram of 1000 random numbers: frequency versus random number, x, for x from 0.0 to 1.0.]

The histogram in figure 3.1 can be transformed into a probability distribution. This allows the probability of obtaining a random number in any interval to be calculated. First, the bars in figure 3.1 are merged together and the frequencies scaled so that the total area under the curve created by merging the bars is 1. That is, for this distribution, the probability is 1 that x lies between 0 and 1. Taking each bar to have the same height, we obtain the graph shown in figure 3.2.
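The scaling step can be sketched in a few lines of Python. This is a minimal illustration, assuming Python's `random.random()` as the generator (in place of the calculator or Excel functions mentioned in footnote 6) and ten bins of width 0.1; the helper name `density_histogram` is hypothetical. Each frequency is divided by N × (bin width), which makes the total area of the bars equal to 1.

```python
import random

def density_histogram(values, n_bins=10, lo=0.0, hi=1.0):
    """Bin values and scale the frequencies so the total bar area is 1.

    Hypothetical helper: each height is count / (N * bin width).
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Index of the bin containing v; a value equal to hi goes in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    heights = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, heights

random.seed(1)  # fixed seed so the run is repeatable
sample = [random.random() for _ in range(1000)]  # 1000 values in [0, 1)
edges, heights = density_histogram(sample)

# Total area = sum of (height x bar width); by construction this is 1.
area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
print(f"total area = {area:.6f}")
for i, h in enumerate(heights):
    print(f"{edges[i]:.1f} to {edges[i + 1]:.1f}: height = {h:.2f}")
```

The heights scatter around a common value close to 1; smoothing out this scatter is the step that turns the histogram into the flat curve of figure 3.2.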

In going from figure 3.1 to figure 3.2 we have made quite an important step.

Figure 3.1 shows the distribution of the 1000 random numbers. These numbers may be regarded as a sample of all the random numbers that could have been generated. There is no limit to the number of values that could have been generated, so the population is, in principle, infinite. By contrast, the graph in figure 3.2 allows us to calculate the probability of obtaining a value of x within a particular interval. By innocently smoothing out the ‘imperfections’ in the histogram shown in figure 3.1 prior to scaling, we have inferred that the population of random numbers behaves in this manner, i.e. it consists of values that are evenly spread between 0 and 1. Successfully inferring something about a population by considering a sample of values drawn from that population is a central goal in data analysis.

Note that the label on the vertical axis in figure 3.2 differs from that in figure 3.1. The vertical axis is labelled f(x), where f(x) is referred to as the probability density function, or pdf for short.

As the total area under any f(x) versus x curve is 1 and, in general, x may take any value between −∞ and +∞, we write

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1 \qquad (3.1)$$

With reference to the graph in figure 3.2,

$$f(x) = 0 \text{ for } x < 0, \quad f(x) = 0 \text{ for } x > 1, \quad \text{and} \quad f(x) = A \text{ for } 0 \le x \le 1.$$

[Figure 3.2. Uniform probability density function f(x) versus x. The curve has height A between x = 0 and x = 1; shaded area = 1.]

Using equation 3.1 and replacing f(x) by A (only the interval 0 ≤ x ≤ 1 contributes, since f(x) = 0 elsewhere) gives

$$\int_{0}^{1} A\,dx = 1;$$

therefore,

$$A\,[x]_{0}^{1} = A(1 - 0) = 1.$$

It follows that A = 1, so f(x) = 1 from x = 0 to x = 1.
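The normalisation step can also be checked numerically. The sketch below assumes a simple midpoint-rule integration (an illustrative choice, not a method from the text) in a hypothetical helper `integrate`. It evaluates the integral of the unnormalised shape g(x) = 1 over 0 ≤ x ≤ 1 and then sets A = 1 / (that integral), as equation 3.1 requires.

```python
def integrate(func, a, b, n=10_000):
    """Midpoint-rule estimate of the integral of func from a to b (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def g(x):
    # Unnormalised shape of the uniform pdf: constant on [0, 1], zero elsewhere.
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# Equation 3.1 requires A * (integral of g) = 1, so A = 1 / (integral of g).
area_g = integrate(g, 0.0, 1.0)
A = 1.0 / area_g
print(f"A = {A:.6f}")

def f(x):
    # Normalised pdf f(x) = A * g(x).
    return A * g(x)

# Integrating over a wider range than [0, 1] adds nothing, because f is zero there.
print(f"area under f from -1 to 2 = {integrate(f, -1.0, 2.0, n=30_000):.4f}")
```

Extending the integration limits beyond 0 and 1 mirrors the −∞ to +∞ limits of equation 3.1: the regions where f(x) = 0 contribute no area.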

A word of caution here: f(x) is not the probability of the occurrence of the value x. If that were the case then, as f(x) = 1 for all values of x between 0 and 1, we would conclude that the probability of observing, say, the number 0.476 is 1.

This is not so. It is the area under the curve that is interpreted as a probability. If we want to know the probability of observing a value between x₁ and x₂, where x₂ > x₁, we must integrate f(x) between these limits,⁸ i.e.

$$P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} f(x)\,dx \qquad (3.2)$$

For this probability distribution, f(x) = 1 between x = 0 and x = 1, and so, for 0 ≤ x₁ < x₂ ≤ 1, the following relationship emerges:

$$P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} 1\,dx = x_2 - x_1 \qquad (3.3)$$
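Equation 3.3 translates directly into a short function. As a sketch (the name `uniform_interval_probability` is hypothetical), the limits are first clipped to 0 ≤ x ≤ 1, since the region where f(x) = 0 contributes nothing to the integral in equation 3.2; within that range the function simply returns x₂ − x₁.

```python
def uniform_interval_probability(x1, x2):
    """P(x1 <= x <= x2) for the uniform pdf f(x) = 1 on [0, 1] (hypothetical helper).

    Limits outside [0, 1] are clipped, because f(x) = 0 there adds no area.
    """
    if x2 < x1:
        raise ValueError("require x2 >= x1")
    lo = max(x1, 0.0)
    hi = min(x2, 1.0)
    # Equation 3.3: area of a rectangle of height 1 and width (hi - lo).
    return max(hi - lo, 0.0)

print(uniform_interval_probability(0.2, 0.5))   # interval inside [0, 1]
print(uniform_interval_probability(-3.0, 0.4))  # only 0 to 0.4 contributes
print(uniform_interval_probability(-5.0, 5.0))  # whole distribution
```

Clipping makes the function agree with the full integral of equation 3.2 for any pair of limits, not only those lying between 0 and 1.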

Example 2

What is the probability of observing a random number, generated in the manner described in this section, between 0.045 and 0.621?

ANSWER

Substituting x₂ = 0.621 and x₁ = 0.045 into equation 3.3 gives

$$P(0.045 \le x \le 0.621) = 0.621 - 0.045 = 0.576.$$
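The answer to Example 2 can be checked by simulation: generate many random numbers and count the fraction that fall between 0.045 and 0.621. This is a minimal sketch, assuming Python's `random.random()` (which returns values in [0, 1)) as the generator; the sample size of one million and the seed are arbitrary choices.

```python
import random

random.seed(42)  # arbitrary seed, for a repeatable run
n = 1_000_000
x1, x2 = 0.045, 0.621

# Count how many generated values satisfy x1 <= x <= x2.
hits = sum(1 for _ in range(n) if x1 <= random.random() <= x2)
estimate = hits / n
exact = x2 - x1  # equation 3.3

print(f"simulated fraction = {estimate:.4f}")
print(f"equation 3.3       = {exact:.3f}")
```

With a million values the simulated fraction typically agrees with 0.576 to within about 0.001, and the agreement improves as n grows.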

In performing the integration given by equation 3.3, the implication is that the variable, x, represents values of a continuous quantity.⁹ That is, x can take on any value between the limits x₁ and x₂. Mass, length and time are quantities regarded as continuous,¹⁰ but some values measured through experiment are discrete. An example is the counting of particles emerging from a radioactive substance. In this case each recorded value must be a whole number of counts. The probability distributions used to describe the distribution of discrete quantities differ from those that describe continuous quantities. For the moment we focus on continuous distributions, as we tend to encounter these more often than discrete distributions in the physical sciences.

8 Read P(x₁ ≤ x ≤ x₂) as ‘the probability that x lies between the limits x₁ and x₂’.

9 Here x is referred to as a continuous random variable, or crv.

10 We can argue (with justification) that on an atomic scale mass and length are no longer continuous; however, few measurements we make in the laboratory are likely to be sensitive to the ‘graininess’ of matter on this scale.

The probability distribution given by figure 3.2 and represented mathematically by equation 3.3 is convenient for illustrating some of the basic features common to all probability distributions. Few scientific experiments generate data that are evenly distributed as shown in figure 3.1, so the probability distribution in figure 3.2 is of more illustrative than practical value. In section 3.4 we consider data gathered in more conventional experiments and look at a probability distribution with more features in common with real data distributions.

Exercise A

(1) (a) A particular probability density function is written f(x) = Ax for the range 0 ≤ x ≤ 4 and f(x) = 0 outside this range.

(i) Sketch a graph of f(x) versus x.

(ii) Use equation 3.1 to find A.

(iii) Calculate the probability that x lies between x = 3 and x = 4.

(iv) If 100 000 measurements are made, how many values would you expect to lie between x = 3 and x = 4, assuming this probability density function to be valid?

(b) Another probability density function is written f(x) = Ax² for the range −4 ≤ x ≤ 4 and f(x) = 0 outside this range. Repeat parts (i) to (iv) of part (a) for this new function.

(2) The exponential probability density function¹¹ may be written

$$f(x) = \lambda e^{-\lambda x}, \quad \text{where } \lambda > 0 \text{ and } x \ge 0.$$

(i) Show that, for λ > 0, $\int_{0}^{\infty} f(x)\,dx = 1$.

(ii) If λ = 0.3, calculate the probability that x lies between x = 0 and x = 2.

(iii) Calculate the probability that x > 2.

11 This distribution is sometimes used to predict the lifetime of electrical and mechanical components. For details see Hayter (2006).


3.3.1 Limits in probability calculations

When calculating a probability using a probability density function, does it matter whether the limits of the interval, x₁ and x₂, are included in the calculation? In other words, does P(x₁ ≤ x ≤ x₂) differ from P(x₁ < x < x₂)? The answer is no, and to justify this we can write

$$P(x_1 \le x \le x_2) = P(x_1) + P(x_1 < x < x_2) + P(x_2) \qquad (3.4)$$

Here P(x₁) is the area under the probability curve at x = x₁. Using equation 3.2, this probability is written

$$P(x_1) = P(x_1 \le x \le x_1) = \int_{x_1}^{x_1} f(x)\,dx \qquad (3.5)$$

Both limits of the integral are x₁, so the right-hand side of equation 3.5 must equal zero. The same argument holds at x₂, or at any other value of x, so P(x₁) = P(x₂) = P(x) = 0 and equation 3.4 becomes

$$P(x_1 \le x \le x_2) = P(x_1 < x < x_2) \qquad (3.6)$$

Equation 3.6 is valid for continuous random variables.
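The argument of equations 3.4 to 3.6 can be illustrated numerically. The sketch below uses a hypothetical midpoint-rule helper with the uniform pdf of figure 3.2, and computes the probability for intervals of shrinking width centred on x = 0.632 (the first value in table 3.1). The probability is proportional to the width and is exactly zero when the two limits coincide, which is why including or excluding the end points leaves the result unchanged.

```python
def integrate(func, a, b, n=1000):
    """Midpoint-rule estimate of the integral of func from a to b (hypothetical helper).

    When a == b the step h is zero, so the estimate is exactly zero (equation 3.5).
    """
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x):
    # Uniform pdf of figure 3.2: f(x) = 1 for 0 <= x <= 1, zero elsewhere.
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

x_value = 0.632  # first entry of table 3.1
results = {}
for width in [0.1, 0.01, 0.001, 0.0001, 0.0]:
    # Probability that x lies in an interval of this width centred on x_value.
    p = integrate(f, x_value - width / 2, x_value + width / 2)
    results[width] = p
    print(f"width = {width:<6}  P = {p:.7f}")
```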

The fact that P(x) = 0 can be unsettling. Let us return to table 3.1 and consider the first value, x = 0.632. If we are dealing with the distribution of a continuous random variable, then P(0.632) = 0. However, the value 0.632 has been observed, so how can we reconcile this with the fact that P(0.632) = 0? The explanation is that x = 0.632 is a rounded value¹² and the ‘actual’ value could lie anywhere between x = 0.63150 and x = 0.63250. That is, though it is not obvious, the way the number is written implies a range. If we now ask what the probability is that a value of x (for this distribution) lies between x = 0.63150 and x = 0.63250, we have, using equation 3.3,

$$P(x_1 \le x \le x_2) = 0.63250 - 0.63150 = 0.001.$$

If 1000 measurements are made, then the number of times 0.632 is expected to occur is 1000 × 0.001 = 1.
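This prediction can be tested by simulation. As a sketch, assuming Python's `random.random()` as the generator, the code below repeats the 1000-value experiment many times and counts how often a value that rounds to 0.632 (i.e. lies between 0.63150 and 0.63250) appears. The number of repeated experiments and the seed are arbitrary choices.

```python
import random

random.seed(7)  # arbitrary seed, for a repeatable run
n_values = 1000       # values per experiment, as in the text
n_experiments = 2000  # number of repeated experiments (arbitrary)
lo, hi = 0.63150, 0.63250  # values in this range round to 0.632

counts = []
for _ in range(n_experiments):
    # Number of values in one experiment that would be recorded as 0.632.
    c = sum(1 for _ in range(n_values) if lo <= random.random() < hi)
    counts.append(c)

mean_count = sum(counts) / n_experiments
print(f"expected count       = {n_values * (hi - lo):.3f}")
print(f"mean simulated count = {mean_count:.3f}")
print(f"experiments with no 0.632 at all: {counts.count(0)} of {n_experiments}")
```

The average count comes out close to 1, although any single experiment may contain 0, 1, 2 or more occurrences. Roughly a third of the experiments contain none at all, since the chance that all 1000 values miss the interval is 0.999¹⁰⁰⁰ ≈ 0.37.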