Rationale of a Sampling Distribution
Assume that the Department of Labour wants to determine the average number of years’ experience of all professional engineers in South Africa. Consider the hypothetical situation of drawing every possible sample of size n (say n = 100 professional engineers) from this population. Assume that there are k such samples. Calculate the mean years of experience for each of these k samples. There will now be k sample means.
Chapter 6 – Sampling and Sampling Distributions
Now construct a frequency distribution of these k sample means and also calculate the mean and standard deviation of these k sample means. The following properties will emerge:
(a) The sample mean is itself a random variable, as the value of each sample mean is likely to vary from sample to sample. Each separate sample of 100 engineers will have a different sample mean number of years’ experience.
(b) The mean of all these k sample means will be equal to the true population mean, µ.
(c) The standard deviation of all these k sample means is a measure of the sampling error. It is called the standard error of the sample mean and is calculated as follows:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad (6.1)$$
It measures the average deviation of sample means about the true population mean.
(d) The histogram of these k sample means will be normally distributed.
This distribution of the sample means is called the sampling distribution of $\bar{x}$.
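The thought experiment above can be approximated by simulation. The following is a minimal Python sketch, not the text's own procedure: it assumes a hypothetical population of 20 000 engineers whose years of experience are generated at random, and it draws 2 000 random samples of n = 100 instead of every possible sample. All names and numbers (population, n, k, the 0 to 40 year range) are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population (an assumption): years of experience of 20,000 engineers.
population = [random.uniform(0, 40) for _ in range(20_000)]
mu = statistics.fmean(population)        # true population mean
sigma = statistics.pstdev(population)    # true population standard deviation

n = 100      # sample size
k = 2_000    # number of random samples, standing in for "every possible sample"

# Draw k samples (without replacement within each sample) and keep each sample mean.
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(k)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
standard_error = sigma / math.sqrt(n)    # formula 6.1

# Share of simulated sample means lying within one standard error of mu.
within_one = sum(abs(m - mu) <= standard_error for m in sample_means) / k

print(f"population mean          = {mu:.3f}")
print(f"mean of sample means     = {mean_of_means:.3f}")
print(f"sigma / sqrt(n)          = {standard_error:.3f}")
print(f"std dev of sample means  = {sd_of_means:.3f}")
print(f"share within 1 std error = {within_one:.3f}")
```

If the properties listed above hold, the mean of the sample means should sit close to the population mean, their standard deviation should be close to σ/√n, and roughly 68% of them should fall within one standard error of µ.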
To summarise, the sample mean is a random variable that has the following three properties:
It is normally distributed.
It has a mean equal to the population mean,µ.
It has a standard deviation, called the standard error, $\sigma_{\bar{x}}$, equal to $\sigma/\sqrt{n}$.
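Based on these properties and using normal distribution theory, the following can be concluded about how sample means behave in relation to their population mean:
68.3% of all sample means will lie within one standard error of the population mean.
95.5% of all sample means will lie within two standard errors of the population mean.
99.7% of all sample means will lie within three standard errors of the population mean.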
Alternatively, it can be stated as follows:
There is a 68.3% chance that a single sample mean will lie no further than one standard error away from its population mean.
There is a 95.5% chance that a single sample mean will lie no further than two standard errors away from its population mean.
There is a 99.7% chance that a single sample mean will lie no further than three standard errors away from its population mean.
This implies that any sample mean calculated from a randomly drawn sample has a high probability (up to 99.7%) of being no more than three standard errors away from its true, but unknown, population mean value.
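These percentages are areas under the standard normal curve. As a minimal sketch (the helper name coverage is an assumption introduced for illustration), the probability that a standard normal value lies within k standard errors of the mean can be computed from the error function in Python's standard library:

```python
import math

def coverage(k: float) -> float:
    """Return P(-k <= Z <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard error(s): {coverage(k):.2%}")
```

The two-standard-error figure works out to roughly 95.45%, which the text quotes as 95.5%.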
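These probabilities are found by relating the sampling distribution of $\bar{x}$ to the z-distribution. Any sample mean, $\bar{x}$, can be converted into a z-value through the following z-transformation formulae:
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \quad \text{or} \quad z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$$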
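As a worked illustration, the sketch below converts a sample mean into a z-value using z = (x̄ − µ)/(σ/√n) and looks up the corresponding probability. The figures (µ = 10 years, σ = 4 years, n = 100, x̄ = 10.8 years) and the function name z_for_sample_mean are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Convert a sample mean into a z-value: z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical values (assumptions, not taken from the text).
mu, sigma, n = 10.0, 4.0, 100
x_bar = 10.8

z = z_for_sample_mean(x_bar, mu, sigma, n)   # standard error is 4 / 10 = 0.4
prob_below = NormalDist().cdf(z)             # P(sample mean <= 10.8)
prob_above = 1 - prob_below                  # P(sample mean > 10.8)

print(f"z = {z:.2f}")
print(f"P(sample mean <= {x_bar}) = {prob_below:.4f}")
print(f"P(sample mean >  {x_bar}) = {prob_above:.4f}")
```

Under these assumptions the sample mean lies two standard errors above µ, so only a little over 2% of sample means would be expected to exceed it.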
Applied Business Statistics
The sampling distribution of the sample mean is shown graphically in Figure 6.1.
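[Figure: normal curve of the sample mean $\bar{x}$ centred on µ, with 68.3% of the area within ±1$\sigma_{\bar{x}}$, 95.5% within ±2$\sigma_{\bar{x}}$ and 99.7% within ±3$\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$]
Figure 6.1 Sampling distribution of the sample mean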
This relationship between the sample mean and its population mean can be used to:
find probabilities that a single sample mean will lie within a specified distance of its true but unknown population mean
calculate probability-based interval estimates of the population mean
test claims/statistical hypotheses about a value for the true but unknown population mean.
The sampling distribution is the basis for the two inferential techniques of confidence intervals and hypothesis tests, which are covered in the following chapters.
6.5 The Sampling Distribution of the Sample Proportion (p)
The sample proportion, p, is used as the central location measure when the random variable under study is qualitative and the data is categorical (i.e. nominal/ordinal).
A sample proportion is found by counting the number of cases that have the characteristic of interest, r, and expressing it as a ratio (or percentage) of the sample size, n (i.e. $p = r/n$). See Table 6.3 for illustrations of sample proportions.

Table 6.3 Illustrations of sample proportions for categorical variables
Qualitative random variable | Sample statistic
Gender | Proportion of females in a sample of students
Trade union membership | Proportion of employees who are trade union members
Mobile phone brand preference | Proportion of mobile phone users who prefer Nokia

This section will show how the sample proportion, p, is related to the true but unknown population proportion, π, for any categorical random variable, x.
The sample proportion, p, is related to its population proportion, π, in exactly the same way as the sample mean, $\bar{x}$, is related to its population mean, µ. Thus the relationship between p and π can be described by the sampling distribution of the sample proportion.
This relationship can be summarised as follows for a given categorical random variable, x.
(a) The sample proportion, p, is itself a random variable, as the value of each sample proportion is likely to vary from sample to sample.
(b) The mean of all sample proportions is equal to the true population proportion, π.
(c) The standard deviation of all sample proportions is a measure of the sampling error. It is called the standard error of sample proportions and is calculated as follows:
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \qquad (6.3)$$
It measures the average deviation of sample proportions about the true population proportion.
(d) The histogram of all these sample proportions is normally distributed.
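These properties can be checked informally by simulation. The sketch below is a minimal illustration under stated assumptions, not a procedure from the text: it assumes a hypothetical population proportion π = 0.30, a sample size of n = 200 and 3 000 repeated samples, and computes each sample proportion as p = r/n. The names pi_true, sample_proportion and k are introduced here for illustration only.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

pi_true = 0.30   # hypothetical population proportion (an assumption)
n = 200          # sample size
k = 3_000        # number of simulated samples

def sample_proportion(n: int, pi_true: float) -> float:
    """Draw one sample of size n and return p = r / n."""
    # r counts the sampled cases that have the characteristic of interest.
    r = sum(random.random() < pi_true for _ in range(n))
    return r / n

proportions = [sample_proportion(n, pi_true) for _ in range(k)]

mean_p = statistics.fmean(proportions)
sd_p = statistics.pstdev(proportions)
standard_error = math.sqrt(pi_true * (1 - pi_true) / n)   # formula 6.3

print(f"population proportion        = {pi_true:.3f}")
print(f"mean of sample proportions   = {mean_p:.3f}")
print(f"sqrt(pi(1 - pi)/n)           = {standard_error:.4f}")
print(f"std dev of sample proportions = {sd_p:.4f}")
```

The mean of the simulated proportions should land close to π, and their standard deviation close to the standard error given by formula 6.3.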
Based on these properties and using normal distribution theory, it is possible to conclude the following about how sample proportions behave in relation to their population proportion:
68.3% of all sample proportions will lie within one standard error of the population proportion, π.
95.5% of all sample proportions will lie within two standard errors of the population proportion, π.
99.7% of all sample proportions will lie within three standard errors of the population proportion, π.
These probabilities are found by relating the sampling distribution of p to the z-distribution.
Any sample proportion, p, can be converted into a z-value through the following z-transformation formulae:
$$z = \frac{p - \pi}{\sigma_p} \quad \text{or} \quad z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} \qquad (6.4)$$
Figure 6.2 shows the sampling distribution of single sample proportions graphically.
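[Figure: normal curve of the sample proportion p centred on π, with 68.3% of the area within ±1$\sigma_p$, 95.5% within ±2$\sigma_p$ and 99.7% within ±3$\sigma_p$, where $\sigma_p = \sqrt{\pi(1 - \pi)/n}$]
Figure 6.2 Sampling distribution of sample proportions (p)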
This relationship can now be used to derive probabilities, develop probability-based estimates, and test hypotheses of the population proportion in statistical inference.
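As a worked illustration of deriving such a probability, the sketch below converts a sample proportion into a z-value using z = (p − π)/√(π(1 − π)/n). The figures (π = 0.40, n = 150, p = 0.46) and the function name z_for_sample_proportion are hypothetical, chosen only to keep the arithmetic simple.

```python
import math
from statistics import NormalDist

def z_for_sample_proportion(p: float, pi_true: float, n: int) -> float:
    """Convert a sample proportion into a z-value: z = (p - pi) / sqrt(pi(1 - pi)/n)."""
    standard_error = math.sqrt(pi_true * (1 - pi_true) / n)
    return (p - pi_true) / standard_error

# Hypothetical values (assumptions, not taken from the text).
pi_true, n = 0.40, 150
p = 0.46

z = z_for_sample_proportion(p, pi_true, n)   # standard error is sqrt(0.24 / 150) = 0.04
prob_below = NormalDist().cdf(z)             # P(sample proportion <= 0.46)

print(f"z = {z:.2f}")
print(f"P(sample proportion <= {p}) = {prob_below:.4f}")
```

Under these assumptions the sample proportion lies one and a half standard errors above π, so about 93% of sample proportions would be expected to fall at or below it.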