6.8 Central Limit Theorem and Sample Sizes
An important assumption in inferential analysis is that the sampling distribution of the sample statistic (mean or proportion) is normally distributed. This will always be the case when the sample is drawn from a normally distributed population, regardless of the sample size chosen. However, if the underlying population from which a sample is drawn is not normally distributed (or its distribution is unknown) then, as proven by the central limit theorem, provided the sample size is large enough (usually n ≥ 30), the sampling distribution of the mean (or proportion) can be assumed to be normal.
The central limit theorem states that, regardless of the shape of the underlying population from which the sample is drawn, as the sample size (n) increases, the sampling distribution of the mean (or proportion) approaches the normal distribution. This is illustrated in Figure 6.3.
Figure 6.3 Central limit theorem – sampling distribution related to population distribution (panels: Population; Sampling distributions of x̄)
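The behaviour described by the central limit theorem can be checked with a small simulation. The following is a minimal sketch and not part of the original text: it assumes an exponential population (chosen only because it is strongly right-skewed), and the helper names sample_means and skewness are hypothetical. It reports the skewness of the simulated sample means for several sample sizes; values closer to 0 indicate a more symmetric, more normal-looking sampling distribution.

```python
import random
import statistics


def sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size n from a right-skewed exponential
    population (mean 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# Assumed sample sizes, picked to bracket the n >= 30 rule of thumb.
skew_by_n = {n: skewness(sample_means(n)) for n in (1, 5, 30, 100)}

for n, sk in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {sk:.2f}")
```

Under these assumptions the skewness should shrink towards 0 as n grows, which is the pattern Figure 6.3 depicts. The exact values depend on the chosen population and random seed.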
Applied Business Statistics
The central limit theorem is fundamental to inferential statistics as it allows valid inferential conclusions to be drawn about population parameter values (e.g. µ = k) or relationships between measures in a population (e.g. µ1 = µ2) based on only a single sample’s evidence drawn from an underlying population of any shape. This is because the relevant sample statistic (e.g. x̄ or (x̄1 – x̄2)) can be assumed to behave normally.
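As a rough illustration of why normal behaviour of the sample statistic matters, the sketch below simulates many samples from a non-normal population and counts how often the sample mean falls within two standard errors of the population mean. The exponential population, the sample size of 50, the number of repetitions and all variable names are assumptions made for illustration only; the standard error is computed as σ/√n.

```python
import math
import random
import statistics

rng = random.Random(42)      # fixed seed so the run is repeatable

n = 50                       # assumed sample size (above the n >= 30 guideline)
reps = 10_000                # assumed number of simulated samples
mu = 1.0                     # mean of the exponential(1) population
sigma = 1.0                  # standard deviation of the exponential(1) population
std_error = sigma / math.sqrt(n)

# One sample mean per simulated sample.
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# Share of sample means lying within two standard errors of mu.
within = sum(abs(m - mu) <= 2 * std_error for m in means) / reps

print(f"Share of sample means within 2 standard errors: {within:.3f}")
```

If the sampling distribution is close to normal, the printed share should be near the roughly 95% expected under a normal distribution, even though the population itself is skewed.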
6.9 Summary
This chapter introduced the building blocks of inferential statistics, which will be covered further in chapters 7 to 11. The need for inferential statistical methods arises because most statistical data is gathered from a sample instead of from the population as a whole.
The different types of sampling methods, such as non-probability and probability sampling methods, were reviewed.
An understanding of the different sampling procedures assists with deciding when inferential statistical methods can be applied validly to sample data.
The concept of the sampling distribution was introduced and described for four sample statistics, namely: the single sample mean, the single sample proportion, the difference between two sample means, and the difference between two sample proportions. The sampling distribution describes, in probability terms, how close a sample statistic can lie to its corresponding population parameter. This relationship provides the basis for inferential statistics. The importance of the central limit theorem in ensuring normality was highlighted.
Chapter 6 – Sampling and Sampling Distributions
Exercises
1 What is the purpose of inferential statistics?
2 There are four pillars of inferential statistics. Two of them are (i) descriptive statistics and (ii) probability concepts. Name the other two pillars of inferential statistics.
3 Fill in the missing word: ‘A sample is a ……… of a population’.
4 Fill in the missing word: ‘To produce reliable and valid estimates of a population parameter, a sample must be ……… of its target population’.
5 Explain the difference between non-probability and probability sampling methods.
6 Which sampling method is appropriate for inferential analysis? Why?
7 Name the two disadvantages of non-probability sampling.
8 Fill in the missing word: ‘Probability-based sampling involves selecting the sample members from the target population on a purely ……… basis’.
9 Fill in the missing word: ‘In a simple random sample, each member in the target population has an ……… chance of being selected’.
10 A wine farmer in Paarl wishes to test the fermentable sugar content (fructose) in a vineyard of chardonnay grapes. All grapes are expected to produce similar fructose levels.
Which random sampling technique would be appropriate to produce a representative sample of bunches of grapes to test the fructose levels of this vineyard?
11 In a paper milling plant in Sabie, the consistency of the rolled paper is tested at regular hourly intervals. In a given shift, a sample of paper is tested for consistency at a randomly chosen point in time within the first hour after the machine has reached a ‘steady-state’. Thereafter, a sample is tested every half-hour. This method of sampling is known as ……… ……… sampling. (two words)
12 The BEEE status of a company is determined by its size based on turnover (i.e. turnover under R5 million; R5 to R10 million; and above R10 million per annum). To verify BEEE compliance across all companies, an appropriate sampling method to use to draw a representative sample would be ……… ……… sampling. (two words)
13 The Department of National Education wishes to assess the accounting skills of grade 10 learners in the 92 schools in the Johannesburg metropolitan area that offer this subject. The Department’s research staff believe that the range of abilities of grade 10 learners’ accounting skills is likely to be the same across all schools (i.e. each school is a cluster). If a sample of only 15 of these schools is randomly selected (with a further sub-sample of grade 10 accounting learners selected within each school and tested), then the sampling method used is called ……… ……… ……… . (three words)
14 Name one advantage of stratified random sampling.
15 A sampling distribution describes the relationship between a ……… ……… (two words) and its corresponding ……… ……… . (two words)
16 Fill in the missing words: ‘The standard deviation of all sample means around its population mean is called the ……… ………’. (two words)
17 What percentage of sample means lie within two standard errors of the true, but unknown, population mean for a given numeric random variable?
18 Fill in the missing word: ‘The shape of the distribution of all sample means based on a given sample size around its population mean is ………’.
19 If the shape of the population distribution of a variable is non-normal or unknown, how large a sample must be drawn from this population to be sure that the sampling distribution of the sample mean will be normally distributed?
20 Name the theorem that allows statisticians to assume that the sampling distribution of a sample mean approaches a normal distribution as the sample size increases.
21 What does the term ‘sampling error’ mean?