Statistics For Social Sciences – MATH1208
Unit 7 – Sampling Theory
Introduction
There are different ways of collecting data and choosing the subjects for an
investigation. Primary data is the name given to data that are used for the specific
purpose for which they were collected. They will contain no unknown quantities in
respect of method of collection, accuracy of measurements or which members of the
population were investigated.
Secondary data is the name given to data that are used for some purpose other than that for which they were originally collected. Summaries and analyses of such data are sometimes referred to as secondary statistics. The main sources of secondary data are publications.
Census
A census is the name given to a survey which examines every member of a
population. Three government censuses are population census, census of distribution and census of production.
A Population Census is taken every ten
years, to obtain information such as age, sex, relationship to head of household,
car for travel and number of rooms in place of dwelling.
A Census of Distribution is taken every five years, covering virtually all retail
establishments and some wholesalers. It obtains information on number of
employees, type of goods sold, turnover and classification.
stocks of raw materials, finished goods and expenditure on plant and machinery.
Bias
Bias can be defined as the tendency of a pattern of errors to influence data in an
unrepresentative way. The errors involved in the results of investigations that have been subject to bias are known as systematic errors.
The main types of bias are:
1. Selection bias. This can occur if a sample is not truly representative of the population. Note that censuses cannot be subject to this type of bias. For example, sampling the output of one machine on one
particular day may not adequately represent the nature and quality of the goods that
customers receive. Some factors that could influence the results: the machine might be manned by more or less
experienced operators; there may be other machines that perform better or worse; the day’s production may be under more or less pressure than another day’s.
2. Structure and wording bias. This can arise from badly worded questions. For example, technical words might not be understood.
3. Interview bias. If the subjects of an
investigation are personally interviewed, the interviewer might project biased opinions or an attitude that does not gain the full
cooperation of the subjects.
4. Recording bias. This could result from badly recorded answers or clerical errors made by an untrained workforce.
Probability Sampling
A probability sampling method is any
method of sampling that utilizes some form of random selection. In order to have a
random selection method, you must set up a process or procedure that ensures that
the different units in your population have equal probabilities of being chosen.
Non-probability Sampling
A core characteristic of non-probability sampling techniques is that samples are selected based on the subjective judgement of the researcher, rather than random
selection (i.e., probabilistic methods), which is the cornerstone of probability sampling techniques.
Most information obtained by an
organization about any population is as a result of examining a small, representative subset of the population. This is called a sample.
Sampling theory is the study of relationships existing between a population and samples drawn from the population. It is useful in estimating unknown population quantities such as the population mean and variance, often called population parameters, from a
knowledge of the corresponding sample statistics.
Sampling Frame
A sampling frame is a list of all members of the population. Certain Sampling methods require each member of the population
under consideration to be known and
identifiable. The structure which supports this identification is called a sampling
frame. Some sampling methods require a sampling frame only as a listing of the population. Other methods need certain characteristics of each member also to be known.
Once you have the sampling frame and you have determined your sample size, you are ready to select your sample. Although it is easy to understand the process of placing the name of each member of the population in a hat and selecting the sample by picking names from the hat, this is not a practical way to select the sample. To imitate this process you can use a table of random numbers. A table of random numbers is a list of
numbers randomly generated and listed in the order in which they are generated. The digits are usually
arranged in groups for reading convenience. The term ‘generated in a random fashion’ can be interpreted as ‘the chance of any one digit occurring in any position in the table is no more or less than the chance of any other digit occurring’.
To use the random number table to identify which members of our population will be selected for the sample, we first assign each member of the population an identification (ID) number. An example of a simple
random sample is as follows.
Each student at your college has a mailbox on campus. The mailboxes are numbered from 0000 to 9000. To select a simple random sample of ten students, we can
select ten mailbox numbers at random using the random number table. You can close
your eyes and choose a spot on the random number table. Suppose you select row 7, column 3 of the table. The first student
selected has mailbox 2419, which is a valid number. If we continue to read off four-digit numbers from the table, the second number selected is for mailbox 0210. The list of all ten mailbox numbers selected is: 2419 0210 7750 4293 6279 4778 1976
table are organized in nine-digit blocks and we need only four-digit mailbox numbers, we just keep reading the numbers
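The random number table can be imitated in software. The following is a minimal Python sketch, assuming the mailbox IDs run from 0000 to 9000 as in the example above. The function name `select_mailboxes` and the seed value are illustrative assumptions, and the numbers drawn will differ from those read off the printed table.

```python
import random

def select_mailboxes(sample_size, highest_id=9000, seed=None):
    """Draw a simple random sample of mailbox numbers without replacement."""
    rng = random.Random(seed)  # fixing the seed makes the draw repeatable
    # range(highest_id + 1) lists every valid ID, so no invalid numbers need skipping
    chosen = rng.sample(range(highest_id + 1), sample_size)
    # Format as four-digit strings, e.g. 210 becomes "0210"
    return [f"{number:04d}" for number in chosen]

mailboxes = select_mailboxes(10, seed=7)
print(mailboxes)
```

Because `random.sample` draws without replacement, no mailbox can be selected twice, which matches picking names from a hat.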
Methods of Obtaining Samples for Statistical Analysis
The sampling techniques most commonly used in business and commerce can be split into three categories.
1. Random sampling. This ensures that each and every member of the population under consideration has an equal chance of being selected as part of the sample. Two types of random sampling used are:
i. Simple random sampling
This method ensures that each member of the population has an equal chance of being chosen for the sample. It is therefore
necessary to have a sampling frame that at
least lists all the members of the target
population. Examples of where this method might be used are:
a. by a large company, to sample 10% of their orders to determine their average value;
b. by a professional association, to sample a proportion of its members to determine their views on a possible amalgamation.
Advantages
The selection of sample members is unbiased, and the method is generally accepted by the layman as fair.
Disadvantages
a. the need for each chosen subject to be located and questioned is time consuming;
b. there is a chance that certain significant attributes of the population are under- or over-represented.
ii. Stratified random sampling
Stratified random sampling extends the idea of simple random sampling to ensure that a heterogeneous population has its defined strata levels taken account of in the sample. For example, if 10% of all heavy goods
vehicles have a safety feature that is relevant to the
investigation in hand, then 10% of a sample of such vehicles must have the safety
feature. The general procedure for taking a stratified sample is:
a. Stratify the population, defining a number of separate partitions.
b. Calculate the proportion of the population lying in each partition.
c. Split the total sample size up into the above proportions.
d. Take a separate sample (normally simple random) from each partition, using the
sample sizes as defined in c.
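Steps a to d can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `stratified_sample`, the vehicle identifiers and the 10%/90% split are hypothetical, chosen to mirror the heavy goods vehicle example. Rounding each stratum's share can make the total differ slightly from the requested sample size when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Proportional stratified sample; strata maps a stratum name to its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Steps b and c: apply the stratum's share of the population to the sample size
        stratum_size = round(total_sample_size * len(members) / population_size)
        # Step d: simple random sample taken separately within the stratum
        sample[name] = rng.sample(members, stratum_size)
    return sample

# Step a: a hypothetical population of 1000 vehicles, 10% with the safety feature
vehicles = {
    "with_feature": [f"V{i}" for i in range(100)],
    "without_feature": [f"V{i}" for i in range(100, 1000)],
}
sample = stratified_sample(vehicles, total_sample_size=50, seed=1)
print({name: len(members) for name, members in sample.items()})
```

With these figures the sample of 50 contains 5 vehicles with the feature and 45 without, so the 10% proportion in the population is preserved in the sample.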
Advantage
The sample itself is free from bias, since it takes into account significant strata levels (attributes) of a population considered
important to the investigation.
Disadvantages
a. an extensive sampling frame is necessary;
b. strata levels of importance can only be selected subjectively;
c. increased costs due to the extra time and manpower necessary for the organization and implementation of the sample.
i. identifies certain attributes (or strata
levels) that are considered significant to the investigation at hand;
ii. partitions the population accordingly into groups which each have a unique
combination of these levels.
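2. Quasi-random sampling. Quasi means ‘almost’ or ‘nearly’. This type of technique, while not satisfying the criterion given in 1 above, is generally thought to be as
representative as random sampling under certain conditions. It is used when random sampling is not possible or is too
expensive to consider. Two types that are commonly used are: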
i. Systematic random sampling. Systematic sampling is a type of probability sampling method in
which sample members from a larger population are selected according to a
random starting point and a fixed periodic interval. This interval, called
the sampling interval, is calculated by dividing the population size by the
desired sample size.
Systematic sampling is a method of sampling that can be used where the
population is listed (such as the invoices of a company
or the fleet of company vehicles) or some of it is physically in evidence (such as a row of houses, items coming off a production line or customers leaving a supermarket). The
technique is to choose a random starting place and then systematically sample every 40th or 12th or 165th item in the population.
The number chosen is based on the sample size required. For example, if a 2% sample of the population was needed, every 50th
item would be selected, after starting at
some random point. This is because 2% = 2 out of 100 = 1 out of 50.
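The procedure can be sketched in Python as follows. This is a minimal illustration: the function name `systematic_sample` and the list of 1000 invoice numbers are assumptions made for the example. The sampling interval is the population size divided by the desired sample size, so a 2% sample of 1000 items gives an interval of 50.

```python
import random

def systematic_sample(items, sample_size, seed=None):
    """Select every k-th item after a random start, where k = N // n."""
    interval = len(items) // sample_size      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(interval)           # random starting point within the first interval
    # Every interval-th item from the start; the slice guards against one extra item
    return items[start::interval][:sample_size]

invoices = list(range(1, 1001))               # hypothetical invoice numbers 1..1000
sample = systematic_sample(invoices, sample_size=20, seed=3)
print(sample)
```

Once the starting point is fixed, every other selected item is determined, which is why the method is described as quasi-random.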
uniform. These are referred to as
homogeneous populations. For example, the invoices of a company for one financial year would be considered a homogeneous
population by an auditor if their value or the type of goods ordered was of no consequence to the investigation.
Advantages of this method include:
i. ease of use;
ii. the fact that it can be used where no sampling frame exists, but items are physically in evidence.
Disadvantages
a. The main disadvantage of systematic sampling is that bias can occur if recurring sets in the population are possible.
b. This method of sampling is not truly random, since once a random starting point is selected all the subjects are pre-determined. Hence the use of the term ‘quasi-random’ to describe the technique.
ii. Multi-stage sampling
Where a population is spread over a
relatively wide geographical area, random sampling will almost certainly entail
travelling to all parts of the area, which is costly in time and money.
Multi-stage sampling is intended to overcome this particular problem. It involves:
a. Splitting the area into a number of regions;
b. Randomly selecting a small number of regions;
c. Confining sub-samples to these regions alone, with the size of each sub-sample proportional to the size of the area. For example, United Kingdom could be split into countries or a large city could be split into postal districts;
Once the final regions or sub-regions have been
selected, the final sampling technique could be simple or stratified random or systematic, depending on the existence or otherwise of a sampling frame.
Advantage
The main advantage of this method is that less time and manpower is needed and thus it is cheaper than random sampling.
Disadvantages of multi-stage sampling include:
a. possible bias if a very small number of regions is selected;
b. the method is not truly random, since once the final sampling regions have
been selected, no member of the population in any other region can be selected.
3. Non-random sampling. This is used when neither of the above techniques is possible or practical. Two well-used types are:
i. Cluster sampling.
Cluster sampling is a non-random sampling method which can be employed where no sampling frame exists, and, often for a
For example, suppose a survey was needed of companies in South Wales that use a computerized payroll. First, three or four small areas would be chosen (perhaps two of these based in city centres and one or two in outlying areas). Each company, in each area, might then be phoned to identify which of them have computerized systems. The
survey itself could then be carried out.
Advantages
i. it is a good alternative to multi-stage sampling where no sampling frame exists;
ii. it is generally cheaper than other methods.
Disadvantage
The main disadvantage of the method is the fact that sampling is not random and thus selection bias could be significant.
ii. Quota sampling.
Quota sampling is yet another
non-probability sampling method, in which the population is divided into mutually
exclusive sub-groups from which the
sample items are selected on the basis of a given proportion.
Simply, Quota Sampling is a form of
knowledge and professional judgment. In this method, first of all, the quotas, i.e. a proportion in which the sample items are to be selected is set up and then within the
quotas the choice of sample items depends exclusively on the investigator’s judgment.
For example, suppose an interviewer is
told to interview 250 people living in certain geographical areas, of whom 100 males, 100 females and 50 children are to be
interviewed. Within these quotas, the interviewer can select any person on the basis of personal judgment.
chance of personal prejudice or bias of the investigator that can adversely affect the credibility of the results. Such as, if the interviewer finds children insufficient to
answer the questions, then he might ask their mothers to give answers on their behalf.
Thus, this may tamper the results, and the purpose of research gets unfulfilled.
The sampling technique mostly favoured in market research is quota sampling. The
method uses a team of interviewers, each with a set number (quota) of subjects to interview. Normally the population is
lot of responsibility on the interviewer’s
since the selection of subjects is left to them entirely. Ideally they should be well trained and have a responsible, professional attitude.
Advantages of quota sampling include:
i. stratification of the population is usual, although not essential;
ii. it is not complicated: any member can be replaced by another member with the same characteristics, so there is no non-response;
iii. low cost and convenience.
Disadvantage
ii. severe interviewer bias can be introduced into the survey by inexperienced or
untrained interviewers, since all the data collection and recording rests with them.
Convenience sampling (also known
as grab sampling, accidental sampling, or opportunity sampling) is a type of non-probability sampling that involves
the sample being drawn from that part of the population that is close to hand. This type of sampling is most useful for pilot testing.
In sociology and statistics
research, snowball sampling (or
referral sampling) is a
non-probability sampling technique where existing study subjects recruit future
subjects from among their acquaintances.
Advantages of Snowball Sampling. The chain referral process allows the researcher to reach populations that are difficult to sample when using other
sampling methods. The process is cheap, simple and cost-efficient. This sampling technique needs little planning and a smaller workforce compared to other sampling techniques.
Disadvantages of Snowball Sampling. It is usually impossible to determine the sampling error or make inferences about populations based on the obtained sample.
Apply Central Limit Theorem to Large Samples
The Central Limit Theorem (CLT) applies for large enough sample sizes. A “large
enough” sample is usually taken to be n ≥ 30. If the population itself has a
normal distribution, then the results of the CLT hold even for small samples (n < 30).
In random sampling from a population with mean (μ) and standard deviation (σ), when n is large enough, the distribution of
the sample mean x̄ (point estimator) is approximately normal with mean μ and standard error $\sigma/\sqrt{n}$.
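The statement can be checked with a small simulation. The sketch below is illustrative only: it assumes a uniform population on (0, 1), which has mean 0.5 and standard deviation 1/√12, and the function name `simulate_sample_means`, the sample size of 36 and the seed are arbitrary choices. It draws many samples and compares the spread of their means with the theoretical standard error σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions, seed=None):
    """Return the means of `repetitions` random samples of size n from uniform(0, 1)."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.random() for _ in range(n)])
        for _ in range(repetitions)
    ]

n = 36
means = simulate_sample_means(n, repetitions=2000, seed=42)

population_sd = 1 / math.sqrt(12)             # sd of a uniform(0, 1) population
theoretical_se = population_sd / math.sqrt(n)

print("mean of sample means:", round(statistics.mean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 4))
print("theoretical SE:      ", round(theoretical_se, 4))
```

Although the population is far from normal, the sample means cluster around 0.5 with a spread close to σ/√n, as the theorem suggests.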
Estimate the Sample Mean and Sample Variance (Standard Error)
The standard error is the standard deviation of the sampling distribution of a point estimator.
An estimator is a statistic that estimates some fact about the population. You can also think of an estimator as the rule that
creates an estimate. For example, the sample mean(x̄) is an estimator for the population mean, μ. Point estimator and Interval
estimator are two types of estimators.
The quantity that is being estimated (i.e. the one you want to know) is called
the estimand. For example, let’s say you
wanted to know the average height of children in a certain school with a
is your sample mean, the estimator. You use the sample mean to estimate that the population mean (your estimand) is about 56 inches.
Point vs. Interval
Estimators can be a range of values (like a confidence interval) or a single value (like the standard deviation). When an estimator is a range of values, it’s called an interval
estimate. For the height example above, you
Characteristics of Estimators
Estimators can be described in several ways:
Biased: a statistic that is either an
overestimate or an underestimate.
Efficient: a statistic with small variance
(the one with the smallest possible variance is also called the
“best”). Inefficient estimators can give you good results as well, but they usually
require much larger samples.
Invariant: statistics that are not easily
Shrinkage: a raw estimate that’s
improved by combining it with other information.
Sufficient: a statistic that estimates the
population parameter as well as if you knew all of the data in all possible
samples.
Unbiased: an accurate statistic that
neither underestimates nor overestimates.
Estimation
descriptor for the sample, it is called a point estimate. A point estimate is a single number calculated from sample data. It is used to
estimate a parameter of the population. A parameter is a numerical descriptor of the population.
Interval estimates indicate the precision, or
accuracy, of an estimate and are therefore preferable to point estimates. For example, if we say that a distance is measured as 5.28 metres (m), we are giving a point estimate. If, on the other hand, we say that the
Parameters are typically unknown. One
important problem of statistical inference is the estimation of the population parameters (such as population mean and variance)
from the corresponding sample statistics (such as sample mean and variance).
An interval estimator is a statistical
estimator which is represented geometrically as a set of points in the parameter space. An interval estimator can be seen as a set of
which this estimator will "cover" the
unknown parameter point. This probability, in general, depends on unknown parameters; therefore, as a characteristic of the reliability of an interval estimator a confidence
coefficient is used; this is the lowest possible value of the given probability. Interesting statistical conclusions can be drawn for only those interval estimators
which have a confidence coefficient close to one.
A point estimator is the formula or rule that
A point estimator is a statistical estimator whose value can be represented
geometrically in the form of a point in the same space as the values of the unknown parameters (the dimension of the space is equal to the number of parameters to be
estimated). In fact, point estimators are also used as approximate values for unknown physical variables. For the sake of
simplicity, it is further supposed that one natural parameter is subject to estimation; in this case, a point estimator is a function of the results of observations, and takes
numerical values.
parameter being estimated, i.e. if the
statistical estimation is free of systematic errors. The arithmetical mean (1) is an unbiased statistical estimator for the
mathematical expectation of identically-distributed random variables (not
necessarily normal).
An unbiased estimator yields an estimate that is fair. It neither systematically
overestimates the parameter nor
systematically underestimates the parameter.
Properties of an estimator
1. Unbiased – the mean of the estimator over all possible
samples of a given size is equal to the parameter being estimated.
2. Consistent – as the sample size increases, the value of the estimator approaches the value of the parameter estimated.
3. Relatively efficient – of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has the smallest variance.
Calculate Standard Error of a Sample
The standard deviation of the sampling
distribution of a statistic is often called its standard error. The standard error of the
mean is $\sigma_{\bar{x}} = \sigma/\sqrt{N}$, where x̄ is the sample mean, σ is the population standard deviation and N is the sample size. This is true for large or small samples. The
sampling distribution of means is very nearly normal for N ≥ 30 even when the population is non-normal.
Determine Confidence Intervals for Population Means
A confidence interval (also called an interval estimate) takes the point estimate a step
further and gives a range of values and a probability. The probability value is the
likelihood that an interval actually includes the value of the unknown population
parameter.
Given a random sample from some
population, a confidence interval for the
unknown population mean is
$\bar{x} \pm z \frac{s}{\sqrt{n}}$
where x̄ is the sample mean, s is the sample
standard deviation, n is the sample size and z = confidence factor (1.645 for 90%; 1.96 for 95%; 2.58 for 99%).
Example
A sample of 100 invoices yielded a mean gross value of $45.50 and standard deviation of $3.24. Calculate a 95% confidence
interval.
$\bar{x} \pm z \frac{s}{\sqrt{n}} = 45.50 \pm 1.96 \times \frac{3.24}{\sqrt{100}} = 45.50 \pm 0.635$
There is a 95% probability that the mean of the complete population of invoices from which the sample was taken is between 44.9 and 46.1.
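The invoice calculation can be reproduced with a few lines of Python. This is a minimal sketch: the function name `confidence_interval` is an assumption made for illustration, and the default z = 1.96 corresponds to the 95% confidence factor given above.

```python
import math

def confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (lower, upper) limits of the interval mean ± z * s / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Invoice example: n = 100, mean $45.50, standard deviation $3.24
lower, upper = confidence_interval(45.50, 3.24, 100)
print(f"{lower:.2f} to {upper:.2f}")
```

Passing z = 2.58 instead gives the wider 99% interval for the same sample.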
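In statistics, the 68–95–99.7 rule is a
shorthand used to remember the percentage of values that lie within a band around
the mean in a normal distribution with a width of two, four and six standard
deviations, respectively; more accurately, 68.27%, 95.45% and 99.73% of the values lie within one, two and three standard
deviations of the mean, respectively. In mathematical notation, these facts can be
expressed as follows, where X is an observation from a normally
distributed random variable, μ is the mean of the distribution, and σ is its standard
deviation:
$P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.6827$
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.9545$
$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.9973$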
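The percentages in the 68–95–99.7 rule can be checked numerically. The sketch below relies on the standard identity that, for a normal variable, the probability of lying within k standard deviations of the mean equals erf(k/√2); the function name `prob_within` is an illustrative assumption.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.2f}%")
```

The result does not depend on μ or σ, because standardizing X removes both.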
In the empirical sciences the
so-called three-sigma rule of thumb expresses a conventional heuristic that "nearly all"
values are taken to lie within three standard deviations of the mean, i.e. that it is
empirically useful to treat 99.7% probability as "near certainty". The usefulness of this
heuristic of course depends significantly on the question under consideration, and there are other conventions, e.g. in the social
sciences a result may be considered
"significant" if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of a five-sigma effect (99.99994% confidence) being required to qualify as a "discovery".
A hypothesis is an idea, an assumption (or guess), or a theory about the characteristics of one or more variables in one or more
populations. Once a hypothesis is formed, we must test it. We must decide whether or not to believe the hypothesis.
A hypothesis test is done, by using the information in the sample data to decide whether or not to believe the hypothesis.
The hypothesis test is a statistical procedure that involves formulating a hypothesis and using sample data to decide on the validity of the hypothesis.
One of the first steps in carrying out the
test is to state two opposing
views. One is called the null hypothesis (H0)
and the other is the alternative hypothesis (H1).
Null Hypothesis(H0) for a Test
The null hypothesis is a statement about a parameter of the population(s). It is labelled H0. In many instances we formulate a
statistical hypothesis for the sole purpose of rejecting or nullifying it. For example, if we want to decide whether a given coin is
biased, we formulate the hypothesis that ‘the coin is fair’ (i.e., p = 0.5, where p is the
probability of heads). Similarly, if we want to decide whether one procedure is better
than another, we formulate the hypothesis that ‘there is no difference between the procedures’ (i.e., any observed differences are due merely to fluctuations in sampling from the same population). Such hypotheses are called null hypotheses (H0).
Alternative Hypothesis (H1) for a Test
The alternative hypothesis is a statement
about a parameter of the population(s) that is opposite to the null hypothesis. It is labeled H1 or HA. Any hypothesis that differs from a
given hypothesis is called an alternative hypothesis. For example, if one hypothesis (H0) is p = 0.5, alternative hypotheses might
Type I and Type II Errors
If we reject a hypothesis when it should be accepted, we say that a ‘Type I’ error has been made. If, on the other hand, we accept a hypothesis when it should be rejected, we say that a ‘Type II’ error has been made. In either case, a wrong decision or judgment has occurred. In order for decision rules or tests of hypotheses to be good, they must be designed so as to minimize errors of
decision. This is not a simple matter,
because for any given sample size, an attempt to decrease one type of error is generally
accompanied by an increase in the other
type of error. In practice, one type of error may be more serious than the other, and so a compromise should be reached in favour of limiting the more serious error. The only way to reduce both types of error is to
increase the sample size, which may or may not be possible.
Levels of Significance
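In testing a given hypothesis, the maximum probability with which we are willing to risk a Type I error is called the ‘significance
level’ of the test. This probability, often
denoted by α, is generally specified before any samples are drawn so that the results obtained will not influence our choice.
In practice, a significance level of 0.05 or 0.01 is customary, although other values are used. If, for example, the 0.05 (or 5%)
significance level is chosen, then rejecting the hypothesis at that level means that the decision to reject the
hypothesis has a 0.05 probability of being wrong.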
Rejection Region(s) and Critical Value(s)
To perform the hypothesis test we need to choose between the null and the alternative hypotheses. We must decide to reject or not to reject the null hypothesis. The decision is always phrased in terms of the null
hypothesis: we reject it only when the sample data are
sufficiently inconsistent with the null hypothesis.
The sample consists of n observations. We must find a single number that captures the information in the sample. This number is called ‘test statistic’. A test statistic is a number that is used to decide between the null and alternative hypothesis.
Test statistic (z) = (sample mean – mean of the distribution) ÷ (standard deviation
÷ √n), that is,
$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
where n is the sample size, x̄ is the sample
mean, µ is the population mean and s is the standard deviation.
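The test statistic is straightforward to compute. The following minimal Python sketch uses the figures from Exercise 3 of this unit (sample mean 12.3, hypothesised mean 12.5, standard deviation 1.2, n = 36); the function name `z_statistic` is an assumption made for illustration.

```python
import math

def z_statistic(sample_mean, hypothesised_mean, sd, n):
    """z = (sample mean - hypothesised mean) / (standard deviation / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

z = z_statistic(12.3, 12.5, 1.2, 36)
print(round(z, 2))
```

Here the standard error is 1.2 ÷ 6 = 0.2, so the sample mean lies one standard error below the hypothesised mean.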
The rejection range is the range of values of the test statistic that will lead us to reject the null hypothesis. It is defined by the critical value(s).
The second approach to deciding if we
Conduct Test for Large Samples – Normal Distribution
A large-sample test of the mean is
conducted when the characteristic of interest is the population mean, μ, and either of the following situations exists:
The population standard deviation is known (regardless of the sample size). OR
The population standard deviation is unknown but the sample size, n, is greater than 30.
Suppose that under a given hypothesis the sampling distribution of a statistic S is a normal distribution with mean $\mu_S$ and
standard deviation $\sigma_S$. Thus the distribution of the standardized variable (or z score),
given by $z = (S - \mu_S)/\sigma_S$, is the standardized
normal distribution
(mean = 0, variance = 1).
[Figure: standard normal curve with area 0.95 between z = – 1.96 and z = 1.96 and a shaded critical region of area 0.025 in each tail.]
As indicated in the diagram above, we can be 95% confident that if the hypothesis is true, then the z score of an actual sample statistic S will lie between – 1.96 and 1.96,
since the area under the normal curve
between these values is 0.95. However, if on choosing a single sample at random we find that its z score lies outside the range – 1.96 to 1.96, we would conclude that such an event could happen with probability of only 0.05 (the total shaded area in the figure) if the given hypothesis were true. We would then say that this z score differed
significantly from what would be expected under the hypothesis, and we would then be inclined to reject the hypothesis.
The total shaded area, 0.05, is the significance level of the test. It represents the probability of being wrong in rejecting the
hypothesis (i.e., the probability of making a Type I error). Thus we say that the
hypothesis is rejected at the 0.05
significance level or that the z score of the given sample statistic is significant at the 0.05 level.
The set of z scores outside the range – 1.96 to 1.96 constitutes what is called the critical region of the hypothesis, the region of
rejection of the hypothesis, or the region of significance. The set of z scores inside the range – 1.96 to 1.96 is thus called the region of acceptance of the hypothesis, or the
region of nonsignificance.
On the basis of the above remarks, we can formulate the following decision rule (or test of hypothesis or significance):
Reject the hypothesis at the 0.05
significance level if the z score of the
statistic S lies outside the range – 1.96 to 1.96 (i.e. either z > 1.96 or z < – 1.96). This is equivalent to saying that the
observed sample statistic is significant at the 0.05 level.
Accept the hypothesis otherwise (or, if desired, make no decision at all).
The z score is also called the test statistic. It should be noted
that other significance levels could be used. For example, if the 0.01 level were used, we would replace 1.96 everywhere above with 2.58.
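The decision rule can be written as a short function. This is a minimal sketch for the two-tailed case only; the function name `two_tailed_decision`, the dictionary of critical values and the returned strings are illustrative assumptions, using the critical values 1.96 and 2.58 quoted above.

```python
# Two-tailed critical values of z quoted in the text
CRITICAL_TWO_TAILED = {0.05: 1.96, 0.01: 2.58}

def two_tailed_decision(z, alpha=0.05):
    """Reject the hypothesis if z falls outside -critical .. +critical."""
    critical = CRITICAL_TWO_TAILED[alpha]
    if z > critical or z < -critical:
        return "reject"
    return "do not reject"

print(two_tailed_decision(2.67))        # outside -1.96 .. 1.96
print(two_tailed_decision(-1.0))        # inside the region of acceptance
print(two_tailed_decision(2.2, 0.01))   # inside -2.58 .. 2.58
```

A z score of 2.2 is significant at the 0.05 level but not at the 0.01 level, which shows how the choice of significance level changes the decision.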
Classify Hypothesis Tests Into One-tailed Tests and Two-tailed Tests
In the above test we were interested in extreme values of the statistic S or its
corresponding z score on both sides of the mean (i.e., in both tails of the distribution). Such tests are called two-sided tests or two-tailed tests.
We may be interested in only extreme
values to one side of the mean (i.e., in one
tail of the distribution). For example, testing the hypothesis that one process is better than another, which is different from testing
whether one process is better or worse than the other. Such tests are called one-sided tests or one-tailed tests. In such cases the critical region is a region to one side of the distribution, with area equal to the level of significance.
The table below gives critical values of z for both one-tailed and two-tailed tests at
various levels of significance.
Level of significance, α                   0.10               0.05               0.01              0.005             0.002
Critical values of z, one-tailed tests     – 1.28 or 1.28     – 1.645 or 1.645   – 2.33 or 2.33    – 2.58 or 2.58    – 2.88 or 2.88
Critical values of z, two-tailed tests     – 1.645 and 1.645  – 1.96 and 1.96    – 2.58 and 2.58   – 2.81 and 2.81   – 3.09 and 3.09

A two-tailed test of the population mean has these null and alternative hypotheses:
H0: μ = [a specific number]
H1: μ ≠ [a specific number]
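Critical values such as these can be recomputed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `critical_z` is an illustrative assumption. For a one-tailed test the whole significance level sits in one tail; for a two-tailed test it is split equally between the two tails.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical value of z for a one-tailed (1) or two-tailed (2) test."""
    tail_area = alpha / tails                  # area in each rejection tail
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01, 0.005, 0.002):
    one = critical_z(alpha, tails=1)
    two = critical_z(alpha, tails=2)
    print(f"alpha = {alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

Rounded to two decimal places, the two-tailed value at the 0.002 level comes out as 3.09.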
A small-sample test of the mean is
conducted when the characteristic of interest is the population mean, μ, the
population standard deviation is unknown and the sample size, n, is less than or equal to 30.
Infer or Draw Conclusion about the Outcome of the Test
Introduction
https://www.youtube.com/watch?v=e8ptHgDzJtQ
https://www.youtube.com/watch?annotation_id=annotation_3582407077&feature=iv&src_vid=pEidoIu3GA0&v=bU93aSJKMGw#t=3m34s
Confidence Interval for T-score
https://www.youtube.com/watch?annotation_id=annotation_3635794553&feature=iv&src_vid=5LFhu0vGzkI&v=UmAJJtEo6cQ
Confidence interval for Z-score
https://www.youtube.com/watch?
Two Tailed Test
https://www.youtube.com/watch?v=0XXT3bIY_pw
One tailed Test
https://www.youtube.com/watch?v=lNoxKsuJ6Xc
https://www.youtube.com/watch?
eature=iv&src_vid=0XXT3bIY_pw&v=5 LFhu0vGzkI
Exercise
1. https://www.youtube.com/watch?annotation_id=annotation_3898681119&feature=iv&src_vid=lwpobQmUTd8&v=pEidoIu3GA0
2. An engineer hypothesizes that the mean number of defects can be decreased in a
a) Identify the null (H0) and the alternative
hypothesis (H1). Ans: H0: µ = 18, H1: µ < 18
b) What type of hypothesis test should be carried out to test this claim?
Ans: Left-tailed test
3. The random variable X is normally
distributed with standard deviation of 1.2. The null hypothesis H0 : µ = 12.5 cm for the
mean of this distribution is being tested against the alternative H1 : µ ≠ 12.5 cm. A
sample of size 36 turns up a sample mean of 12.3 cm. Calculate the test statistic.
Ans: Test statistic (z) = – 1
4. Given that the distribution of a
mean of 60 and a standard deviation of 3. What is the approximate percentage of data values that is expected to fall between 57 and 63? Unit 8, Page 7 to 8 Ans: 68%
5. The principal of a large community
college wishes to estimate the average age of the students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 100 students is
selected and the mean is found to be 23.2 years.
a) Find the 95% confidence interval of the population mean.
Hint (Unit 8, page 6): Given a random
sample from some population, a confidence
interval for the unknown population mean is
$\bar{x} \pm z \frac{s}{\sqrt{n}}$
where x̄ is the sample mean, s is the
sample standard deviation, n is the sample size and z = confidence factor (1.645 for 90%; 1.96 for 95%; 2.58 for 99%).
Ans: 22.8 < µ < 23.6
b) Explain your result in part (a).
Ans: There is a 95% probability (or chance) that the mean of the population from which the sample was taken lies between 22.8 and 23.6 (22.8 < µ < 23.6).
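The interval in Question 5(a) can be reproduced with a short script. This is a sketch only: the helper name `z_confidence_interval` is hypothetical, and the confidence factor is taken from the standard normal distribution rather than from the rounded table values quoted in the hint, so results may differ slightly in the last decimal place.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Return (lower, upper) for sample_mean +/- z * sd / sqrt(n)."""
    # Two-sided confidence factor, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Question 5: sample mean = 23.2, standard deviation = 2, n = 100
lower, upper = z_confidence_interval(23.2, 2, 100)
print(round(lower, 1), round(upper, 1))  # 22.8 23.6
```

The same helper can be applied to the other confidence interval questions in this exercise by changing the three inputs.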
calls by cell phone users. A sample of 65 cell phone users indicated that the mean amount spent is $250, with a standard deviation of $50.
a) Using a 95% level of confidence,
determine the confidence interval for the mean. Ans: 237.85 < µ < 262.15
b) Explain what part (a) indicates.
Ans: There is a 95% probability (or chance) that the mean of the population from which the sample was taken lies between 237.85 and 262.15 (237.85 < µ < 262.15).
7. Given that the sample size is 7, the sample mean is 8 and the population standard deviation is 4.2:
a) What is the standard error of the mean?
Ans: Standard error = 1.59
b) What is the 95% confidence interval?
Ans: (4.90, 11.10)
c) Explain your result in part (b).
Ans: There is a 95% probability (or chance) that the mean of the population from which the sample was taken lies between 4.9 and 11.1 (4.9 < µ < 11.1).
8. The attendance at the All Jam Swimming Meet was 400. A random sample of 50
the mean number of soft drinks consumed per person was 1.86 with a standard
deviation of 0.5.
a) Construct a 95% confidence interval for the mean number of soft drinks consumed per person. Ans: (1.72, 2.00)
b) Interpret your result in part (a).
Ans: There is a 95% probability (or chance) that the mean number of soft drinks consumed per person in the population from which the sample was taken lies between 1.72 and 2.00 (1.72 < µ < 2.00).
9. The lives of batteries used in digital
recently modified and a sample of 24 modified batteries was tested. It was
discovered that the mean life was 311 days, and the sample standard deviation was 11 days. At the 0.05 level of significance, can we claim that the modification changed the mean life of the battery? Explain.
Ans: Null hypothesis (H0) : µ = 305
Alternative hypothesis (H1) : µ ≠ 305
At the 0.05 level of significance, we reject H0 and accept H1 if z < -1.96 or z > 1.96
(draw diagram)
Test statistic: $z = \dfrac{311 - 305}{11/\sqrt{24}} \approx 2.67$
We reject H0 since 2.67 > 1.96 and accept H1. We conclude that at the 5% level of significance, there is evidence to suggest that the mean has changed due to the modification of the batteries.
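The decision rule used in Question 9 can be written as a small function. The sketch below is illustrative: the name `two_tailed_z_test` is an assumption. It follows the answer in using the normal critical value 1.96, even though the sample is small and only the sample standard deviation is known.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sd, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for H0: mu = mu0 vs H1: mu != mu0."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z, critical, abs(z) > critical

# Question 9: sample mean = 311, hypothesised mean = 305, s = 11, n = 24
z, critical, reject = two_tailed_z_test(311, 305, 11, 24)
print(round(z, 2), round(critical, 2), reject)  # 2.67 1.96 True
```

Calling the same function with the figures from Question 10 (241.52, 234.50, 13.92, 20) gives z of about 2.26, which also falls in the rejection region.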
10. A company suspects that the value of type A customers’ monthly orders has changed from last year. Last year’s type A customer average monthly order was $234.50. A random sample of 20 customers was taken, with a mean of $241.52 and a standard deviation of $13.92.
a) Determine the test statistic at 0.05 significance level. Ans: 2.26
b) Is the difference significant? Explain.
Ans: There is evidence of a difference, since z = 2.26 lies outside the range -1.96 to 1.96. That is, there is evidence that the value of type A customer monthly orders has changed.
11. A manager is convinced that a new type of machine does not affect production at the company’s major shop floor. In order to test this, 12 samples of this week’s hourly output are taken and the average production per hour is measured as 1158 with a standard
1196 before the new machine was introduced.
a) Determine the test statistic at 0.05 significance level. Ans: -1.85
b) Is the difference significant? Explain.
Ans: There is no evidence of any difference between the sample and the population, since z = -1.85 lies within the range -1.96 to 1.96. That is, the manager’s conviction is supported by the results.
12. Test at the 5% level whether a sample value of 52 could come from a normal population with a mean of 40 and variance σ² = 25. (Sample size not given; use a two-tailed test.)
Ans: At the 5% level there is significant evidence to indicate that the value does not come from a population with a mean of 40. That is, we reject the view (H0) that the mean is 40, since z = 2.4 > 1.96.
13. The length of a species of lizard is known to be normally distributed with
Ans: There is evidence to suggest that the lizard could be of the same species. We accept the null hypothesis since z = 1.265 which is less than 1.64.
14. Test at the 5% level whether the value 340 could be from a normal population with a mean of 320 and variance of 80, or
whether the mean is greater than 320. (Hint: One tailed test)
Ans: There is significant evidence to suggest the value is from a population with a larger mean than 320. We reject H0 since z = 2.236 > 1.64
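For a one-tailed test such as Question 14, all of the rejection region sits in one tail, so the critical value is smaller than in the two-tailed case. The sketch below works through the numbers. The variable names are illustrative, and the critical value is computed from the standard normal distribution (about 1.645) rather than taken as the rounded 1.64 used in the answers.

```python
import math
from statistics import NormalDist

# Question 14: single value 340, H0: mu = 320, variance 80, H1: mu > 320
value, mu0, variance, alpha = 340, 320, 80, 0.05

z = (value - mu0) / math.sqrt(variance)      # standard deviation is sqrt(80)
critical = NormalDist().inv_cdf(1 - alpha)   # upper 5% point of the standard normal
reject_h0 = z > critical

print(round(z, 3), round(critical, 3), reject_h0)  # 2.236 1.645 True
```

Because only a single value is tested, no division by √n is needed. For a left-tailed test the comparison becomes z < -critical.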
15. A supplier claims that the mean life
A consumer organization tested 200 bulbs and found the mean to be 117.5 hours, with a variance of 169 h². Is there evidence at the
1% level that the mean is lower than 120 hours? Explain. (Hint: One tailed test)
Ans: There is evidence to indicate that the mean life span is less than 120 hours. We
reject H0 and accept H1 since z = – 2.720