TESTING OF HYPOTHESES
3.2 LEARNING OBJECTIVES
The student will be able to:
List the steps of hypothesis testing.
State in your own words the type I and type II errors for a given problem.
Extract the appropriate information from a story problem to perform a complete hypothesis test.
Set up the null and alternative hypotheses correctly.
Choose the appropriate test statistic.
Choose the appropriate level of significance.
NOTES
Find the critical value using a table and state the decision rule correctly. Make a statistical decision. State the conclusion.
Perform a hypothesis test for 2 means using the appropriate formula.
Choose when to use a 2-sample t-test vs. a 2-sample z-test.
List the assumptions for a 2-sample equal (pooled) variance independent test.
Perform a 2-sample equal (pooled) variance t-test.
If the problem asks for a business decision based on the hypothesis test, state the appropriate decision.
Use an F-test to perform an equality of variance hypothesis test.
Incorporate the F-test for equality of variance in the hypothesis test for 2 means.
Interpret the results of the chi-square test of independence.
Look up the critical value in the chi-square table.
Generally, a population refers to a collection of entities such that each entity possesses an attribute called a characteristic. A statistical hypothesis is a claim either about the value of a single population characteristic or about the values of several population characteristics.
Population
A statistical population is the set of all possible measurements or data corresponding to the entire collection of units for which an inference is to be made.
Parameter and statistic
You will already know how to find the arithmetic mean, median, mode, standard deviation, etc. from the data contained in a sample. These quantities characterize a statistical distribution. They are called parameters if they are calculated for a population and statistics if they are calculated for a sample.
For example, the mean of a population is a parameter and the mean of a sample is a statistic.
The value of a statistic will normally vary from one sample to another, because the population members included in different samples, though drawn from the same population, may be different. These differences in the values of the statistic are called sampling fluctuations.
Sampling distribution.
A statistic varies from sample to sample if repeated random samples of the same size are drawn from a statistical population. The probability distribution of such a statistic is called its sampling distribution.
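The idea can be illustrated with a small simulation. The sketch below is illustrative only: the population (normal with mean 50 and standard deviation 10), the sample size, the number of repetitions and the function name `simulate_sample_means` are assumptions chosen for the example, not values taken from the text.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: normal with mean 50 and standard deviation 10.
sample_means = simulate_sample_means(mu=50, sigma=10, n=25, repetitions=2000)

# The sample mean differs from sample to sample (sampling fluctuation) ...
print(min(sample_means), max(sample_means))
# ... and the collection of means approximates the sampling distribution.
print(statistics.fmean(sample_means), statistics.stdev(sample_means))
```

The means cluster around the population mean, and their spread is noticeably smaller than the spread of the population itself.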
Standard error (S.E)
If a random variable X is normally distributed with mean µ and standard deviation σ, then the random variable $\bar{X}$ (the mean of a simple random sample of size n) is also normally distributed, with mean µ and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The standard deviation of the sampling distribution of the mean is referred to as the standard error of the mean and is denoted by $\sigma_{\bar{x}}$.
For a finite population, the standard error of the mean is given by
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
where N is the number of elements in the population and n is the number of elements in the sample.
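As a minimal sketch, the two standard error formulas can be computed as follows. The function name `standard_error` and the numbers used (σ = 12, n = 36, N = 500) are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    sigma: population standard deviation
    n: sample size
    N: population size; None means the population is treated as infinite
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        # finite population correction factor sqrt((N - n) / (N - 1))
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical numbers: sigma = 12, n = 36, population of N = 500 units.
se_infinite = standard_error(12, 36)        # 12 / 6 = 2.0
se_finite = standard_error(12, 36, N=500)   # slightly smaller than 2.0
print(se_infinite, se_finite)
```

The correction factor is always at most 1, so the finite-population standard error never exceeds σ/√n, and it approaches σ/√n as N grows large relative to n.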
Estimation and Testing of Hypothesis
In sampling theory, we are primarily concerned with two types of problems, which are given below:
a) Some characteristic or feature of the population in which we are interested may be completely unknown to us and we may like to make a guess about this characteristic entirely on the basis of a random sample drawn from the population. This type of problem is known as the problem of estimation.
b) Some information regarding the characteristic or feature of the population may be available to us and we may like to know whether the information is acceptable in the light of the random sample drawn from the population and if it can be accepted, with what degree of confidence it can be accepted.
This type of problem is known as the problem of testing of hypothesis.
Hypothesis testing addresses the important question of how to choose among alternative propositions while controlling and minimizing the risk of wrong decisions.
When we attempt to make decisions about the population on the basis of sample information, we have to make assumptions about the nature of the population involved or about the value of some parameter of the population. Such assumptions, which may or may not be true, are called statistical hypotheses.
We set up a hypothesis which assumes that there is no significant difference between the sample statistic and the corresponding population parameter, or between two sample statistics. Such a hypothesis of no difference is called a null hypothesis and is denoted by H0.
A hypothesis complementary to the null hypothesis is called an alternative hypothesis and is denoted by H1. A procedure for deciding whether to accept or to reject a null hypothesis, and hence to reject or to accept the alternative hypothesis, is called a test of hypothesis.
Test of significance
The difference between θ0 and θ, where θ0 is a parameter of the population and θ is the corresponding sample statistic, that is caused by sampling fluctuations alone is called an insignificant difference.
A difference that arises because either the sampling procedure is not purely random or the sample has not been drawn from the given population is known as a significant difference.
The procedure of testing whether the difference between θ0 and θ is significant or not is called the test of significance.
Critical region
The critical region of a test of a statistical hypothesis is the region of the normal curve that corresponds to the rejection of the null hypothesis.
Level of significance
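The level of significance is the probability level below which the null hypothesis is rejected: H0 is rejected when the probability of a sample result at least as extreme as the one observed, computed assuming H0 is true, is smaller than this level. Generally, the 5% and 1% levels of significance are used.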
Errors in hypothesis
The level of significance is fixed by the investigator, and an error of judgment may lead to it being fixed at too high a level. In that case the region of rejection becomes larger and the probability of rejecting a null hypothesis when it is true becomes greater. The error committed in rejecting H0 when it is really true is called a Type I error.
This is similar to a good product being rejected by the consumer and hence Type I error is also known as producer’s risk.
The error committed in accepting H0, when it is false, is called Type II error. As this error is similar to that of accepting a product of inferior quality, it is also known as consumer’s risk.
The probabilities of committing Type I and Type II errors are denoted by α and β respectively. Note that the probability α of committing a Type I error is the level of significance.
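To make α and β concrete, the sketch below computes β for one specific situation. Everything about the scenario is an assumption made for illustration: a test on a population mean with known σ, in which H0: µ = µ0 is rejected only when the sample mean is too large, and a particular true mean µ1 under which H0 is false. The function name `type_ii_error` and the numbers are hypothetical.

```python
from statistics import NormalDist
import math

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for a test of H0: mu = mu0 that rejects for large sample means,
    evaluated when the true mean is mu1."""
    se = sigma / math.sqrt(n)                   # standard error of the mean
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # area alpha to the right
    cutoff = mu0 + z_alpha * se                 # reject H0 if sample mean > cutoff
    # beta = P(sample mean <= cutoff) when the true mean is mu1
    return NormalDist(mu1, se).cdf(cutoff)

alpha = 0.05
beta = type_ii_error(mu0=100, mu1=103, sigma=10, n=25, alpha=alpha)
print(alpha, beta)

# Raising alpha enlarges the rejection region and lowers beta.
print(type_ii_error(mu0=100, mu1=103, sigma=10, n=25, alpha=0.10))
```

For a fixed sample size the two risks trade off against each other: a larger α gives a smaller β, and vice versa.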
One Tailed and two tailed tests
If θ0 is a parameter of the population and θ is the corresponding sample statistic, and if we set up the null hypothesis H0: θ = θ0, then the alternative hypothesis, which is complementary to H0, can be any one of the following:
i) H1: θ ≠ θ0, i.e., θ > θ0 or θ < θ0
ii) H1: θ > θ0
iii) H1: θ < θ0
H1 given in (i) is called a two tailed alternative hypothesis, whereas H1 given in (ii) is called a right-tailed alternative hypothesis and H1 given in (iii) is called a left-tailed alternative hypothesis.
When H0 is tested while H1 is a one-tailed alternative (right or left), the test of hypothesis is called a one-tailed test.
When H0 is tested while H1 is a two-tailed alternative, the test of hypothesis is called a two-tailed test.
Critical values or significant values
The value of test statistic which separates the critical (or rejection) region and the acceptance region is called the critical value or significant value. It depends upon:
i) the level of significance used and
ii) the alternative hypothesis, whether it is two-tailed or single-tailed.
The critical value of the test statistic at a level of significance α for a two-tailed test is given by zα, where zα is determined by the equation
P(|Z| > zα) = α
i.e., zα is the value so that the total area of the critical region on both tails is α. Since normal probability curve is a symmetrical curve, we get
P(Z > zα) + P(Z < - zα) = α
P(Z > zα) + P(Z > zα) = α
2 P(Z > zα) = α
P(Z > zα) = α/2
i.e., the area of each tail is α/2. Thus zα is the value such that the area to the right of zα is α/2 and the area to the left of -zα is α/2.
[Figure: Two-tailed test (level of significance α)]
In the case of a single-tailed alternative, the critical value zα is determined so that the total area to the right of zα is α for a right-tailed test, and the total area to the left of -zα is α for a left-tailed test, i.e.,
For a right-tailed test: P(Z > zα) = α
For a left-tailed test: P(Z < - zα) = α
[Figure: Right-tailed test (level of significance α)]
[Figure: Left-tailed test (level of significance α)]
Thus the significant or critical value of Z for a single-tailed test at level of significance α is the same as the critical value of Z for a two-tailed test at level of significance 2α.
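The critical values defined by these equations can be computed rather than looked up. The following is a minimal sketch using the standard normal distribution from Python's standard library; the helper name `critical_value` and its `tail` argument are illustrative choices, not notation from the text.

```python
from statistics import NormalDist

def critical_value(alpha, tail="two"):
    """Critical value of Z at level of significance alpha.

    tail: "two", "right" or "left"
    """
    z = NormalDist()  # standard normal distribution
    if tail == "two":
        return z.inv_cdf(1 - alpha / 2)   # area alpha/2 in each tail
    if tail == "right":
        return z.inv_cdf(1 - alpha)       # area alpha to the right
    if tail == "left":
        return z.inv_cdf(alpha)           # area alpha to the left
    raise ValueError("tail must be 'two', 'right' or 'left'")

for alpha in (0.01, 0.02, 0.05, 0.10):
    print(alpha,
          round(critical_value(alpha, "two"), 3),
          round(critical_value(alpha, "right"), 3),
          round(critical_value(alpha, "left"), 3))

# A single-tailed value at alpha equals the two-tailed value at 2 * alpha.
print(critical_value(0.05, "right"), critical_value(0.10, "two"))
```

Printed tables usually round these values, so small differences in the last digit (for example 2.576 against 2.58) are to be expected.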
The critical values zα for some standard levels of significance are given in the following table.
Nature of test   LOS 1% (0.01)   2% (0.02)      5% (0.05)      10% (0.10)
Two-tailed       |zα| = 2.58     |zα| = 2.33    |zα| = 1.96    |zα| = 1.645
Right-tailed     zα = 2.33       zα = 2.055     zα = 1.645     zα = 1.28
Left-tailed      zα = -2.33      zα = -2.055    zα = -1.645    zα = -1.28
Procedure for testing of hypothesis
1. Null Hypothesis H0 is defined.
2. Alternative hypothesis H1 is also defined after a careful study of the problem, and the nature of the test (whether one-tailed or two-tailed) is decided.
3. LOS(Level of significance ) ‘α’ is fixed or taken from the problem if specified and zα is noted.
4. The test statistic $z = \frac{X - E(X)}{\mathrm{S.E.}(X)}$ is computed.
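5. Comparison is made between |z| and zα. If |z| > zα, H0 is rejected (H1 is accepted), i.e., it is concluded that the difference between X and E(X) is significant at the α LOS. If |z| ≤ zα, H0 is accepted, i.e., the difference is not significant at the α LOS.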
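The five steps above can be sketched in code for one common case: a test on a population mean with known σ, where the statistic is the sample mean and S.E. = σ/√n. The function name `z_test_mean` and the data are hypothetical, chosen only to illustrate the procedure.

```python
from statistics import NormalDist
import math

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test of H0: mu = mu0 with known sigma.

    Returns (z, critical value, True if H0 is rejected).
    """
    std_normal = NormalDist()
    # Step 4: test statistic z = (X - E(X)) / S.E.(X)
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Steps 3 and 5: critical value for the chosen tail, then the comparison
    if tail == "two":
        z_alpha = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_alpha
    elif tail == "right":
        z_alpha = std_normal.inv_cdf(1 - alpha)
        reject = z > z_alpha
    elif tail == "left":
        z_alpha = -std_normal.inv_cdf(1 - alpha)
        reject = z < z_alpha
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, z_alpha, reject

# Hypothetical data: H0: mu = 50, H1: mu != 50, sigma = 8, n = 64, sample mean 52.5
z, z_alpha, reject = z_test_mean(52.5, 50, 8, 64, alpha=0.05, tail="two")
print(z, z_alpha, reject)  # z = 2.5 exceeds 1.96, so H0 is rejected at the 5% level
```

With the same data at the 1% level the critical value rises to about 2.58, so H0 would not be rejected. The conclusion therefore depends on the level of significance fixed in step 3.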
Confidence or Fiducial limits and Confidence interval
A confidence interval is an interval that provides lower and upper limits between which a specific population parameter is expected to lie. The two values of the statistic which determine the limits of the interval are called confidence limits. Thus a confidence interval is the interval in which a population parameter is expected to lie with a certain probability.
For example, the 95% confidence interval for the population mean µ is $\left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$.
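As a minimal sketch, the interval can be computed as follows, assuming σ is known. The function name `confidence_interval` and the numbers (sample mean 52.5, σ = 8, n = 64) are hypothetical.

```python
from statistics import NormalDist
import math

def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the population mean with known sigma."""
    # two-tailed critical value: area (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 52.5 from n = 64 observations, sigma = 8.
lower, upper = confidence_interval(sample_mean=52.5, sigma=8, n=64)
print(round(lower, 2), round(upper, 2))
```

Here σ/√n = 1, so the 95% limits lie about 1.96 on either side of the sample mean. Raising the confidence level widens the interval.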
3.3 TEST BASED ON NORMAL DISTRIBUTION