
TESTING OF HYPOTHESES

3.2 LEARNING OBJECTIVES The student will be able to:

 List the steps of hypothesis testing.

 State in your own words the type I and type II errors for a given problem.

 Extract the appropriate information from a story problem to perform a complete hypothesis test.

 Set up the null and alternative hypotheses correctly.

 Choose the appropriate test statistic.

 Choose the appropriate level of significance.


 Find the critical value using a table and state the decision rule correctly.

 Make a statistical decision.

 State the conclusion.

 Perform a hypothesis test for 2 means using the appropriate formula.

 Choose when to use a 2-sample t-test vs. a 2-sample z-test.

 List the assumptions for a 2-sample equal (pooled) variance independent test.

 Perform a 2-sample equal (pooled) variance t-test.

 If the problem asks for a business decision based on the hypothesis test, state the appropriate decision.

 Use an F-test to perform an equality of variance hypothesis test.

 Incorporate the F-test for equality of variance in the hypothesis test for 2 means.

 Interpret the results of the chi-square test of independence.

 Look up the critical value in the chi-square table.

Generally, a population refers to a collection of entities such that each entity possesses an attribute called a characteristic. A statistical hypothesis is a claim either about the value of a single population characteristic or about the values of several population characteristics.

 Population

A statistical population is the set of all possible measurements on data corresponding to the entire collection of units for which an inference is to be made.

 Parameter and statistic

You will already know how to find the arithmetic mean, median, mode, standard deviation, etc. from the data contained in a sample. These quantities characterize a statistical distribution. They are called parameters if they are calculated for a population and statistics if they are calculated for a sample.

For example, the mean of a population is a parameter and the mean of a sample is a statistic.

The value of a statistic will normally vary from one sample to another, because the population members included in different samples, though drawn from the same population, may be different. These differences in the values of the statistic are called sampling fluctuations.

 Sampling distribution.

These statistics vary from sample to sample if repeated random samples of the same size are drawn from a statistical population. The probability distribution of such a statistic is called the sampling distribution.
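As a rough illustration of sampling fluctuations, the sketch below draws repeated samples from a made-up population (the integers 1 to 1000, chosen only for this example) and collects the sample means. The function name `sample_means` and all the numbers are assumptions, not part of the notes.

```python
import random
import statistics

def sample_means(population, n, repeats, seed=0):
    """Draw `repeats` simple random samples of size n (without replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

# Hypothetical population: the integers 1..1000, whose mean is 500.5.
population = list(range(1, 1001))
means = sample_means(population, n=25, repeats=2000)

# The statistic fluctuates from sample to sample...
print(min(means), max(means))
# ...but its sampling distribution is centred near the population mean.
print(statistics.fmean(means))
```

The list `means` is an empirical approximation of the sampling distribution of the mean for samples of size 25.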


 Standard error (S.E)

If a random variable X is normally distributed with mean µ and standard deviation σ, then the random variable $\bar{X}$ (the mean of a simple random sample of size n) is also normally distributed, with mean µ and standard deviation

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

The standard deviation of the sampling distribution of the mean is referred to as the standard error of the mean and is denoted by $\sigma_{\bar{x}}$.

For a finite population, the standard error of the mean is given by

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

where N is the number of elements in the population and n is the number of elements in the sample.
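The two standard error formulas can be sketched in a few lines. The function name `standard_error` and the numbers used are illustrative assumptions; σ is taken to be known.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size (None means the population is treated as infinite)
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        # finite population correction factor sqrt((N - n) / (N - 1))
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(12, 36))          # 2.0
print(standard_error(12, 36, N=500))   # slightly smaller than 2.0
```

The correction factor is always at most 1, so sampling from a finite population never increases the standard error. When n = N the whole population is observed and the standard error is zero.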

 Estimation and Testing of Hypothesis

In sampling theory, we are primarily concerned with two types of problems, which are given below:

a) Some characteristic or feature of the population in which we are interested may be completely unknown to us and we may like to make a guess about this characteristic entirely on the basis of a random sample drawn from the population. This type of problem is known as the problem of estimation.

b) Some information regarding the characteristic or feature of the population may be available to us and we may like to know whether the information is acceptable in the light of the random sample drawn from the population and if it can be accepted, with what degree of confidence it can be accepted.

This type of problem is known as the problem of testing of hypothesis.

Hypothesis testing addresses the important question of how to choose among alternative propositions while controlling and minimizing the risk of wrong decisions.

When we attempt to make decisions about the population on the basis of sample information, we have to make assumptions about the nature of the population involved or about the value of some parameter of the population. Such assumptions, which may or may not be true, are called statistical hypotheses.

We set up a hypothesis which assumes that there is no significant difference between the sample statistic and the corresponding population parameter, or between two sample statistics. Such a hypothesis of no difference is called a null hypothesis and is denoted by H₀.


A hypothesis complementary to the null hypothesis is called an alternative hypothesis and is denoted by H₁.

A procedure for deciding whether to accept or to reject a null hypothesis and hence to reject or to accept the alternative hypothesis is called the test of hypothesis.

 Test of significance

The difference between θ₀ and θ, where θ₀ is a parameter of the population and θ is the corresponding sample statistic, which is caused by sampling fluctuations is called an insignificant difference.

The difference that arises because either the sampling procedure is not purely random or the sample has not been drawn from the given population is known as a significant difference.

The procedure of testing whether the difference between θ₀ and θ is significant or not is called the test of significance.

 Critical region

The critical region of a test of a statistical hypothesis is that region of the normal curve which corresponds to the rejection of the null hypothesis.

 Level of significance

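The level of significance is the probability level below which the null hypothesis is rejected: if, assuming H₀ is true, the probability of a difference at least as large as the one observed is less than this level, H₀ is rejected. Generally, the 5% and 1% levels of significance are used.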

 Errors in hypothesis

The level of significance is fixed by the investigator and as such it may be fixed at a higher level through an error of judgment. Due to this, the region of rejection becomes larger and the probability of rejecting a null hypothesis when it is true becomes greater. The error committed in rejecting H₀ when it is really true is called Type I error.

This is similar to a good product being rejected by the consumer and hence Type I error is also known as producer’s risk.

The error committed in accepting H₀, when it is false, is called Type II error. As this error is similar to that of accepting a product of inferior quality, it is also known as consumer’s risk.

The probabilities of committing Type I and Type II errors are denoted by α and β respectively. Note that α, the probability of committing a Type I error, is the level of significance.
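To see how α and β trade off against each other, the sketch below computes β for a right-tailed z-test of a mean with known σ. The function name `type_ii_error`, the choice of a right-tailed test, and all the numbers are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for a right-tailed z-test of H0: mu = mu0 against H1: mu > mu0,
    when the true mean is actually mu1 (known sigma, sample size n)."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # P(Z > z_crit) = alpha
    cutoff = mu0 + z_crit * se                 # H0 is rejected when x-bar > cutoff
    # beta = P(accept H0 | true mean is mu1) = P(x-bar <= cutoff)
    return NormalDist(mu1, se).cdf(cutoff)

# Hypothetical numbers: mu0 = 50, true mean 52, sigma = 6, n = 36.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(type_ii_error(50, 52, 6, 36, alpha), 4))
```

For a fixed sample size, raising α enlarges the rejection region and lowers β, while lowering α has the opposite effect. Both risks can be reduced together only by increasing n.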


 One Tailed and two tailed tests

If θ₀ is a parameter of the population and θ is the corresponding sample statistic and if we set up the null hypothesis H₀: θ = θ₀, then the alternative hypothesis which is complementary to H₀ can be any one of the following:

i) H₁: θ ≠ θ₀, i.e., θ > θ₀ or θ < θ₀

ii) H₁: θ > θ₀

iii) H₁: θ < θ₀

H₁ given in (i) is called a two tailed alternative hypothesis, whereas H₁ given in (ii) is called a right-tailed alternative hypothesis and H₁ given in (iii) is called a left-tailed alternative hypothesis.

When H₀ is tested while H₁ is a one-tailed alternative (right or left), the test of hypothesis is called a one-tailed test.

When H₀ is tested while H₁ is a two-tailed alternative, the test of hypothesis is called a two-tailed test.

 Critical values or significant values

The value of the test statistic which separates the critical (or rejection) region and the acceptance region is called the critical value or significant value. It depends upon:

i) the level of significance used and

ii) the alternative hypothesis, whether it is two-tailed or one-tailed.

The critical value of the test statistic at level of significance α for a two-tailed test is given by z_α, where z_α is determined by the equation

P(|Z| > z_α) = α

i.e., z_α is the value such that the total area of the critical region on both tails is α. Since the normal probability curve is symmetrical, we get

P(Z > z_α) + P(Z < -z_α) = α

⇒ P(Z > z_α) + P(Z > z_α) = α

⇒ 2 P(Z > z_α) = α

⇒ P(Z > z_α) = α/2

i.e., the area of each tail is α/2. Thus z_α is the value such that the area to the right of z_α is α/2 and the area to the left of -z_α is α/2.


[Figure: two-tailed test (level of significance α)]

In the case of a one-tailed alternative, the critical value z_α is determined so that the total area to the right of z_α is α (right-tailed test) or the total area to the left of -z_α is α (left-tailed test), i.e.,

For right-tailed test: P(Z > z_α) = α

For left-tailed test: P(Z < -z_α) = α

[Figure: right-tailed test (level of significance α)]

[Figure: left-tailed test (level of significance α)]

Thus the significant or critical value of Z for a one-tailed test at level of significance α is the same as the critical value of Z for a two-tailed test at level of significance 2α.
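Critical values are normally read from a table, but they can also be computed from the inverse of the standard normal distribution function. The sketch below uses Python's `statistics.NormalDist`. The function name `critical_value` and the tail labels are assumptions chosen for this illustration.

```python
from statistics import NormalDist

def critical_value(alpha, tail="two"):
    """Critical value of Z at level of significance alpha.
    tail: "two", "right" or "left" (labels chosen for this sketch)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)   # area alpha/2 in each tail
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)       # area alpha to the right
    if tail == "left":
        return std_normal.inv_cdf(alpha)           # area alpha to the left
    raise ValueError("tail must be 'two', 'right' or 'left'")

for alpha in (0.01, 0.02, 0.05, 0.10):
    print(alpha, [round(critical_value(alpha, t), 3) for t in ("two", "right", "left")])
```

Calling `critical_value(0.05, "right")` and `critical_value(0.10, "two")` gives the same number, which is the relation between one-tailed and two-tailed critical values stated above.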

The critical values z_α for some standard levels of significance (LOS) are given in the following table.

Nature of test    LOS 1% (0.01)    2% (0.02)       5% (0.05)       10% (0.1)
Two-tailed        |z_α| = 2.58     |z_α| = 2.33    |z_α| = 1.96    |z_α| = 1.645
Right-tailed      z_α = 2.33       z_α = 2.055     z_α = 1.645     z_α = 1.28
Left-tailed       z_α = -2.33      z_α = -2.055    z_α = -1.645    z_α = -1.28

 Procedure for testing of hypothesis

1. Null Hypothesis H₀ is defined.

2. Alternative hypothesis H₁ is also defined after a careful study of the problem, and the nature of the test (whether one-tailed or two-tailed) is decided.

3. The LOS (level of significance) α is fixed, or taken from the problem if specified, and z_α is noted.

4. The test statistic z = (X - E(X)) / S.E.(X) is computed.

5. A comparison is made between |z| and z_α. If |z| > z_α, H₀ is rejected or H₁ is accepted, i.e., it is concluded that the difference between X and E(X) is significant at α LOS. Otherwise, H₀ is accepted, i.e., the difference is not significant at α LOS.
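The steps above can be sketched for one common case: a two-tailed test of a population mean with known σ, where the statistic X is the sample mean. The function name `z_test_mean` and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 against H1: mu != mu0 (known sigma)."""
    # Step 3: critical value z_alpha with P(|Z| > z_alpha) = alpha
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: test statistic z = (X - E(X)) / S.E.(X), here for the sample mean
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Step 5: compare |z| with z_alpha
    reject = abs(z) > z_alpha
    return z, z_alpha, reject

# Hypothetical data: n = 100, sample mean 52.5, H0: mu = 50, sigma = 10.
z, z_alpha, reject = z_test_mean(52.5, 50, 10, 100, alpha=0.05)
print(f"z = {z:.2f}, z_alpha = {z_alpha:.2f}, reject H0: {reject}")
# prints: z = 2.50, z_alpha = 1.96, reject H0: True
```

With the same data at the 1% level the critical value rises to about 2.58, so H₀ would not be rejected. This shows that the conclusion depends on the level of significance chosen in step 3.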


 Confidence or Fiducial limits and Confidence interval

A confidence interval is an interval that provides lower and upper limits between which a specific population parameter is expected to lie. The two values of the statistic which determine the limits of the interval are called confidence limits. Thus a confidence interval is the interval in which a population parameter is expected to lie with a certain probability.

For example, the 95% confidence interval for the population mean µ is $\left[\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right]$

3.3 TEST BASED ON NORMAL DISTRIBUTION

In document DIT 111 Probability and queing theory.pdf (Page 171-178)