Chapter 14 (The Chi-Square Test)


Academic year: 2021


CHAPTER 14: HYPOTHESIS TESTING: CATEGORICAL DATA

This chapter describes two types of tests:

1. Tests of hypothesis about contingency tables called independence tests

2. Tests of hypothesis for experiments with more than two categories, called goodness of fit tests

All of these tests are performed by using the chi-square distribution. It is written as the χ² distribution, where χ is the Greek letter chi, pronounced "ki" as in "kite".

Like the t distribution, the chi-square distribution has only one parameter, called the degrees of freedom (df). The shape of a specific chi-square distribution depends on the number of degrees of freedom. The random variable χ² assumes nonnegative values only. Hence a chi-square distribution curve starts at the origin and lies entirely to the right of the vertical axis. If we know the degrees of freedom and the area in the right tail of a chi-square distribution curve, we can find the value of χ² from the table.
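Where a printed table is not at hand, the same lookup can be approximated numerically. The following is a minimal sketch using only the Python standard library; the function names `chi2_right_tail_area` and `chi2_critical_value` are illustrative and not part of the chapter, and the code assumes the degrees of freedom are a positive integer.

```python
import math

def chi2_right_tail_area(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1."""
    if x <= 0:
        return 1.0  # the curve lies entirely to the right of the origin
    half = x / 2.0
    if df % 2 == 0:
        area, nu = math.exp(-half), 2               # exact right-tail area for df = 2
    else:
        area, nu = math.erfc(math.sqrt(half)), 1    # exact right-tail area for df = 1
    # Each step of this recurrence raises the degrees of freedom by 2.
    while nu < df:
        area += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return area

def chi2_critical_value(alpha, df):
    """Chi-square value whose right-tail area equals alpha, found by bisection."""
    low, high = 0.0, 1.0
    while chi2_right_tail_area(high, df) > alpha:
        high *= 2.0  # widen the bracket until the tail area drops below alpha
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_right_tail_area(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

print(round(chi2_critical_value(0.05, 4), 3))  # 9.488
```

The printed value agrees with the usual table entry for df = 4 and a right-tail area of 0.05.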

14.1 R × C CONTINGENCY TABLES

Information can be summarized and presented using a two-way classification table, which is also called a contingency table or cross-tabulation.

In a test of independence for a contingency table, we test the null hypothesis that the two characteristics of the elements of a given population are not related (that they are independent) against the alternative hypothesis that the two characteristics are related (that they are dependent). For example, we may want to test whether there is an association between being male or female and having a preference for watching sports or soap operas on television. We perform such a test by using the chi-square distribution.

The degrees of freedom for a test of independence are

df = (R − 1)(C − 1)

where R and C are the number of rows and the number of columns, respectively, in the given contingency table.
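As a quick check of the formula, consider a table with 3 rows and 3 columns, such as the department-by-opinion table in Problem 1 of Section 1, and a table with 2 rows and 3 columns, such as the absence-by-employee-type table in Problem 6:

$$df = (3 - 1)(3 - 1) = 4, \qquad df = (2 - 1)(3 - 1) = 2$$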

14.1.1 A TEST OF INDEPENDENCE OR HOMOGENEITY

The value of the test statistic χ² for a test of independence is calculated as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O and E are the observed and expected frequencies, respectively, for a cell.

The null hypothesis in a test of independence is always that the two attributes are not related. The alternative hypothesis is that the two attributes are related. The frequencies obtained from the performance of an experiment for a contingency table are called the observed frequencies. The expected frequency E for a cell is calculated as

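$$E = \frac{(\text{row total})(\text{column total})}{\text{total sum}}$$

where the total sum is the sum of all observed frequencies in the table.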
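The calculation of the expected frequencies and the test statistic for a test of independence can be sketched in a few lines of Python. This is an illustrative sketch, not a method prescribed by the chapter: the function name `chi_square_independence` is hypothetical, and the data are the observed frequencies from Problem 1 of Section 1.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency: (row total)(column total) / total sum
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Rows: Production, Sales, Administration; columns: in favour, uncertain, against
opinions = [[89, 42, 9], [53, 36, 11], [38, 12, 10]]
statistic, df = chi_square_independence(opinions)
print(round(statistic, 2), df)  # 8.98 4
```

The result matches the answer quoted for that problem, (8.98), with df = (3 − 1)(3 − 1) = 4.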

14.2 CHI-SQUARE GOODNESS-OF-FIT TEST

This section explains how to test hypotheses about experiments with more than two possible outcomes (categories). Such experiments, called multinomial experiments, possess four characteristics.

14.2.1 Observed and expected frequencies

The frequencies obtained from the actual performance of an experiment are called the observed frequencies and are denoted by O. In a goodness-of-fit test, we test the null hypothesis that the observed frequencies for an experiment follow a certain pattern or theoretical distribution. It is called a goodness-of-fit test because the hypothesis tested is how well the observed frequencies fit a given pattern.

The expected frequencies, denoted by E, are the frequencies that we expect to obtain if the null hypothesis is true. The expected frequency for a category is obtained as

E = np

where n is the sample size and p is the probability that an element belongs to that category if the null hypothesis is true.
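For illustration, take the absenteeism data in Problem 3 of Section 2: a sample of n = 422 absentees over five weekdays. Under a null hypothesis that absence is equally likely on each day, p = 1/5 for every day, so each day has the same expected frequency:

$$E = np = 422 \times \frac{1}{5} = 84.4$$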

14.2.2 Degrees of freedom for a goodness of fit test

In a goodness-of-fit test, the degrees of freedom are df = k − 1, where k denotes the number of possible outcomes (categories) for the experiment.

14.2.3 Test statistic for a goodness of fit test

The test statistic for a goodness-of-fit test is χ², and its value is calculated as

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where
O = observed frequency for a category
E = expected frequency for a category

Remember that a chi-square goodness-of-fit test is always a right-tailed test.

A Multinomial Experiment

An experiment with the following characteristics is called a multinomial experiment:

1. It consists of n identical trials.
2. Each trial results in one of k possible outcomes (categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes remain constant for each trial.
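The goodness-of-fit calculation (E = np for each category, then the sum of (O − E)²/E) can be sketched as follows. The function name `chi_square_goodness_of_fit` is illustrative only, and the data are taken from Problem 7 of Section 2, where the hypothesized market shares are in the ratio 45 : 25 : 20 : 10.

```python
def chi_square_goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, df) for observed counts and the
    category probabilities stated by the null hypothesis."""
    n = sum(observed)                           # sample size
    expected = [n * p for p in probabilities]   # E = np for each category
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                      # df = k - 1
    return statistic, df

# Shell, Esso, Petronas, Others
car_owners = [420, 300, 210, 70]
shares = [0.45, 0.25, 0.20, 0.10]
statistic, df = chi_square_goodness_of_fit(car_owners, shares)
print(round(statistic, 2), df)  # 21.5 3
```

Because the test is right-tailed, the null hypothesis is rejected only when the statistic exceeds the table value for the chosen significance level and df.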

Chi-squared goodness of Fit Test

PROBLEM SET

Section 1

1. 300 employees of a company were selected at random and asked whether they were in favour of a scheme to introduce flexible working hours. The following table shows the opinions and the departments of the employees.

                     Opinion
Department        In favour   Uncertain   Against
Production            89          42          9
Sales                 53          36         11
Administration        38          12         10

Test whether there is evidence of a significant association between opinion and department. (8.98)

2. A group of executives was classified according to total income and age. Test the hypothesis that age is not related to the level of income. (6.85)

Age            Less than $100,000   $100,000 to $399,999   $400,000 or more
Under 40               6                     9                     5
40 to 54              18                    19                     8
55 or older           11                    12                    17

3. Suppose a personnel department investigated absenteeism by categorizing absentees according to the shift on which they worked and the day of the week, as shown in the following table.

           Day of the Week
Shift    Monday   Tuesday   Wednesday   Thursday   Friday
Day        49       36         43          40        45

Is there sufficient evidence at the 5% significance level of an association between the days on which the employees are absent and the shift on which the employees work? (0.3217)

4. A company owns hypermarts in various parts of the country. The hypermarts are situated near large cities. Each hypermart has a large car park that is free to use. The directors think that there are regional differences in the distances that customers travel to reach these stores. A hypermart was selected in each of the three regions, and a random sample of customers at each store was asked how far they had traveled to reach the store. The results were as follows.

                            Region
Distance traveled          South   Middle   North
Less than 5 miles            50      80       70
Between 5 and 10 miles       80      60       20
More than 10 miles           70      60       10

Examine at the 5% significance level whether there is any relationship between distance traveled and region. (57.85)

5. The marketing director for a metropolitan daily newspaper is studying the relationship between the type of community the reader lives in and the portion of the paper he or she reads first. For a sample of readers the following information is obtained.

          National news   Sports   Comics
Urban          170          140       90
Rural          100          110      100
Farm           130          100       60

At the 0.05 significance level, can we conclude that there is a relationship between the type of community where the person resides and the portion of the paper he or she reads first? (80.678)

6. The following data concerning industrial accidents and absences are classified according to the type of employee.

                                        Type of employee
Absence following the accident       Men   Women   Juvenile
Up to one month                       26     16        8
One month or longer                   14      9        7

Is there any evidence to suggest that the severity of the accident is associated with the type of employee? Use a 5% significance level. (0.6618)

7. A tile company was interested in comparing the fraction of new house builders favoring three types of tiles as floor coverings for their houses in three different areas of the Klang Valley, i.e. Subang Jaya, Puchong and Petaling Jaya. A survey was conducted and the data were as follows.

                                 Area
Floor covering     Subang Jaya   Puchong   Petaling Jaya
Type I                 224         165          36
Type II                196         152          44
Type III                80          83          20

Test at the 5% significance level whether there is any association between the types of tiles used and the areas concerned. (5.4)

8. A large consultancy firm regularly recruits MBA graduates. The personnel director has categorized each business school producing MBA graduates as top rate, adequate or bad to assist their recruitment strategy. A survey of the performance of 100 recent recruits has rated them as excellent, average or poor. A cross-classification of the results of the survey is shown in the table below.

                                  Rating of graduates
Rating of business schools     Excellent   Average   Poor
Top rate                           10         10       5
Adequate                            7         30       8
Bad                                 3         20       7

Is there a relationship between the rating of these recruits and the business school at which they were trained? (Test at the 5% significance level.) (9.44)

Section 2

1. A group of 385 mental patients has been classified according to parental social class, with the following results.

Social class   Upper   Upper middle   Middle   Lower middle   Lower
Frequency        18         31          46         126         164

Test at the 5% significance level whether the data are consistent with the assumption that all social classes are equally likely to be represented. (9.48)

2. Motor vehicle production is the same each day. The following information is given below.

Days   Monday   Tuesday   Wednesday   Thursday   Friday

Test at the 10% significance level to determine whether the number of vehicles is the same throughout the week. (5.36)

3. It has been estimated that employee absenteeism costs Malaysian companies more than RM 500 million per year. The personnel department of a large corporation recorded the weekdays during which individuals in a sample of 422 absentees were away over the past several months. Do these data suggest that absenteeism is higher on some days of the week? (Use α = 0.05.) (4.091)

Day             Monday   Tuesday   Wednesday   Thursday   Friday
Number absent     99       74         83          80        86

4. A company keeps detailed records of staff accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.

Day                   Monday   Tuesday   Wednesday   Thursday   Friday
Number of accidents      8       12          9          14        17

Test at the 5% level of significance whether there is any evidence that accidents are more likely to happen on some days than others. (4.5)

5. A study reports an analysis of 35 key product categories. At the time of the study, 72.9% of the products sold were of a national brand, 23% were private-label and 4.1% were generic. Suppose that you want to test whether these percentages are still valid for the market today. You collect a random sample of 1000 products in the 35 product categories studied, and you find the following: 610 products are of a national brand, 290 are private-label, and 100 are generic. Conduct the test at the 0.025 level of significance. (119.98)

6. A farmer's apples are graded on a scale from A to D before sale. Past experience shows that the percentages of apples in the four grades are as follows.

Grade    A    B    C    D
%       29   38   27    6

The farmer introduces a new treatment and applies it to a small number of trees to see if it affects the distribution of grades. The apples produced by these trees are graded as follows.

Grade               A    B    C    D
Number of apples   79   94   58   19

Test at the 5% level of significance to see if the new treatment has affected the distribution of grades. (3.08)

7. In a certain town in the state of Selangor, the retailing market for petrol is shared among several companies. Their market shares can be established in the ratio of 45 : 25 : 20 : 10 respectively. A survey was conducted recently among 1000 car owners in that town and their preferences were tabulated as follows.

Oil company             Shell   Esso   Petronas   Others
Number of car owners     420     300     210        70

Use a χ² test at the 1% significance level to test the hypothesis that there has been no change in the market share for petrol. (21.5)

8. An organization recently published the number of acts of violence seen in types of television programs.

Type of program     Drama   Old movies   Cartoon   Police   Comedy   News
Acts of violence      42        57          83       92       38      81

The organization claimed that such acts occur with equal frequency across all types of program. Test this claim at the 10% level. (40.14)

9. Seattle Aircraft Company Inc. manufactures and sells Twin Otters in the U.S. Records of the company showed that sales, by region, in previous years were distributed according to the following proportions.

Region           West Coast   North Central   North East   South   South East
Percentage (%)       30            25             20         10        15

This year, the numbers of planes sold in these regions are

Region           West Coast   North Central   North East   South   South East
No. of planes       330           220            170        120       160

Can you conclude at the 1% significance level that the sales distribution for this year differs significantly from that of previous years? (15.1)

10. The LDP expressway, which has five lanes after the tollgate, was studied to see whether drivers preferred to drive on the inside lanes. A total of 1000 automobiles was observed during the early morning traffic, and the number of cars on the respective lanes was recorded. The results were as follows:

Lane               1     2     3     4     5
Observed count    96   154   275   225   171

Do the data provide sufficient evidence at the 5% level of significance to indicate that some lanes are preferred over others? (101.81)

11. A survey of the employees of a large company was conducted to see whether competence in computing skills was related to age. The results of the survey are given below.

                         Computing skill
Age group (years)      Good   Average   Poor
18 and under 30         70       20      10
30 and under 45         40       30      30
45 and over             30       30      60

(i) In a previous assessment of computing skills taken for all employees 5 years ago, it was found that 30% were good, 20% average and 50% poor. Combine the three age groups of the data in the table above and test whether there is any evidence of a change in computing skill. (46.67)

(ii) Assuming that the survey was conducted by means of a random sample, test the hypothesis that computing skill is associated with age. (55.713)
