Chapter 4. ANOVA for Education. 2020.doc


Chapter IV

ANOVA


Chapter Objective

The purpose is to help a decision maker compare three or more independent sample means to see whether there are statistically significant differences between the means of the populations from which the samples are taken.

Many analyses involve experiments in which you want to test whether one or more discrete-level factors (independent variables) influence an outcome measurement (a quantitative dependent variable).


4.1 Introduction

Analysis of variance (ANOVA) is a hypothesis test that is appropriate for comparing the means of a continuous variable across two or more independent comparison groups. ANOVA was developed by Ronald Fisher in 1918 and extends the t-test and the z-test. Before ANOVA, the t-test and z-test were commonly used, but the t-test cannot be applied to more than two groups at once.

Analysis of variance provides a way to determine whether one or more discrete-level factors (independent variables) influence an outcome measurement (a quantitative dependent variable).

Factor: a characteristic under consideration, thought to influence the measured observations.

The purpose is to help a decision maker compare three or more independent sample means to see whether there are statistically significant differences between the means of the populations from which the samples are taken.

The null hypothesis, typically, is that all means are equal.

Analysis of variance must have a dependent variable that is metric (measured using an interval or ratio scale).

There must also be one or more independent variables that are all categorical (nonmetric). Categorical independent variables are also called factors.

A particular combination of factor levels, or categories, is called a treatment.

• Analysis of variance compares two or more populations of interval data.

• Specifically, we are interested in determining whether differences exist between the population means.

One-way analysis of variance involves only one categorical variable, or a single factor. In one-way analysis of variance, a treatment is the same as a factor level. When we compare more than two groups based on one factor (independent variable), the analysis is called one-way ANOVA.

Application Example

We wish to conduct a study in the area of mathematics education involving different teaching methods to improve standardized math scores in local classrooms. The study will include four different teaching methods and use fifth-grade students who are randomly sampled from a large urban school district and then randomly assigned to the four teaching methods.

The four different teaching methods to be examined are:

1) the traditional teaching method where the classroom teacher explains the concepts and assigns homework problems from the textbook.

2) the intensive practice method, in which students fill out additional work sheets both before and after school.

3) the computer assisted method, in which students learn math concepts and skills from using various computer based math learning programs.

4) the peer assistance learning method, which pairs each fifth grader with a sixth grader who helps them learn the concepts followed by the student teaching the same material to another student in their group.

Students will stay in their math learning groups for an entire academic year. At the end of the spring semester all students will take the Multiple Math Proficiency Inventory (MMPI).

The experiment is designed so that each of the four groups will have the same sample size.

One-way repeated measures ANOVA

A one-way repeated measures ANOVA is used when you have a single group on which you have measured something more than one time. For example, if you wanted to test students’ understanding of a subject, you could administer the same test at the beginning of the course, in the middle of the course, and at the end of the course. You would then use a one-way repeated measures ANOVA to see if students’ performance on the test changed over time.

Two-way between groups ANOVA


Each of the main effects is a one-way test. The interaction effect is simply asking if there is any significant difference in performance when you test the final grade and overseas/local acting together.

4.2 Conducting One-Way Analysis of Variance

1. Identification of Dependent and Independent (Factor) Variables. The dependent variable should be continuous (measured on an interval or ratio scale), and a factor variable in ANOVA should be categorical. Variables that are experimentally manipulated by an investigator are called independent variables.

2. Decomposition of the Total Variation

3. Measurement of Effects

4. Significance Testing (ANOVA results: Contrasts, Multiple Comparisons, Tests for Trend)

5. Assumptions in Analysis of Variance

6. Interpretation of Results

• Variance can be separated into two major components:

Between groups: differences according to the group or the treatment received.

Within groups: variability or differences within particular groups (individual differences).

Relationship amongst T Test, Analysis of Variance, Analysis of Covariance, & Regression

4.3 Assumptions that must hold before the ANOVA technique can be applied to a decision-making situation (these assumptions can be tested using statistical software):

• The samples are drawn randomly, and each sample is independent of the other samples.

• The errors are normally distributed, with a zero mean and a constant variance.

• The variances of all errors are equal to each other. The assumption of homogeneity of variance can be tested using tests such as Levene’s test or the Brown-Forsythe test.

• If the Sig. (p-value) given by Levene’s test is greater than .05, assume equal variances.

• If the Sig. (p-value) given by Levene’s test is less than .05, equal variances cannot be assumed; in that case, use results that adjust for the violation of equal variances.


If the crucial assumptions of ANOVA are not met, one may wish to consider a parallel nonparametric test such as:

• the Kruskal-Wallis procedure, or

• the Friedman procedure, for one-way or two-way ANOVA, respectively.
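As a rough illustration of the Kruskal-Wallis alternative, the following Python sketch computes the H statistic with the usual tie correction, using only the standard library. The function names are hypothetical, the three score lists are the teen health-knowledge data used in the illustrative application later in this chapter, and the p-value relies on the chi-square approximation, whose closed form exp(-H/2) is valid only for three groups (2 degrees of freedom).

```python
import math
from collections import Counter

def average_ranks(values):
    """Map each distinct value to its average (mid) rank in the pooled data."""
    positions = {}
    for index, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(index)
    return {value: sum(idx) / len(idx) for value, idx in positions.items()}

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for tied scores."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    rank_of = average_ranks(pooled)
    # Sum over groups of (rank sum squared / group size).
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N), t = size of each tie block.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_sum / (n_total ** 3 - n_total))

inactive = [14, 12, 8, 14, 11, 12]
one_partner = [11, 11, 6, 5, 12, 10]
multi_partner = [8, 12, 10, 4, 3, 5]

h_stat = kruskal_wallis_h([inactive, one_partner, multi_partner])
# Chi-square upper tail; this closed form is an assumption valid for df = 2 only.
p_value = math.exp(-h_stat / 2)
print(f"H = {h_stat:.3f}, approximate p = {p_value:.3f}")
```

With samples this small the chi-square approximation is only a guide; statistical software may report an exact or differently adjusted p-value.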

It is important to note that ANOVA is not robust to violations of the assumption of independence. That is, even if you violate the assumptions of homogeneity or normality, there are statistical procedures that will still enable you to conduct the ANOVA, but there are none for violations of independence. In general, when homogeneity is violated, the study can probably carry on if the groups are of equal size.

4.4 The Procedural Steps for an ANOVA Test

Step 1. State the Hypotheses

In general, one-way ANOVA techniques can be used to study the effect of k (> 2) levels of a single factor. To determine whether different levels of the factor affect the measured observations differently, the following hypotheses are tested.

H0: μ1 = μ2 = … = μk (that is, “all population means are equal”), where k = the number of groups (populations) under study

Ha: At least one µi is different (at least one mean differs from the others)

Step 2. Select the Level of Significance

A criterion for rejecting H0 is necessary, and tests are typically made with α specified as .01, .05, or .10.

Step 3. Determine the Test Distribution to Use

An F distribution is used in an ANOVA test.

The null hypothesis may be tested with the F statistic, which is based on the ratio between two estimates of variance. The ANOVA F statistic is the between-group variation (mean square between groups) divided by the within-group variation (mean square within groups).

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

This statistic follows the F distribution with (k - 1) and (N - k) degrees of freedom (df), where k = the number of groups and N = the total number of observations.
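Written out, the ratio described above takes the following standard form. The symbols SS_B, SS_W, MS_B, and MS_W (sums of squares and mean squares between and within groups) are notation assumed here for illustration; k is the number of groups and N is the total number of observations.

```latex
F = \frac{MS_{B}}{MS_{W}} = \frac{SS_{B}/(k-1)}{SS_{W}/(N-k)}
```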

Step 4. Making a Decision and Interpreting the Result of the Test

• If the null hypothesis of equal category means is not rejected, then the factor or independent variable does not have a significant effect on the dependent variable.

• On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant.

(If the test statistic F exceeds the critical value of F, we can reject H0.)

Decision rule with Sig. or p-value:

• If the Sig. is less than the level of significance (α = .05), we should reject the null hypothesis.


Operational formulas

(1) Total sum of squares

(2) Sum of Squares Between Groups

(3) Sum of Squares Within Groups
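One common way to write these three quantities, assuming the usual computational (raw-score) form, is sketched below. The symbols are assumptions for illustration: X_ij is the i-th score in group j, T_j is the total of group j, n_j is the size of group j, G is the grand total of all scores, N is the total number of subjects, and k is the number of groups.

```latex
\begin{aligned}
SS_{T} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}^{2} - \frac{G^{2}}{N} \\
SS_{B} &= \sum_{j=1}^{k} \frac{T_{j}^{2}}{n_{j}} - \frac{G^{2}}{N} \\
SS_{W} &= SS_{T} - SS_{B}
\end{aligned}
```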

Illustrative Applications of One-way ANOVA

Are sexually active teenagers better informed about AIDS and other potential health problems related to sex than teenagers who are sexually inactive?

A 15-item test of general knowledge about sex and health was administered to random samples of teens who are sexually inactive, teens who are sexually active but with only a single partner, and teens who are sexually active with more than one partner. Is there any significant difference in the test scores?

The data are:

Inactive   Active-One partner   Active-More than one partner
14         11                   8
12         11                   12
8          6                    10
14         5                    4
11         12                   3
12         10                   5

We illustrate the analysis of variance procedure for the previous example using the SPSS software. The results are presented as follows:

Steps for the Illustrative Application of One-way ANOVA

Null and alternative hypotheses

H0: μ1 = μ2 = μ3 (that is, “the group means are all equal”)

Ha: At least one µi is different (“at least one mean differs from the others”)

Level of significance:

Grand Total (add all of the scores together, then square the total)

Square each individual score and then add up all of the squared scores

Total number of subjects
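The three quantities just listed are enough to compute the sums of squares by hand. The sketch below does this in Python for the example data, assuming the usual raw-score computational formulas; the variable names are illustrative only.

```python
# Scores from the illustrative example, one list per group.
groups = {
    "Inactive": [14, 12, 8, 14, 11, 12],
    "Active-One partner": [11, 11, 6, 5, 12, 10],
    "Active-More than one partner": [8, 12, 10, 4, 3, 5],
}

all_scores = [x for scores in groups.values() for x in scores]

grand_total = sum(all_scores)                      # add all of the scores together
sum_of_squares = sum(x ** 2 for x in all_scores)   # square each score, then add
n_total = len(all_scores)                          # total number of subjects

correction = grand_total ** 2 / n_total            # squared grand total divided by N
ss_total = sum_of_squares - correction
ss_between = sum(sum(s) ** 2 / len(s) for s in groups.values()) - correction
ss_within = ss_total - ss_between

print(grand_total, sum_of_squares, n_total)        # 168 1770 18
print(round(ss_total, 3), round(ss_between, 3), round(ss_within, 3))
```

The three sums of squares should agree with the Total, Between Groups, and Within Groups rows of the SPSS ANOVA output (202.000, 70.333, and 131.667).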


Procedure in SPSS

Output from SPSS - Descriptives

Descriptives

General knowledge about sex and health (AIDS) in teenagers

(Lower Bound and Upper Bound refer to the 95% Confidence Interval for Mean.)

Group                          N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Inactive                       6    11.833   2.229            0.910        9.495         14.172        8.00      14.00
Active-One partner             6    9.167    2.927            1.195        6.095         12.238        5.00      12.00
Active-More than one partner   6    7.000    3.578            1.461        3.245         10.755        3.00      12.00
Total                          18   9.333    3.447            0.812        7.619         11.048        3.00      14.00

This table reports the mean and standard deviation of each group’s score in general knowledge about sex and health (AIDS) in teenagers.
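For readers without SPSS, the N, Mean, Std. Deviation, and Std. Error columns can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the confidence bounds are left out because they need a t critical value, which the standard library does not provide.

```python
import math
import statistics

groups = {
    "Inactive": [14, 12, 8, 14, 11, 12],
    "Active-One partner": [11, 11, 6, 5, 12, 10],
    "Active-More than one partner": [8, 12, 10, 4, 3, 5],
}
# Add a pooled "Total" row, as in the SPSS output.
groups_with_total = dict(groups, Total=[x for s in groups.values() for x in s])

summary = {}
for name, scores in groups_with_total.items():
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    summary[name] = (n, mean, sd, se, min(scores), max(scores))
    print(f"{name:<30} N={n:<3} mean={mean:.3f} sd={sd:.3f} se={se:.3f}")
```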


ANOVA

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   70.333           2    35.167        4.006   .040
Within Groups    131.667          15   8.778
Total            202.000          17
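The F and Sig. values in the ANOVA table can be reproduced without SPSS. The sketch below is a minimal standard-library version with hypothetical function names. The p-value helper uses a closed form of the F distribution's upper tail that holds only when the numerator degrees of freedom equal 2 (three groups, as here); for other designs a statistics package would be needed.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    # Between groups: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: squared distance of each score from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def f_pvalue_two_numerator_df(f_stat, df_within):
    """Upper-tail F probability; assumes df_between == 2 (not a general formula)."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)

scores = [
    [14, 12, 8, 14, 11, 12],   # Inactive
    [11, 11, 6, 5, 12, 10],    # Active-One partner
    [8, 12, 10, 4, 3, 5],      # Active-More than one partner
]
f_stat, df_b, df_w = one_way_anova(scores)
p_value = f_pvalue_two_numerator_df(f_stat, df_w)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}, p = {p_value:.3f}")
```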

Statistic Test:

Making a Decision and Interpreting the Result of the Test

The significance (Sig.) value of the F test in the ANOVA table is .040 < .05. Thus, you must reject the null hypothesis that the average scores on knowledge about sex and health in teenagers are equal across the groups. Now that you know the groups differ in some way, you need to learn more about the structure of the differences.

The means plot helps you to "see" this structure. Teenagers who are sexually inactive have a higher mean score than their counterparts.

Which means are different?

We can directly compare the subgroups using “Post Hoc” tests.

4.6 Post Hoc Tests: Multiple Comparisons

Post-hoc tests allow you to determine where significant differences lie.

When the ANOVA is found to be significant, one must examine which groups differ significantly from one another, so post-hoc tests look at the mean differences between different pairs of groups.

Post-hoc testing usually involves multiple comparisons.

There are several multiple comparison tests that can be conducted that will control the Type I error rate.

• If you are concerned about violations of the assumptions, use Scheffé’s test.

• If you are not concerned about violations of the assumptions and are testing compound and pairwise comparisons, use Dunn’s test or the modified Bonferroni test.

• If you are not concerned about violations of the assumptions and are just comparing the treatment to the control, use Dunnett’s test.


• The Games-Howell test does not assume that population variances or sample sizes are equal, so it is a good alternative when either turns out to be unequal.

All of these tests will ensure that the Type I error rate remains under control as was established by the researcher and will tell you exactly which groups are different from one another.

P.S.: When the null hypothesis is rejected, the conclusion is that at least one population mean is different from at least one other mean. However, since the ANOVA does not reveal which means are different from which, it offers less specific information than a post-hoc analysis such as the Tukey HSD. The Tukey HSD is therefore preferable to ANOVA in this situation. Some textbooks introduce the Tukey test only as a follow-up to an ANOVA. However, there is no logical or statistical reason why you should not use the Tukey test even if you do not compute an ANOVA.

You might be wondering why you should learn about ANOVA when the Tukey test is better. One reason is that there are complex types of analyses that can be done with ANOVA and not with the Tukey test. A second is that ANOVA is by far the most commonly used technique for comparing means, and it is important to understand ANOVA in order to understand research reports.

Post Hoc output: Multiple Comparisons

Dependent Variable: General knowledge about sex and health (AIDS) in teenagers

Tukey HSD

(Lower Bound and Upper Bound refer to the 95% Confidence Interval.)

(I) Group                      (J) Group                      Mean Difference (I-J)   Std. Error   Sig.    Lower Bound   Upper Bound
Inactive                       Active-One partner             2.667                   1.711        0.293   -1.776        7.110
Inactive                       Active-More than one partner   4.833*                  1.711        0.032   0.390         9.276
Active-One partner             Inactive                       -2.667                  1.711        0.293   -7.110        1.776
Active-One partner             Active-More than one partner   2.167                   1.711        0.435   -2.276        6.610
Active-More than one partner   Inactive                       -4.833*                 1.711        0.032   -9.276        -0.390
Active-More than one partner   Active-One partner             -2.167                  1.711        0.435   -6.610        2.276

*. The mean difference is significant at the 0.05 level.

Interpretation: Looking at the data, the researcher asks:

• Are the Inactive and Active-More than one partner groups really different? (Sig. = 0.032; therefore, the difference between the two groups is statistically significant.)
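The Mean Difference (I-J) and Std. Error columns of the multiple comparisons output can be checked by hand. The sketch below, using only the standard library, assumes the standard error is computed from the within-groups mean square as sqrt(MSW × (1/n_i + 1/n_j)), which reproduces the tabled value of 1.711. The Sig. values and confidence bounds depend on the studentized range distribution and are not computed here.

```python
import math
from itertools import permutations

groups = {
    "Inactive": [14, 12, 8, 14, 11, 12],
    "Active-One partner": [11, 11, 6, 5, 12, 10],
    "Active-More than one partner": [8, 12, 10, 4, 3, 5],
}

means = {name: sum(s) / len(s) for name, s in groups.items()}
n_total = sum(len(s) for s in groups.values())

# Within-groups mean square (the error term from the ANOVA table).
ss_within = sum((x - means[name]) ** 2 for name, s in groups.items() for x in s)
ms_within = ss_within / (n_total - len(groups))

comparisons = {}
for i, j in permutations(groups, 2):   # every ordered pair (I, J)
    diff = means[i] - means[j]
    se = math.sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
    comparisons[(i, j)] = (diff, se)
    print(f"{i:<30} vs {j:<30} diff={diff:7.3f} se={se:.3f}")
```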

Homogeneous Subsets

General knowledge about sex and health (AIDS) in teenagers

Tukey HSD(a)

                                   Subset for alpha = 0.05
Group                          N   1         2
Active-More than one partner   6   7.0000
Active-One partner             6   9.1667    9.1667
Inactive                       6             11.8333
Sig.                               .435      .293

Means for groups in homogeneous subsets are displayed.

a. Uses Harmonic Mean Sample Size = 6.000.

According to the table above, teenagers who are sexually inactive have higher scores than their counterparts.

Required conditions or assumptions:

1. The populations tested are normally distributed.


Assumption: Normality

Each group should be approximately normal. Check this by looking at histograms, boxplots, or normal Q-Q plots, or use the Kolmogorov-Smirnov test if the sample is large or the Shapiro-Wilk test if the sample is small (< 30).

Tests of Normality

                               Kolmogorov-Smirnov(a)       Shapiro-Wilk
Group                          Statistic   df   Sig.       Statistic   df   Sig.
Inactive                       .196        6    .200*      .890        6    .316
Active-One partner             .279        6    .159       .838        6    .126
Active-More than one partner   .212        6    .200*      .935        6    .619

H0: The error terms follow a normal distribution

Ha: The error terms do not follow a normal distribution

Decision and interpretation: all the p-values (Sig.) are greater than .05, so we do not reject H0. We therefore conclude that the normality assumption may be treated as valid.

Steps to find Test of Normality

Assumption: homogeneity of variance

This table (Levene’s test) tests the assumption of equal variances for the ANOVA.

H0: The population variances are equal

Ha: The population variances are not equal

Test of Homogeneity of Variances

Levene Statistic   df1   df2   Sig.
1.750              2     15    .207

Decision and interpretation: the Sig. or p-value in the last column (.207) is above .05. It is large enough that the assumption of constant variances should not be rejected; therefore, we proceed as if the population variances are equal.
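Levene's statistic is itself a one-way ANOVA carried out on the absolute deviations of each score from its group mean. The following standard-library sketch (hypothetical function name) reproduces the tabled statistic under that assumption. The p-value again uses the closed form of the F upper tail that is valid only when df1 = 2.

```python
def levene_statistic(groups):
    """Levene's W using absolute deviations from each group mean."""
    # Replace every score by its absolute distance from its own group mean.
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    all_z = [v for zg in z for v in zg]
    n_total, k = len(all_z), len(z)
    grand = sum(all_z) / n_total
    between = sum(len(zg) * (sum(zg) / len(zg) - grand) ** 2 for zg in z)
    within = sum((v - sum(zg) / len(zg)) ** 2 for zg in z for v in zg)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2

scores = [
    [14, 12, 8, 14, 11, 12],   # Inactive
    [11, 11, 6, 5, 12, 10],    # Active-One partner
    [8, 12, 10, 4, 3, 5],      # Active-More than one partner
]
w_stat, df1, df2 = levene_statistic(scores)
# Upper-tail F probability; this closed form assumes df1 == 2.
p_value = (1 + 2 * w_stat / df2) ** (-df2 / 2)
print(f"Levene W({df1}, {df2}) = {w_stat:.3f}, p = {p_value:.3f}")
```

Some packages center on the group median instead of the mean (the Brown-Forsythe variant), which would give a different statistic.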


Notes on interpreting Sig. for the main hypothesis:

- Sig. or p-value: when interpreting the Sig. or p-value for a test, if the value is less than .05, then the test is significant at the 5% level of significance, and we would usually say there is evidence to reject the null hypothesis.

If the Sig. or p-value is less than 0.1 but greater than 0.05, then there is weak evidence in favor of the alternative hypothesis. Finally, if the p-value is greater than 0.1, then we would usually say there is no evidence to reject the null hypothesis. Never accept the null hypothesis and conclude that it is true, as this would be incorrect; we always either reject or do not reject the null.

4.7 Multivariate Analysis of Variance

• Multivariate analysis of variance (MANOVA) is similar to analysis of variance (ANOVA), except that instead of one metric dependent variable we have two or more, which are analyzed on the basis of their relationships to categorical and scale predictors.

• In MANOVA, the null hypothesis is that the vectors of means on multiple dependent variables are equal across groups.

• Multivariate analysis of variance is appropriate when there are two or more dependent variables that are correlated.

Review problems of chapter

Follow the procedures covered in this chapter to generate the appropriate output to answer the following questions:

1. What necessary assumptions must be met for an analysis of variance test to be valid?

2. In a one-way ANOVA, if the Sig. or p-value is greater than the level of significance, you:

a. Reject H0 because there is evidence all the means differ

b. Reject H0 because there is evidence at least one of the means differs from the others

c. Do not reject H0 because there is no evidence of a difference in the means

d. Do not reject H0 because one mean is different from the others

3. In a one-way ANOVA, the null hypothesis is always:

a. All the population means are different

b. Some of the population means are different

c. Some of the population means are the same

d. All of the population means are the same

The following should be used to answer Question 4 through 7

4. Three groups of students involved in the same curriculum are obliged to study for 15 minutes, 30 minutes, or 45 minutes a night for 8 weeks before taking a mathematics test. Their scores are as follows:

15 minutes   43   39   55   56   73
30 minutes   55   58   66   79   62
45 minutes   51   66   85   86   89

a. Are the differences statistically significant? (Conduct an ANOVA test.) Answer: F = 3.354

b. If F is significant, which group(s) is (are) significantly different from which? (See Multiple Comparisons.)

c. Which group is better?

e. What is the independent (factor) variable, and what is the data scale of the independent variable?

f. What is the dependent variable, and what is the data scale of the dependent variable?

5. A random sample of 15 nations from three levels of development has been selected. “Least developed” nations are largely agricultural and have the lowest quality of life. “Developed” nations are industrial and the most affluent and modern. “Developing” nations are between these extremes. Are these general characteristics reflected in differences in life expectancy (the number of years the average citizen can expect to live at birth) between the three categories?

The data for 15 nations:

Least developed              Developing                      Developed
Nation     Life expectancy   Nation        Life expectancy   Nation           Life expectancy
Cambodia   56.8              China         71.6              Australia        79.9
Mali       47                Indonesia     68.3              Belgium          78
Nepal      58.2              Pakistan      61.5              Japan            80.8
Niger      41.6              South Korea   74.7              Russia           67.3
Sudan      56.9              Turkey        71.2              United Kingdom   77.8

Source: U.S. Bureau of the Census. 2003. Statistical Abstract of the United States, 2002, p. 829. Washington, D.C.: U.S. Government Printing Office.

Are there statistically significant differences in life expectancy between nations at different levels of economic development? Answer: F=22.048, sig =.000

6. What type of person is most involved in the neighborhood and the community? Who is more likely to volunteer for an organization such as Umuganda, Scouts or Little League? Random samples of 15 people were asked for the number of times they participated during 6 months in the voluntary community organization. What differences are significant?

What is the p_value for those tests? Interpret it and tell what decision you would make from it.

Number of times they participated by education:

Less than High School   High School   College
0                       1             0
1                       3             3
2                       3             4
3                       4             4
4                       5             4

Number of times they participated by length of residence in present community:

Less than 2 years   5 years   More than 5 years
0                   0         1
1                   2         3
3                   3         3
4                   4         4
4                   5         4

Number of times they participated by number of children:

None   One Child   More Than One Child
0      2           0
1      3           3
1      4           4
3      4           4
3      4           5

Answers:

Membership by education: F = .805, sig. = .470

Membership by length of residence in present community: F = .165, sig. = .850

Membership by number of children: F = 2.32, sig. = .141