Chapter III
Hypothesis testing
Chapter Objective
Developing the methodology of hypothesis testing to analyze differences and make decisions, determine the risks involved in making such decisions if we rely solely on information from the probability sample, and the
INFERENTIAL STATISTICS Hypothesis
A statistical hypothesis is a statement about a probabilistic model, typically about one of its parameters. A hypothesis test is a method for assessing how plausible that statement is in light of the data from a sample.
The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter.
The critical concepts are these:
1. There are two hypotheses: the null and the alternative hypotheses.
2. The procedure begins with the assumption that the null hypothesis is true.
3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true, that is, that the null hypothesis is unlikely to be true.
4. There are two possible decisions:
• Reject the null: conclude that there is enough evidence to infer that the alternative hypothesis is true.
• Fail to reject the null: conclude that there is insufficient evidence to support the alternative hypothesis.
Application
The statistic shows the average life expectancy in Africa for those born in 2018, by gender and region. The average life expectancy across the whole continent was 61 years for males and 64 years for females. The average life expectancy globally was 70 years for males and 74 years for females in 2018. (World Population Review, 2018)
[Figure: Life expectancy at birth, in years. Source: United Nations Population Division]
[Table: Life expectancy by country (columns: Country, 1960, 2015, 2018, Trend). Source: Human Development Report 2016. *Human Development Index (HDI)]
Example 1
The world population study - BUREAU reported that the average life expectancy in Africa was 62.5 years in 2018. In recent years, life expectancy in Rwanda has been rising. We would like to test whether there is a significant difference between the average for the whole African continent and the average for Rwanda.
Year                      1990   1995   2000   2005   2010   2011   2012   2013   2014   2015   2018
Life Expectancy-Rwanda    33.4   31.5   48.1   54.9   61.7   62.4   63.1   63.6   64.2   64.7   67.13
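Statement of Hypothesis
Ho: μ = 62.5 (the average life expectancy in Rwanda equals the African average)
Ha: μ ≠ 62.5 (the average life expectancy in Rwanda differs from the African average)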
Procedure
Solution
One-Sample Statistics
                          N    Mean      Std. Deviation   Std. Error Mean
Life Expectancy-Rwanda    11   55.8845   12.73935         3.84106
The output shows the descriptive statistics: the average life expectancy between 1990 and 2018 is 55.88 years, the standard deviation is 12.74 years, and the coefficient of variation is 22.80% (12.74 / 55.88), which indicates that the values are heterogeneous over the years.
One-Sample Test
Test Value = 62.5
                          t        df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                                            Lower      Upper
Life Expectancy-Rwanda    -1.722   10   .116              -6.61545          -15.1739   1.9430
Decision rule: Sig = 0.116 > 0.05. Therefore, at the 5% level of significance we fail to reject the null hypothesis, i.e. there is no significant difference between the average for the whole African continent and the average for Rwanda.
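The SPSS figures above can be cross-checked by hand. The following is a minimal Python sketch, using only the standard library, that recomputes the mean, standard deviation, standard error and t statistic from the Rwanda values in the table. The variable names are illustrative, and the critical value 2.228 (two-tailed, 5%, df = 10) is an assumption taken from a standard t table rather than computed here.

```python
import math
import statistics

# Life expectancy in Rwanda, 1990-2018, copied from the Example 1 table
rwanda = [33.4, 31.5, 48.1, 54.9, 61.7, 62.4, 63.1, 63.6, 64.2, 64.7, 67.13]
test_value = 62.5  # hypothesized mean: the African average

n = len(rwanda)
mean = statistics.mean(rwanda)
sd = statistics.stdev(rwanda)      # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(n)             # standard error of the mean
t_stat = (mean - test_value) / se  # one-sample t statistic
df = n - 1

# Assumption: critical value read from a standard t table
# (two-tailed, alpha = 0.05, df = 10)
T_CRITICAL = 2.228
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean={mean:.4f} sd={sd:.5f} se={se:.5f} t={t_stat:.3f} df={df}")
print("reject Ho" if reject_h0 else "fail to reject Ho")
```

Under these assumptions the sketch gives t of about -1.722 with df = 10. Because |t| is below the critical value, it leads to the same decision as Sig = .116.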
Interpreting the p-value (Sig.)
The smaller the p-value (Sig.), the more statistical evidence exists to support the alternative hypothesis.
• If the Sig. is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
• If the Sig. is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
• If the Sig. is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
• If the Sig. exceeds 10%, there is no evidence that supports the alternative hypothesis.
T-test: when experiments include only 2 groups
Used for testing for mean differences:
a. Independent
b. Paired or correlated
   i. Within-subjects
   ii. Matched
The t-Test for Comparing Two Means (independent)
Assumptions of the t test (parametric test)
Remember that for proper use of the "t" distribution or the normal distribution "Z", the data must satisfy the following assumptions:
• Independence: the random samples are independent.
• Level of measurement: the dependent variable is measured on an interval or ratio scale.
• Randomness: the samples were selected using a probabilistic method. Otherwise, inference is not valid.
• Normality: the variables of analysis are normally distributed in both populations. (Check with graphs: boxplot, histogram with normal curve, normal Q-Q plot; or test with Shapiro-Wilk, KS, etc.) If this condition is not satisfied, use a nonparametric test or transform the variable.
• Homogeneity of variances: the population variances are not different, that is, σ1² = σ2² (check with the Levene test, F test, etc.). If this assumption is not met, correct the number of degrees of freedom and use the t test, or apply a nonparametric test. This assumption is more likely to be violated when the sample sizes are very unequal.
1. Formulate the appropriate hypotheses
2. Statistic: independent sample t test
3. Making Decision and Interpreting the Result of the Test
Make your decision according to the following rule: if Sig. is less than .05, reject the null hypothesis; if Sig. is more than .05 (5%), do not reject (retain) the null hypothesis.
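The decision rule, together with the evidence bands listed earlier for interpreting Sig., can be written as two small helper functions. This is only an illustrative sketch: the function names are hypothetical, and the handling of values that fall exactly on a boundary (1%, 5%, 10%) is an assumption, since the text does not specify it.

```python
def decide(sig: float, alpha: float = 0.05) -> str:
    """Return the test decision for a given Sig. (p-value)."""
    return "reject Ho" if sig < alpha else "fail to reject Ho"


def evidence_strength(sig: float) -> str:
    """Label the evidence for the alternative hypothesis."""
    # Assumption: each band includes its lower bound
    if sig < 0.01:
        return "overwhelming"
    if sig < 0.05:
        return "strong"
    if sig < 0.10:
        return "weak"
    return "none"


# Sig. values taken from the worked examples and review answers in this chapter
for sig in (0.116, 0.047, 0.005):
    print(sig, decide(sig), evidence_strength(sig))
```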
Example 2
A researcher is interested in the effect of an approach to teaching graduate statistics on statistics anxiety. The Educational Psychology department offers the statistics course in two formats: a lecture-based course and a computer-based course with no lectures. The content of both courses is exactly the same. There are twelve students in each class. At the end of the course, students were asked to fill out the Statistics Anxiety Questionnaire. The results are presented below:
Statistics anxiety scores
Lecture Based Approach     10   23   11   17    7    4   18   11   11   14   10   19
Computer Based Approach    27   24   15   19   17   21   26   17   20   29   27   22
SPSS Steps
Enter these data into SPSS. (Tip: you will have to enter two columns of data: one for the class, where the first 12 rows have the indicator 1 for lecture and the next 12 rows have the indicator 2 for computer, and one for the respective anxiety scores.) See the figure below:
Test the null hypothesis that the difference between the mean anxiety score of the students taking the lecture based course and the mean anxiety score of the students taking the computer based course is zero.
To check the normality assumption first, click on Graphs: do the scores in both populations appear to be normally distributed?
2. Conduct a t-test for two independent samples:
Steps in SPSS: Analyze > Compare Means > Independent Samples t test > transfer your dependent variable (anxiety) to Test Variable(s) and the independent variable (teaching approach) to the Grouping Variable bar > Define the groups > type the numerical values for the two groups (1 = Lecture, 2 = Computer) > Continue > OK
3. Examine your output and answer the following questions:
a) What are the mean anxiety scores for the two groups? _____________, _____________
b) Is the assumption of homogeneity of variance met? For Levene's test for equality of variances, if the test is nonsignificant, do not reject the hypothesis that the two population variances are equal.
c) What is the mean difference for the two samples?
d) What is the value of the t test?
e) How many degrees of freedom are there?
f) What is the obtained p value or Sig.?
g) Using the 0.05 level of significance, do you reject or retain the null hypothesis?
Output
Group Statistics
          Teaching_Approach          N    Mean      Std. Deviation   Std. Error Mean
Anxiety   Lecture Based Approach     12   12.9167   5.43488          1.56891
          Computer Based Approach    12   22.0000   4.59248          1.32574
Making a Decision and Interpreting the Result of the Test
The t test for independent samples gives t = -4.422 with 22 degrees of freedom. The corresponding Sig. (2-tailed) value is .000 (that is, p < .001), which is less than .05. Therefore, at the 5% significance level, there is a significant difference between the average anxiety scores of the two classes: the class taught with the computer-based approach shows significantly more anxiety than the class taught with the lecture-based approach.
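The t value can be reproduced from the raw scores of Example 2. The sketch below uses the pooled-variance (equal variances assumed) form of the independent-samples t statistic with only the Python standard library. The variable names are illustrative, and the critical value 2.074 (two-tailed, 5%, df = 22) is an assumption taken from a standard t table.

```python
import math
import statistics

# Anxiety scores copied from Example 2
lecture = [10, 23, 11, 17, 7, 4, 18, 11, 11, 14, 10, 19]
computer = [27, 24, 15, 19, 17, 21, 26, 17, 20, 29, 27, 22]

n1, n2 = len(lecture), len(computer)
mean1, mean2 = statistics.mean(lecture), statistics.mean(computer)
var1, var2 = statistics.variance(lecture), statistics.variance(computer)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # std. error of the difference
t_stat = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

# Assumption: critical value read from a standard t table
# (two-tailed, alpha = 0.05, df = 22)
T_CRITICAL = 2.074
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean difference={mean1 - mean2:.4f} t={t_stat:.3f} df={df}")
print("reject Ho" if reject_h0 else "fail to reject Ho")
```

With these data the sketch gives a mean difference of about -9.0833 and t of about -4.422 with df = 22, in line with the SPSS result discussed above.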
b) Is the assumption of homogeneity of variance met?
Assumption of Homogeneity
The Levene test shows whether this assumption is met, which is very important when comparing groups.
Ho: the population variances are equal (equal variances assumed)
Ha: the population variances are not equal (equal variances not assumed)
Decision: The Sig. value of the Levene test is greater than 5% (Sig = .596), so we cannot reject Ho and we proceed with equal variances assumed.
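For readers who want to see what the Levene test measures, here is a minimal standard-library sketch of the statistic, assuming the version based on absolute deviations from each group mean. The function name is hypothetical, and the critical value 4.30 (F table, 5%, df = 1 and 22) is an assumption taken from a standard table.

```python
import statistics

# Anxiety scores copied from Example 2
lecture = [10, 23, 11, 17, 7, 4, 18, 11, 11, 14, 10, 19]
computer = [27, 24, 15, 19, 17, 21, 26, 17, 20, 29, 27, 22]


def levene_statistic(*groups):
    """Levene's W based on absolute deviations from each group mean."""
    # Absolute deviation of every score from its own group mean
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group variation of the absolute deviations
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


w = levene_statistic(lecture, computer)

# Assumption: critical value read from a standard F table (alpha = 0.05, df = 1, 22)
F_CRITICAL = 4.30
equal_variances_retained = w < F_CRITICAL

print(f"Levene W = {w:.3f}")
print("retain Ho (equal variances)" if equal_variances_retained else "reject Ho")
```

The statistic comes out at roughly 0.29, far below the critical value, which is consistent with the nonsignificant Sig = .596.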
Testing a Hypothesis about Two Related Means
The paired-samples t test is appropriate whenever two related sample means are to be compared. The difference scores are assumed to follow a reasonably normal distribution, especially with respect to skewness. Before running the t test, you can assess the distribution of difference scores by examining the histogram of a computed difference variable. Test variables with extreme or outlying values should be carefully checked; boxplots can be used for this.
One of the most common experimental designs is the "pre-post" or “Before and After” design. A study of this type often consists of two measurements taken on the same subject, one before and one after the introduction of a treatment or a stimulus. The basic idea is simple. If the treatment had no effect, the average difference between the measurements is equal to 0 and the null hypothesis holds. On the other hand, if the treatment did have an effect (intended or unintended!), the average difference is not 0 and the null hypothesis is rejected.
For example, if we give training to teachers and want to know whether or not the training had any impact on their efficiency, we could use the paired-samples t test. We collect data from the teachers on a five-point rating scale, before the training and after the training. By using the paired-samples t test, we can assess statistically whether or not the training has improved the efficiency of the teachers.
Formulate the appropriate null and alternative hypotheses
Example 3
Ten teachers took part in a program of intensive training given by a specialist. Their scores were recorded before and after the training, with the following results:
Teacher             1     2     3     4     5     6     7     8     9     10
Score before (x)    127   195   162   170   143   205   168   175   197   136
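Does the program affect the average score of the teachers?
Hypothesis
Ho: μd = 0 (the training does not change the average score)
Ha: μd < 0, where d = Before - After (the training increases the average score)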
Procedure
Output:
Paired Samples Statistics
                  Mean    N    Std. Deviation   Std. Error Mean
Pair 1   Before   167.8   10   26.57819         8.40476
         After    172.7   10   23.38589         7.39527
Interpretation: the output shows the descriptive statistics. The average score before the training is lower than the average score after the training, but we do not yet know whether the observed difference is significant.
Paired Samples Test
                           Paired Differences
                           Mean   Std. Deviation   Std. Error Mean   t        df   Sig. (2-tailed)
Pair 1   Before - After    -4.9   6.74043          2.13151           -2.299   9    0.047
The paired-samples t test gives t = -2.299, and the corresponding Sig. (2-tailed) value is .047. Because our hypothesis is one-sided and the observed difference is in the hypothesized direction (the mean after training is higher), we divide this value by 2 to obtain the significance for the problem under study: 0.047/2 = 0.0235. Therefore the p-value for this problem is 0.0235. Decision rule: Sig < 0.05, so we reject Ho. At the 5% level of significance we can say that the training was effective in increasing the average score of the teachers.
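The paired t statistic depends only on the differences, so it can be recomputed from the summary values in the Paired Samples Test output, which is useful here because the individual After scores are not listed. The following is a minimal sketch with illustrative variable names; the one-tailed critical value 1.833 (5%, df = 9) is an assumption taken from a standard t table.

```python
import math

# Summary values copied from the Paired Samples Test output
n = 10
mean_diff = -4.9   # mean of the differences (Before - After)
sd_diff = 6.74043  # standard deviation of the differences

se_diff = sd_diff / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / se_diff
df = n - 1

sig_two_tailed = 0.047  # reported by SPSS
# Halving is appropriate only because t has the hypothesized (negative) sign
sig_one_tailed = sig_two_tailed / 2

# Assumption: critical value read from a standard t table
# (one-tailed, alpha = 0.05, df = 9)
T_CRITICAL_ONE_TAILED = 1.833
reject_h0 = t_stat < -T_CRITICAL_ONE_TAILED

print(f"se={se_diff:.5f} t={t_stat:.3f} df={df} one-tailed Sig={sig_one_tailed:.4f}")
print("reject Ho" if reject_h0 else "fail to reject Ho")
```

The sketch recovers the standard error 2.13151 and t of about -2.299, and the one-tailed comparison agrees with the decision to reject Ho.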
Review problems of chapter
Follow the procedures covered in this chapter to generate the appropriate output and answer the following questions:
1. What is the purpose of a statistical hypothesis?
2. What is a significance level? How does a researcher choose a significance level?
3. A social work department wishing to validate an empathy scale gives the measure to intending Social Workers and to a group of students, matched by age and sex, whose career choices are other than Social Work. The following table shows the scores of the two groups on the empathy measure.
Do the scores indicate that intending Social Workers have higher empathy scores than non-Social Workers?
Empathy scores of Social Work and non-Social Work students
Social workers 80 79 78 69 68 78 75 74 73 81
Non-social workers 68 71 58 62 52 67 63 70 59 61
Answer: t = 5.243, Sig = .000
4. What type of person is most involved in the neighborhood and community (Umuganda community)? Who is more likely to volunteer for organizations such as the Scouts or Little League? A random sample of 10 people has been asked for their number of memberships in community voluntary organizations and some other information. Which differences are significant?
Membership by education:                 Membership by length of residence in present community:
Less than High School    High School     Less than 2 years    More than 2 years
0                        1               0                    1
1                        3               1                    3
2                        3               3                    3
3                        4               4                    4
4                        5               4                    4
Answer:
Membership by education: t = -1.238, Sig = .251; assumption ANOVA (Sig = .781)
Membership by length of residence in present community: t = -.612, Sig = .557; assumption ANOVA (Sig = .165)
5. Two groups of students following the same curriculum are required to study for either 30 minutes or 60 minutes a night for 8 weeks before taking a mathematics test. Their scores are as follows:
30 minutes: 55, 58, 66, 79, 82
60 minutes: 61, 66, 85, 86, 91
a. Are the differences statistically significant? Answer: t = -1.370, Sig = .201
b. What is the independent variable, and what is its data scale?
c. What is the dependent variable, and what is its data scale?
6. A group of ten patients newly diagnosed with diabetes was observed to determine whether an educational program was effective in increasing their knowledge of diabetes. A test on self-related aspects of the disease was applied before and after the educational program. The test results were as follows:
Patient 1 2 3 4 5 6 7 8 9 10
Before 75 62 67 70 55 59 60 64 72 59
After 77 65 68 72 62 61 60 67 75 68
Was the educational program effective with respect to the patients' knowledge? Answer: t = -3.692, Sig = .005
7. In a public school, 9 pairs of second-graders were chosen to compare similarity of intelligence and readiness. Each child was taught with method I and, the following year, with method II. After the learning period, the children were tested, with the following results (scores range from 0 to 100):
Child 1 2 3 4 5 6 7 8 9
Method II 63 68 68 60 68 66 60 78 70
At a significance level of 5%, is there a significant difference in efficacy between the two methods? Answer: t = .398, Sig = .701