rusan chen key words
2. Statistics Basics and the Logic of Hypothesis Testing Most universities offer statistics courses at various levels, especially in
2.3. Introduction to Hypothesis Testing
Hypothesis testing is a decision-making process to determine whether the findings from a sample can be generalized to a population. Hypothesis testing is the central theme for the statistical procedures introduced in this chapter.
The logic of hypothesis testing is based on the distribution of statistics. Imagine that we draw samples again and again from the same population and calculate the statistics from each sample. The distribution of all of the statistics relates the characteristics of the sample to those of the population. In a very simplified version of the process, hypothesis testing involves three steps:
Step 1: State a null hypothesis concerning the population under investigation.
Step 2: Find out the probability of obtaining results at least as extreme as those in the sample if the null hypothesis is true.
Step 3: If the probability is low, say, less than 5% (or 1%), the null hypothesis is rejected. Otherwise, the null hypothesis is retained.
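The idea of drawing samples again and again from the same population, on which the three steps rest, can be tried out in a short simulation. The sketch below is only an illustration: the population values (a mean of 100 and a standard deviation of 15), the sample size, and the function name sample_means are assumptions made for the example, not values from this chapter.

```python
import random
import statistics


def sample_means(population_mean, population_sd, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    means = []
    for _ in range(n_samples):
        # One sample: sample_size scores drawn from a normal population.
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical population: mean 100, standard deviation 15.
means = sample_means(population_mean=100, population_sd=15, sample_size=25, n_samples=2000)

# The sample means cluster around the population mean ...
print(round(statistics.fmean(means), 1))
# ... and their spread is close to 15 / sqrt(25) = 3, much narrower than the population.
print(round(statistics.stdev(means), 1))
```

The list of sample means is an approximation of the distribution of the statistic: it shows how far a single sample mean can be expected to fall from the population mean.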
In the first step of hypothesis testing a null hypothesis is stated. The null hypothesis may not be the same as the research interest. Researchers usually make statements for their research interests according to the results they expect to find. For example, research interests are often stated positively: Students of English as a second language (ESL) who have received training using the keyword method will perform better on a vocabulary definition test than students who have not received such training. Null hypotheses are always statements that imply null expectations, such as no difference for the comparisons or no relationships among the variables in the population. For the research interest mentioned above, the null hypothesis would be that there is no difference in scores on vocabulary definition tests for students with and without keyword training in the population. Please note that null hypotheses always concern the population, not the sample.
How do we find out the probability needed in the second step of hypothesis testing? The probability can be calculated from known theoretical functions. Imagine that we draw random samples (with the same sample size) again and again from a population where the null hypothesis is true and calculate the statistics from each sample. The statistics will follow a known theoretical distribution, such as the normal distribution, from which we are able to obtain the probability of a result at least as extreme as the one observed. Numerical calculations for obtaining the probability from a theoretical distribution can be cumbersome, but you can find the probability from tables in most statistics textbooks or from a statistical package such as SPSS. This probability, calculated under the assumption that the null hypothesis is true, is called the p value of the test. In this chapter, methods for obtaining the p value from a table and from SPSS output will be illustrated.
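For readers without a table or SPSS at hand, the same lookup can be sketched in a few lines of Python using the standard library's NormalDist class. This is only an illustration under the assumption that the statistic follows the standard normal distribution; the function names lower_tail_p, upper_tail_p, and two_tailed_p are made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def lower_tail_p(z):
    """Probability of a standardized value of z or less."""
    return standard_normal.cdf(z)


def upper_tail_p(z):
    """Probability of a standardized value of z or more."""
    return 1.0 - standard_normal.cdf(z)


def two_tailed_p(z):
    """Probability of a value at least as far from zero as z, in either direction."""
    return 2.0 * upper_tail_p(abs(z))


print(round(two_tailed_p(1.96), 3))  # 0.05
```

A z table reports the same areas under the normal curve; the function call simply replaces the table lookup.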
The third step in hypothesis testing is a decision-making step. Assume that the p value you obtained from a particular study is less than 0.05. Now consider the following fact: If the null hypothesis were true in the population and you had enough resources to repeat the same study 100 times using different samples (with the same sample size) from the population, only about five studies would be expected to have a p value less than 0.05. Since you already have a sample with a p value less than 0.05, either a rare event has occurred or the null hypothesis is not true in the population. Therefore if a p value is less than 0.05, we make a decision that the null hypothesis is rejected. In most cases, the rejection of a null hypothesis supports the research interest.
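The thought experiment of repeating a study many times when the null hypothesis is true can be run as a simulation. The sketch below is an illustration only: it assumes a two-tailed z test on a sample mean with a known population standard deviation, a hypothetical population with mean 0 and standard deviation 1, and a made-up function name, share_of_significant_studies.

```python
import random
from statistics import NormalDist, fmean


def share_of_significant_studies(n_studies, sample_size, alpha=0.05, seed=1):
    """Repeat a study n_studies times under a true null hypothesis and
    return the proportion of studies whose p value falls below alpha."""
    rng = random.Random(seed)      # fixed seed so that the run is repeatable
    mu0, sd = 0.0, 1.0             # hypothetical population; the null hypothesis is true
    std_normal = NormalDist()
    significant = 0
    for _ in range(n_studies):
        sample = [rng.gauss(mu0, sd) for _ in range(sample_size)]
        # z statistic for the sample mean, using the known population sd.
        z = (fmean(sample) - mu0) / (sd / sample_size ** 0.5)
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))   # two-tailed p value
        if p < alpha:
            significant += 1
    return significant / n_studies


# With many repetitions the proportion settles near alpha (0.05).
print(share_of_significant_studies(n_studies=10_000, sample_size=30))
```

Using 10,000 repetitions rather than 100 makes the proportion more stable; with only 100 studies the count of significant results would vary noticeably around five.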
When the p value is greater than 0.05, the null hypothesis is retained. Retaining a null hypothesis indicates that based on the information from the available data, the null hypothesis cannot be declared as false. The case is inconclusive because we cannot make a decision about the null hypothesis based on the information we have.
There are several commonly used terms related to the hypothesis-testing procedure that warrant discussion. A type I error is made if you reject a true null hypothesis. For example, assume that the CSR method used in classrooms with LEP students in general does not improve performance compared with the traditional instructional method. If you draw a conclusion that CSR is significantly more effective than the traditional method, you make a type I error. The probability of making a type I error is denoted as α (alpha), which is conventionally set at the 0.05 or 0.01 level. These are the levels at which the researcher is willing to take the risk of making a type I error. You make a type II error when you retain a false null hypothesis. For example, if the CSR method is truly more effective in general but you conclude that there is no difference, you make a type II error. The probability of making a type II error is called β (beta).
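The relation between α and β can be made concrete with a small calculation. The sketch below is not a procedure from this chapter: it assumes a one-tailed z test on a sample mean with a known population standard deviation, and the function name type_ii_error_rate and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def type_ii_error_rate(effect, sd, sample_size, alpha=0.05):
    """Beta for a one-tailed z test when the true population mean
    exceeds the mean stated in the null hypothesis by `effect`."""
    std_normal = NormalDist()
    # Cutoff for z: the null hypothesis is rejected when z exceeds it.
    z_critical = std_normal.inv_cdf(1.0 - alpha)
    # Where the z statistic is centered when the null hypothesis is false.
    shift = effect / (sd / sample_size ** 0.5)
    # Chance that z stays below the cutoff, so that a false null is retained.
    return std_normal.cdf(z_critical - shift)


# Hypothetical study: true effect of 5 points, population sd of 10.
print(type_ii_error_rate(effect=5, sd=10, sample_size=25))
print(type_ii_error_rate(effect=5, sd=10, sample_size=50))  # larger sample, smaller beta
```

Under these assumptions, lowering α moves the cutoff further out and raises β, while a larger sample lowers β without changing α.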
Let’s use a hypothetical study to illustrate the three steps of hypothesis testing. Assume that the average total scaled score for the TOEFL test for students admitted to U.S. graduate programs from the year 2000 to 2001 is 620, with a standard deviation of 50.¹ A TOEFL training school in China claims that anyone who takes its three-month program is guaranteed to meet the TOEFL scores required for U.S. graduate schools. A student who attended the training school took the TOEFL and reported that her total scaled score was 560. In this hypothetical study, the research interest is to evaluate the school’s assertion that anyone who attends the training program will meet the TOEFL requirements of graduate programs in the U.S. Although the sample size in this study is very small (only one student), we can still conduct legitimate hypothesis testing.
The first step in hypothesis testing is to state a null hypothesis. In our hypothetical study, the null hypothesis is that students who complete the three-month TOEFL training program are not different from the students admitted into U.S. graduate programs. Please note that null hypotheses always make statements concerning populations, not samples. We have two populations in this study: all students who were admitted into U.S. graduate programs in 2000 to 2001 and all students who completed the three-month TOEFL training program. The null hypothesis states that the two populations are not different in TOEFL scaled scores.
The second step in hypothesis testing is to find out the p value. We have only one subject available who had a total scaled score of 560. We also know that the mean for TOEFL scaled scores for U.S. graduate programs is 620 with a standard deviation of 50. Based on this information, we can calculate a z score (also called standardized score) for the student. Using the formula z = (μ − X)/σ, where μ is the population mean, σ the population standard deviation, and X the student's score, the z score for this student is (620 − 560)/50 = 1.20; that is, her score lies 1.20 standard deviations below the population mean. From a z table that most statistics textbooks provide in the appendix, we find that a z score of 1.20 has a corresponding one-tailed p value of 0.115.
The interpretation of p equaling 0.115 is that if the null hypothesis is true, the probability of obtaining a score of 560 or less after completing the training program is 0.115.
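The arithmetic of this step can be checked without a z table. The short sketch below uses the numbers from the hypothetical TOEFL study; the variable names are chosen for the example, and Python's NormalDist class stands in for the printed table.

```python
from statistics import NormalDist

population_mean = 620.0   # mean TOEFL score of admitted students
population_sd = 50.0      # standard deviation of those scores
observed_score = 560.0    # score reported by the student

# z score as defined in the text: distance below the mean in sd units.
z = (population_mean - observed_score) / population_sd

# Area under the standard normal curve beyond z (one tail).
p_value = 1.0 - NormalDist().cdf(z)

print(round(z, 2), round(p_value, 3))  # 1.2 0.115

# The same probability, read directly as "a score of 560 or less"
# in a normal population with mean 620 and standard deviation 50.
p_direct = NormalDist(mu=population_mean, sigma=population_sd).cdf(observed_score)
print(round(p_direct, 3))  # 0.115
```

Both routes give the same area because the normal curve is symmetric: the area above z = 1.20 equals the area below a score of 560.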
With the p value known, we are ready to proceed to the third step of hypothesis testing, to make a decision about the null hypothesis. Since the p value is greater than 0.05, we are not able to reject the null hypothesis. The null hypothesis is then retained. As discussed previously, a p value greater than 0.05 is inconclusive, indicating that we do not have enough information to conclude that the two populations are different in their TOEFL scores. Had the p value been less than 0.05, we would have rejected the null hypothesis with a conclusion that students who completed the three-month training program were different from the students admitted to graduate programs in the United States. Because the score 560 is lower than the population mean of 620, we may further conclude that students who completed the program score significantly lower on the TOEFL than the students admitted to graduate programs in the United States.
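The decision rule, and the question of how low the score would have had to be for the null hypothesis to be rejected, can be sketched as follows. This is an illustration under the same assumptions as the hypothetical study (a single score, a normal population with mean 620 and standard deviation 50, and a one-tailed test at the 0.05 level); the function name decide is made up for the example.

```python
from statistics import NormalDist


def decide(p_value, alpha=0.05):
    """Step 3: reject the null hypothesis when p is below alpha, otherwise retain it."""
    return "reject" if p_value < alpha else "retain"


# Population of students admitted to U.S. graduate programs.
admitted = NormalDist(mu=620.0, sigma=50.0)

# Probability of a score of 560 or less if the null hypothesis is true.
p_value = admitted.cdf(560.0)
print(decide(p_value))  # retain

# Score below which the one-tailed p value would drop under 0.05.
cutoff = admitted.inv_cdf(0.05)
print(round(cutoff, 1))  # 537.8
```

Under these assumptions, a single score of 560 is not low enough to reject the null hypothesis; a score below roughly 538 would have been.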
In the following sections, we will introduce the statistical procedures most commonly used by SLA researchers, including t tests, F tests, and chi-square tests. Although the tests are applied to different research topics, the logic of hypothesis testing is the same for all tests: State a null hypothesis first, find out the p value, and then make a decision based on the p value to reject or retain the null hypothesis.