Steps in Hypothesis Testing - Basic and clinical Biostatistics 4th edition.pdf

A statistical hypothesis is a statement of belief about population parameters. Like the term “probability,” the term “hypothesis” has a more precise meaning in statistics than in everyday use.

Step 1: State the research question in terms of statistical hypotheses. The null hypothesis, symbolized by H₀, is a statement claiming that there is no difference between the assumed or hypothesized value and the population mean; null means “no difference.” The alternative hypothesis, which we symbolize by H₁ (some textbooks use H_A), is a statement that disagrees with the null hypothesis.

If the null hypothesis is rejected as a result of sample evidence, then the alternative hypothesis is concluded. If the evidence is insufficient to reject the null hypothesis, it is retained but not accepted per se. Scientists distinguish between not rejecting and accepting the null hypothesis; they argue that a better study may be designed in which the null hypothesis will be rejected.

Traditionally, we therefore do not accept the null hypothesis from current evidence; we merely state that it cannot be rejected.

For the Dennison and coworkers study, the null and alternative hypotheses are as follows:

H₀: The mean energy intake in 2-year-old children in the study, µ₁, is not different from the norm (mean in NHANES III), µ₀, written µ₁ = µ₀.

H₁: The mean energy intake in 2-year-old children in the Dennison and coworkers study, µ₁, is different from the norm (mean in NHANES III), µ₀, written µ₁ ≠ µ₀.

(Recall that µ stands for the true mean in the population.)

These hypotheses are for a two-tailed (or nondirectional) test: The null hypothesis will be rejected if mean energy intake is sufficiently greater than 1286 kcal or if it is sufficiently less than 1286 kcal. A two-tailed test is appropriate when investigators do not have an a priori expectation for the value in the sample; they want to know if the sample mean differs from the population mean in either direction.

A one-tailed (or directional) test can be used when investigators have an expectation about the sample value, and they want to test only whether it is larger or smaller than the mean in the population. Examples of an alternative hypothesis are

H₁: The mean energy intake in 2-year-old children in the Dennison and coworkers study, µ₁, is larger than the norm (mean in NHANES III), µ₀, sometimes written µ₁ > µ₀

or

H₁: The mean energy intake in 2-year-old children in the Dennison and coworkers study, µ₁, is not larger than the norm (mean in NHANES III), µ₀, sometimes written as µ₁ ≤ µ₀.

| Nutrient | Dennison study, 2-year-olds | NHANES III, 2-year-olds | Dennison study, 5-year-olds | NHANES III, 5-year-olds |
|---|---|---|---|---|
| Energy (kcal) | 1242 ± 30^b | 1286 ± 22 | 1549 ± 34 | 1573 ± 28 |
| Protein (g) | 43 ± 1.3 | 47 ± 0.9 | 53 ± 1.6 | 55 ± 1.2 |
| Protein (% kcal) | 14.0 ± 0.2 | 14.7 ± 0.2 | 13.7 ± 0.2 | 14.1±0.21 |
| Carbohydrate (g) | 169 ± 4.6 | 171 ± 3.3 | 211 ± 5.1 | 215 ± 4.0 |
| Carbohydrate (% kcal) | 54.4 ± 0.6 | 53.9 ± 0.6 | 54.7 ± 0.6 | 55.3 ± 0.5 |
| Total fat (g) | 46 ± 1.3 | 49 ± 1.1 | 57 ± 1.6 | 58 ± 1.4 |
| Total fat (% kcal) | 33.2 ± 0.5 | 33.5 ± 0.4 | 33.0 ± 0.5 | 32.7 ± 0.4 |
| Saturated fat (g) | 19 ± 0.6 | 20 ± 0.5 | 23 ± 0.7 | 22 ± 0.6 |
| Saturated fat (% kcal) | 13.7 ± 0.3 | 13.8 ± 0.2 | 13.2 ± 0.3 | 12.5 ± 0.2 |
| Cholesterol | 155 ± 6.7 | 168±70 | 173 ± 7.2 | 175 ± 7.2 |
| Cholesterol (mg/1000 kcal) | 126 ± 5.1 | 131^c | 111 ± 4.1 | 111^c |

a Non-Hispanic White.

b Mean ± standard error of the mean.

c Calculated from NHANES III data; no standard deviation or standard error of the mean available.

Source: Reproduced from Table 2, with permission from the author and publisher, from Dennison BA, Rockwell HL, Baker SL: Excess fruit juice consumption by preschool-aged children is associated with short stature and obesity. Pediatrics 1997; 99:15–22.

Box 5-2. SIGN TEST OF CHANGE IN 7α-HYDROXY-4-CHOLESTEN-3-ONE (7α-HCO) BEFORE AND 1 MONTH AFTER CHOLECYSTECTOMY.

X1-X2<>0 5.3710 0.0000 Reject H₀ 5.3663 0.0000 Reject H₀

X1-X2<0 5.3710 1.0000 Accept H₀ 5.3757 1.0000 Accept H₀

X1-X2>0 5.3710 0.0000 Reject H₀ 5.3663 0.0000 Reject H₀

Source: Data, used with permission, from Sauter GH, Moussavian AC, Meyer G, Steitz HO, Parhofer KG, Jungst D: Bowel habits and bile acid malabsorption in the months after cholecystectomy. Am J Gastroenterol 2002;97(2):1732–35. Table produced with NCSS; used with permission.

A one-tailed test has the advantage over a two-tailed test of obtaining statistical significance with a smaller departure from the hypothesized value, because there is interest in only one direction. Whenever a one-tailed test is used, it should therefore make sense that the investigators really were interested in a departure in only one direction before the data were examined. The disadvantage of a one-tailed test is that once investigators commit themselves to this approach, they are obligated to test only in the hypothesized direction. If, for some unexpected reason, the sample mean departs from the population mean in the opposite direction, the investigators cannot rightly claim the departure as significant. Medical researchers often need to be able to test for possible unexpected adverse effects as well as the anticipated positive effects, so they most frequently choose a two-tailed hypothesis even though they have an expectation about the direction of the departure. A graphic representation of a one-tailed and a two-tailed test is given in Figure 5-3.
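The trade-off between one-tailed and two-tailed tests can be sketched as a set of decision rules. This is only an illustration: the function name `rejects_null` and the test statistic of -1.80 are made up, 1.99 is the two-tailed critical value used later in this section (α = 0.05, 93 df), and 1.66 is an assumed approximate one-tailed critical value for the same α and df.

```python
def rejects_null(t_stat, critical, tail):
    """Return True if t_stat falls in the rejection area for the given tail.

    `critical` is the positive critical value for the chosen alpha and tail.
    """
    if tail == "two":
        return abs(t_stat) > critical      # reject in either tail
    if tail == "lower":
        return t_stat < -critical          # reject only for small values
    if tail == "upper":
        return t_stat > critical           # reject only for large values
    raise ValueError("tail must be 'two', 'lower', or 'upper'")


T_TWO_TAILED = 1.99   # from the text: alpha = 0.05, 93 df, two-tailed
T_ONE_TAILED = 1.66   # assumption: approximate one-tailed value, same alpha and df

t_hypothetical = -1.80  # made-up test statistic, for illustration only

two_tailed = rejects_null(t_hypothetical, T_TWO_TAILED, "two")
lower_tailed = rejects_null(t_hypothetical, T_ONE_TAILED, "lower")
upper_tailed = rejects_null(t_hypothetical, T_ONE_TAILED, "upper")
print(two_tailed, lower_tailed, upper_tailed)
```

With these assumed values, the same departure is significant only for the lower-tailed test. An investigator who had committed to the upper-tailed test could not claim it.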

Step 2: Decide on the appropriate test statistic. Some texts use the term “critical ratio” to refer to test statistics. Choosing the right test statistic is a major topic in statistics, and subsequent chapters focus on which test statistics are appropriate for answering specific kinds of research questions.

We decide on the appropriate statistic as follows. Each test statistic has a probability distribution. In this example, the appropriate test statistic is based on the t distribution because we want to make inferences about a mean and do not know the population standard deviation. The t test is the test statistic for testing one mean; it is the difference between the sample mean and the hypothesized mean divided by the standard error.
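Written in symbols, the statistic described above takes the following form. The notation is assumed here for illustration: \bar{X} is the sample mean, s the sample standard deviation, and n the sample size, while \mu_0 is the hypothesized mean from Step 1.

```latex
t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
```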

Step 3: Select the level of significance for the statistical test. The level of significance, when chosen before the statistical test is performed, is called the alpha value, denoted by α (Greek letter alpha); it gives the probability of incorrectly rejecting the null hypothesis when it is actually true (and concluding there is a difference when there is not). This probability should be small, because we do not want to reject the null hypothesis when it is true. Traditional values used for α are 0.05, 0.01, and 0.001. We will use α = 0.05.
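A small simulation can make the meaning of α concrete. The sketch below repeatedly draws samples from a population in which the null hypothesis is true and counts how often a two-tailed t test rejects it. Everything about the setup is an assumption for illustration: the population is taken to be normal with mean 1286 kcal, its standard deviation is borrowed from the sample value of 256 reported in this section, and the critical value 1.99 is the one derived in Step 4.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU_0 = 1286           # hypothesized mean (kcal), from the text
SIGMA = 256           # assumed population SD, borrowed from the sample SD
N = 94                # sample size, from the text
CRITICAL = 1.99       # two-tailed critical t for alpha = 0.05, 93 df (Step 4)
N_SIMULATIONS = 4000  # number of simulated studies


def t_statistic(sample, mu_0):
    """One-sample t statistic: (mean - mu_0) / (s / sqrt(n))."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu_0) / standard_error


rejections = 0
for _ in range(N_SIMULATIONS):
    # The null hypothesis is true here: the population mean really is MU_0.
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if abs(t_statistic(sample, MU_0)) > CRITICAL:
        rejections += 1

type_i_rate = rejections / N_SIMULATIONS
print(f"Proportion of true null hypotheses rejected: {type_i_rate:.3f}")
```

The proportion should land close to 0.05, which is what choosing α = 0.05 is meant to guarantee in the long run.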

Step 4: Determine the value the test statistic must attain to be declared significant. This significant value is also called the critical value of the test statistic. Determining the critical value is simple (we already found it when we calculated a 95% confidence interval), but detailing the reasoning behind the process is instructive. Each test statistic has a distribution; the distribution of the test statistic is divided into an area of (hypothesis) acceptance and an area of (hypothesis) rejection. The critical value is the dividing line between the areas.

An illustration should help clarify the idea. The test statistic in our example follows the t distribution; α is 0.05; and a two-tailed test was specified. Thus, the area of acceptance is the central 95% of the t distribution, and the areas of rejection are the 2.5% areas in each tail (see Figure 5-3). From Table A–3, the value of t (with n – 1 or 94 – 1 = 93 df) that defines the central 95% area is between -1.99 and 1.99, as we found for the 95% confidence interval. Thus, the portion of the curve below -1.99 contains the lower 2.5% of the area of the t distribution with 93 df, and the portion above +1.99 contains the upper 2.5% of the area. The null hypothesis (that the mean energy intake of the group studied by Dennison and coworkers is equal to 1286 kcal as reported in the NHANES III study) will therefore be rejected if the observed value of the test statistic is less than -1.99 or if it is greater than +1.99.
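The critical value read from the table can also be checked numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and inverts the result by bisection. The helper names (`t_pdf`, `t_cdf`, `t_quantile`) are hypothetical and written for this illustration. In practice a statistics package would supply these values directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about zero."""
    width = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    area = total * width / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df, lo=-50.0, hi=50.0):
    """Value of t with cumulative probability p, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05
df = 94 - 1
# Two-tailed test: alpha is split equally between the two tails.
critical = t_quantile(1 - alpha / 2, df)
print(f"Two-tailed critical value, {df} df: +/-{critical:.2f}")
```

Rounded to two decimal places, the result agrees with the ±1.99 given in the text.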

In practice, however, almost everyone uses computers to do their statistical analyses. As a result, researchers do not usually look up the critical value before doing a statistical test. Although researchers need to decide beforehand the alpha level they will use to conclude significance, in practice they wait and see the more exact P value calculated by the computer program. We discuss the P value in the following sections.

Figure 5-3. Defining areas of acceptance and rejection in hypothesis testing using α = 0.05. A: Two-tailed or nondirectional. B: One-tailed or directional lower tail. C: One-tailed or directional upper tail. (Data, used with permission, from Dennison BA, Rockwell HL, Baker SL: Excess fruit juice consumption by preschool-aged children is associated with short stature and obesity. Pediatrics 1997;99:15–22. Graphs produced using the Visualizing Continuous Distributions module in Visual Statistics, a program published by McGraw-Hill Companies; used with permission.)

Step 5: Perform the calculation. To summarize, the mean energy intake among the 94 two-year-old children studied by Dennison and coworkers was 1242 kcal with standard deviation 256 and standard error 26.4.^a We compare this value with the assumed population value of 1286 kcal. Substituting these values in the test statistic yields

t = (1242 - 1286) / 26.4 = -44 / 26.4 = -1.67
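The substitution can be checked with a few lines of arithmetic. This is a minimal sketch using the summary values quoted in the text; the variable names are chosen here for illustration.

```python
import math

n = 94              # number of two-year-old children
sample_mean = 1242  # observed mean energy intake (kcal)
sample_sd = 256     # sample standard deviation (kcal)
mu_0 = 1286         # hypothesized mean from NHANES III (kcal)

# Standard error of the mean: s / sqrt(n)
standard_error = sample_sd / math.sqrt(n)

# Test statistic: difference between the sample and hypothesized means,
# divided by the standard error
t_observed = (sample_mean - mu_0) / standard_error

print(f"SE = {standard_error:.1f}, t = {t_observed:.2f}")
```

The standard error of 26.4 and the t value of -1.67 both follow from the reported mean, standard deviation, and sample size.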

Step 6: Draw and state the conclusion. Stating the conclusion in words is important because, in our experience, people learning statistics sometimes focus on the mechanics of hypothesis testing but have difficulty applying the concepts. In our example, the observed value for t is -1.67. (Typically, the value of test statistics is reported to two decimal places.) Referring to Figure 5-3, we can see that -1.67 falls within the acceptance area of the distribution. The decision is therefore not to reject the null hypothesis that the mean energy intake in the 2-year-old children studied by Dennison and coworkers does not differ from that reported in the NHANES III study.

Another way to state the conclusion is that we do not reject the hypothesis that the sample of energy intake values could come from a population with mean energy intake of 1286 kcal. This means that, on average, the energy intake values observed in 2-year-olds by Dennison are not statistically significantly different from those in the NHANES III. The probability of observing a sample mean at least as far from 1286 kcal as 1242 kcal in a random sample of 94 two-year-olds, if the true mean is actually 1286 kcal, is greater than 0.05, the alpha value chosen for the test.
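The probability mentioned above is the two-tailed P value, and it can be approximated directly. The sketch below is one way to do so with the standard library alone, integrating the t density numerically. The helper names `t_pdf` and `t_cdf` are hypothetical, and a statistics package would normally report this value.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about zero."""
    width = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    area = total * width / 3
    return 0.5 + area if x >= 0 else 0.5 - area


alpha = 0.05
df = 93
t_observed = (1242 - 1286) / 26.4  # values from Step 5

# Two-tailed P value: area beyond |t| in both tails of the distribution
p_two_tailed = 2 * (1 - t_cdf(abs(t_observed), df))
decision = "reject H0" if p_two_tailed < alpha else "do not reject H0"
print(f"t = {t_observed:.2f}, two-tailed P = {p_two_tailed:.3f}: {decision}")
```

The P value comes out near 0.10, above the chosen α of 0.05, which matches the decision not to reject the null hypothesis.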

Use the CD-ROM to confirm our calculations. Then use the t test with the data on 5-year-old children, and compare the mean to 1573 kcal in the NHANES III study.
