Hypothesis Testing

As with estimation and confidence limits, the purpose of a hypothesis test is to permit generalizations from a sample to the population from which it came. Both statistical hypothesis testing and estimation make certain assumptions about the population and then use probabilities to estimate the likelihood of the results obtained in the sample, given the assumptions about the population. Again, both assume a random sample has been properly selected.

Statistical hypothesis testing involves stating a null hypothesis and an alternative hypothesis and then performing a statistical test to decide which hypothesis the data support. Generally the goal is to reject the null hypothesis in favor of the alternative. Like the term “probability,” the term “hypothesis” has a more precise meaning in statistics than in everyday use, as we will see in the following chapters.

The next several chapters will help clarify the ideas presented in this chapter, because we shall reiterate the concepts and illustrate the process of estimation and hypothesis testing using a variety of published studies. Although these concepts are difficult to understand, they become easier with practice.

SUMMARY

This chapter focused on several concepts that explain why the results of one study involving a certain set of subjects can be used to draw conclusions about other similar subjects. These concepts include probability, sampling, probability distributions, and sampling distributions. We began with examples to illustrate how the rules for calculating probabilities can help us determine the distribution of characteristics in samples of people (eg, the distribution of blood types in men and women; the distribution of heart rate variation).

The addition rule, multiplication rule, and modifications of these rules for nonmutually exclusive and nonindependent events were also illustrated. The addition rule is used to add the probabilities of two or more mutually exclusive events. If the events are not mutually exclusive, the probability of their joint occurrence must be subtracted from the sum. The multiplication rule is used to multiply the probabilities of two or more independent events. If the events are not independent, they are said to be conditional; Bayes' theorem is used to obtain the probability of conditional events. Application of the multiplication rule allowed us to conclude that gender and blood type are independently distributed in humans. The site of infection, however, was not independent from the time during an epidemic at which an individual contracted serogroup B meningococcal disease.
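As a minimal sketch of these rules, the Python below applies the addition rule, an independence check, and Bayes' theorem to two events A and B. The three probabilities are hypothetical values assumed for illustration; they are not taken from Table 4-1 or Table 4-2.

```python
from math import isclose

# Hypothetical probabilities, assumed for illustration only.
p_a = 0.40        # P(A)
p_b = 0.50        # P(B)
p_a_and_b = 0.20  # P(A and B), the joint probability

# Addition rule for events that are not mutually exclusive:
# P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

# Multiplication rule as an independence check:
# A and B are independent when P(A and B) = P(A) * P(B).
independent = isclose(p_a_and_b, p_a * p_b)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Bayes' theorem: P(B | A) = P(A | B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a

print(f"P(A or B) = {p_a_or_b:.2f}")
print(f"independent: {independent}")
print(f"P(A | B) = {p_a_given_b:.2f}, P(B | A) = {p_b_given_a:.2f}")
```

With these assumed values the joint probability equals the product of the marginals, so conditioning on B leaves the probability of A unchanged. That is what independence means.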

The advantages and disadvantages of different methods of random sampling were illustrated for a study involving the measurement of tracheal diameters. A simple random sample was obtained by randomly selecting radiographs corresponding to random numbers taken from a random number table. Systematic sampling was illustrated by selecting each 17th x-ray film. We noted that systematic sampling is easy to use and is appropriate as long as there is no cyclical component to the data. Radiographs from different age groups were used to illustrate stratified random sampling. Stratified sampling is the most efficient method and is therefore used in many large studies. In clinical trials, investigators must randomly assign patients to experimental and control conditions (rather than randomly select patients) so that biases threatening the validity of the study conclusions are minimized.
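The sampling schemes described above can be sketched with Python's standard `random` module. The frame of 340 numbered films, the three age strata, and the 40 enrolled patients are all hypothetical; only the sampling interval of 17 comes from the text.

```python
import random

rng = random.Random(17)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 340 radiographs identified by number.
films = list(range(1, 341))

# Simple random sample: every film has the same chance of selection.
simple = rng.sample(films, 20)

# Systematic sample: random starting point, then every 17th film.
k = 17
start = rng.randrange(k)
systematic = films[start::k]

# Stratified random sample: a separate random sample from each stratum.
# The age groups are assumed labels, assigned here by film number.
strata = {
    "child": films[:100],
    "adult": films[100:240],
    "elderly": films[240:],
}
stratified = {name: rng.sample(group, 5) for name, group in strata.items()}

# Random assignment (clinical trials): shuffle the enrolled patients,
# then split them into experimental and control groups.
patients = [f"patient_{i}" for i in range(1, 41)]
shuffled = patients[:]
rng.shuffle(shuffled)
experimental, control = shuffled[:20], shuffled[20:]
```

Random assignment differs from random selection: every enrolled patient is used, and chance decides only which group each one joins.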

Three important probability distributions were presented: binomial, Poisson, and normal (gaussian). The binomial distribution is used to model events that have a binary outcome (ie, either the outcome occurs or it does not) and to determine the probability of outcomes of interest. We used the binomial distribution to obtain the probabilities that a specified number of men with localized prostate tumor survive at least 5 years.
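A binomial probability can be computed directly from its formula. The sketch below assumes a hypothetical group of n = 10 patients, each with a survival probability of π = 0.8; these values are illustrative and are not the figures from the prostate tumor example.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Probability of exactly x outcomes of interest in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Hypothetical parameters, assumed for illustration only.
n, pi = 10, 0.8

# Full distribution: P(X = 0), P(X = 1), ..., P(X = n).
distribution = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Probability that at least 8 of the 10 patients survive.
p_at_least_8 = sum(distribution[8:])

for x, p in enumerate(distribution):
    print(f"P(X = {x:2d}) = {p:.4f}")
print(f"P(X >= 8) = {p_at_least_8:.4f}")
```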

The Poisson distribution is used to determine probabilities for rare events. In the CASS study of coronary artery disease, hospitalization of patients during the 10-year follow-up period was relatively rare. We calculated the probability of hospitalization for patients randomly assigned to medical treatment. Exercise 5 asks for calculations for similar probabilities for the surgical group.
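A Poisson probability depends only on the mean number of events per subject, written here as `lam`. The sketch below uses a made-up rate of 2.0 events per patient rather than the CASS figures, so it does not answer Exercise 5.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean number of events is lam."""
    return lam**x * exp(-lam) / factorial(x)

# Hypothetical mean number of hospitalizations per patient (an assumption).
lam = 2.0

p_zero = poisson_pmf(0, lam)   # no hospitalizations
p_three = poisson_pmf(3, lam)  # exactly three hospitalizations
p_at_least_one = 1 - p_zero    # complement of "none"

print(f"P(X = 0) = {p_zero:.4f}")
print(f"P(X = 3) = {p_three:.4f}")
print(f"P(X >= 1) = {p_at_least_one:.4f}")
```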

The normal distribution is used to determine the probability of characteristics measured on a continuous numerical scale. When the distribution of the characteristics is approximately bell-shaped, the normal distribution can be used to show how representative or extreme an observation is. We used the normal distribution to determine percentages of the population expected to have systolic BPs above and below certain levels. We also found the level of systolic BP that divides the population of normal, healthy adults into the lower 95% and the upper 5%.
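Areas under the normal curve can be obtained with `statistics.NormalDist` from Python's standard library. The mean of 120 mm Hg and standard deviation of 10 mm Hg below are assumed values for illustration, as are the cutoffs of 100 and 140.

```python
from statistics import NormalDist

# Assumed distribution of systolic BP in healthy adults (illustrative values).
systolic = NormalDist(mu=120, sigma=10)

# z score: how many standard deviations an observation lies from the mean.
z_140 = (140 - systolic.mean) / systolic.stdev

# Proportion of the population above and below chosen levels.
p_above_140 = 1 - systolic.cdf(140)
p_below_100 = systolic.cdf(100)
p_between = systolic.cdf(140) - systolic.cdf(100)

# Level that separates the lower 95% from the upper 5%.
cutoff_upper_5 = systolic.inv_cdf(0.95)

print(f"z for 140 = {z_140:.2f}")
print(f"P(BP > 140) = {p_above_140:.4f}")
print(f"P(100 < BP < 140) = {p_between:.4f}")
print(f"95th percentile = {cutoff_upper_5:.1f} mm Hg")
```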

We emphasized the importance of the normal distribution in making inferences to other samples and discussed the sampling distribution of the mean. If we know the sampling distribution of the mean, we can observe and measure only one random sample, draw conclusions from that sample, and generalize the conclusions to what would happen if we had observed many similar samples.

Relying on sampling theory saves time and effort and allows research to proceed.

We presented the central limit theorem, which says that the sampling distribution of the mean is approximately normal, regardless of the shape of the parent population, as long as the sample sizes are large enough. Generally, a sample of 30 observations or more is large enough. We used the values of heart rate variation from a study by Gelber and colleagues (1997) and values of BP from the Society of Actuaries (1980) to illustrate use of the normal distribution as the sampling distribution of the mean.
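The central limit theorem can be checked by simulation. The sketch below draws repeated samples of 30 from a strongly skewed parent population (an exponential distribution with mean 1 and standard deviation 1, chosen here as an assumption) and compares the spread of the sample means with the standard error σ/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
n = 30                     # observations per sample
n_samples = 5000           # number of repeated samples

# Parent population: exponential with mean 1 and SD 1 (right-skewed).
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = mean(sample_means)  # should be close to the population mean
observed_se = stdev(sample_means)   # SD of the sampling distribution
theoretical_se = 1.0 / sqrt(n)      # sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"observed SE = {observed_se:.3f}, sigma/sqrt(n) = {theoretical_se:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the parent distribution is not.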

Estimation and hypothesis testing are two methods for making inferences about a value in a population of subjects by using observations from a random sample of subjects. In subsequent chapters, we illustrate both confidence intervals and hypothesis tests. We also demonstrate the consistency of conclusions drawn regardless of the approach used.

EXERCISES

1.

a. Show that gender and blood type are independent; that is, that the joint probability is the product of the two marginal probabilities for each cell in Table 4-2.

b. What happens if you use the multiplication rule with conditional probability when two events are independent? Use the gender and blood type data for males, type O, to illustrate this point.

2. The term “aplastic anemia” refers to a severe pancytopenia (anemia, neutropenia, thrombocytopenia) resulting from an acellular or markedly hypocellular bone marrow. Patients with severe disease have a high risk of dying from bleeding or infections. Allogeneic bone marrow transplantation is probably the treatment of choice for patients under 40 years of age with severe disease who have a human leukocyte antigen (HLA)-matched donor.

Researchers reported results of bone marrow transplantation into 50 patients with severe aplastic anemia who did not receive a transfusion of blood products until just before the marrow transplantation (Anasetti et al, 1986). The probability of 10-year survival in this group of nontransfused patients was 82%; the survival rate was 43–50% for patients studied earlier who had received multiple transfusions. Table 4-12 gives the incidence of acute graft-versus-host disease, chronic graft-versus-host disease, and death in subgroups of patients defined according to serum titers of antibodies to cytomegalovirus from this study.

Use the table to answer the following questions.

a. What is the probability of chronic graft-versus-host disease?

b. What is the probability of acute graft-versus-host disease?

c. If a patient seroconverts, what is the probability that the patient has acute graft-versus-host disease?

d. How likely is it that a patient who died was seropositive?

e. What proportion of patients was seronegative? If this value were the actual proportion in the population, how likely would it be for 4 of 8 new patients to be seronegative?

3. Refer to Table 4-1 on the 150 patients in the pre-epidemic time period for the development of serogroup B meningococcal disease. Assume a patient is selected at random from the patients in this study.

a. What is the probability a patient selected at random had sepsis as the only site of infection?

b. What is the probability a patient selected at random had sepsis as one of the sites of infection?

c. If race and sex are independent, how many of the white patients can be expected to be male?

4. A plastic surgeon wants to compare the number of successful skin grafts in her series of burn patients with the number in other burn patients. A literature survey indicates that approximately 30% of the grafts become infected but that 80% survive.

She has had 7 of 8 skin grafts survive in her series of patients and has had one infection.

a. How likely is only 1 infection in 8 grafts?

b. How likely is survival in 7 of 8 grafts?

5. Use the Poisson distribution to estimate the probability that a surgical patient in the CASS study would have five hospitalizations in the 10 years of follow-up reported by Rogers and coworkers (1990). (Recall that the 390 surgical patients had a total of 1487 hospitalizations.) Compare this estimate to that for patients treated medically.

Table 4-12. Incidence of graft-versus-host disease.

| Condition | Sero-Negative (a) | Sero-Converters (b) | Sero-Positive (c) |
|---|---|---|---|
| Acute graft-versus-host disease | 6/17 | 2/18 | 2/12 |
| Chronic graft-versus-host disease | 7/14 | 3/18 | 2/10 |
| Death | 3/7 | 3/18 | 2/12 |

a Patients who had titers of less than 1:8 before transplant and never showed consistent titer increases. One patient received marrow from a cytomegalovirus seropositive donor and 16 patients, from seronegative donors.

b Initially seronegative patients who became seropositive within 100 days after transplant. Six patients received marrow from cytomegalovirus seropositive donors and 10 from cytomegalovirus seronegative donors. Serum titers in 2 donors were not determined for antibodies to cytomegalovirus.

c Patients with titers of more than 1:8 before transplant. Within this group, seven patients had fourfold increases in serum titers of antibodies to cytomegalovirus and one other patient showed cultures of virus within 3 months of transplantation. Two of the eight patients developed acute graft-versus-host disease, one had chronic graft-versus-host disease, and one died.

Source: Adapted and reproduced, with permission, from Anasetti C, Doney KC, Storb R, Meyers JD, Farewell VT, Buckner CD, et al: Marrow transplantation for severe aplastic anemia. Ann Intern Med 1986;104:461–466.

6. The values of serum sodium in healthy adults approximately follow a normal distribution with a mean of 141 mEq/L and a standard deviation of 3 mEq/L.

a. What is the probability that a normal healthy adult will have a serum sodium value above 147 mEq/L?

b. What is the probability that a normal healthy adult will have a serum sodium value below 130 mEq/L?

c. What is the probability that a normal healthy adult will have a serum sodium value between 132 and 150 mEq/L?

d. What serum sodium level is necessary to put someone in the top 1% of the distribution?

e. What serum sodium level is necessary to put someone in the bottom 10% of the distribution?

7. Calculate the binomial distribution for each set of parameters: n = 6, π = 0.1; n = 6, π = 0.3; n = 6, π = 0.5. Draw a graph of each distribution, and state your conclusions about the shapes.

8.

a. Calculate the mean and the standard deviation of the number of months since a patient's last office visit from Table 4-10.

b. Calculate the mean and the standard deviation of the sampling distribution of the mean number of months from Table 4-11. Verify that the standard deviation in the sampling distribution of means (SE) is equal to the standard deviation in the population (found in part a) divided by the square root of the sample size, 2.

9. Assume that serum chloride has a mean of 100 mEq/L and a standard deviation of 3 in normal healthy populations.

a. What proportion of the population has serum chloride levels greater than 103 and less than 97 mEq/L?

b. If repeated samples of 36 were selected, what proportion of them would have means less than 99 and greater than 101 mEq/L?

10. The relationship between alcohol consumption and psoriasis is unclear. Some studies have suggested that psoriasis is more common among people who are heavy alcohol drinkers, but this opinion is not universally accepted. To clarify the nature of the association between alcohol intake and psoriasis, Poikolainen and colleagues (1990) undertook a case–control study of patients between the ages of 19 and 50 who were seen in outpatient clinics. Cases were men who had psoriasis, and controls were men who had other skin diseases. Subjects completed questionnaires assessing their lifestyles and alcohol consumption for the 12 months before the onset of disease and for the 12 months immediately before the study. Use the information in Table 4-13 on the frequency of intoxication among patients with psoriasis.

a. What is the probability a patient selected at random from the group of 131 will be intoxicated more than twice a week, assuming the standard deviation is the actual population value σ? Hint: Remember to convert the standard error to the standard deviation.

b. How many times a year would a patient need to be intoxicated in order to be in the top 5% of all patients?

11. The Association of American Medical Colleges reported that the debt in 2002 for graduates from U.S. medical schools was: mean $104,000 and median $100,000; 5% of the graduates had a debt of $200,000 or higher. Assuming debt is normally distributed, what is the approximate value of the standard deviation?

Footnote

a The probability of three or more events that are not mutually exclusive or not independent involves complex calculations beyond the scope of this book. Interested readers can consult any introductory book on probability.

Editors: Dawson, Beth; Trapp, Robert G.

Table 4-13. Alcohol intake (g/day) and frequency of intoxication (times/year) before onset of skin disease among patients with psoriasis and controls.

| | Mean | SEM | Number of Cases | P value (a) |
|---|---|---|---|---|
| Alcohol intake (g/day) | | | | |
| Patients with psoriasis | 42.9 | 7.2 | 142 | 0.004 |
| Controls | 21.0 | 2.1 | 265 | |
| Frequency of intoxication (times/year) | | | | |
| Patients with psoriasis | 61.6 | 6.2 | 131 | 0.007 |
| Controls | 42.6 | 3.3 | 247 | |

a Two sided t-test; separate variance estimate.

Source: Reproduced with permission from Table III in Poikolainen K, Reunala T, Karvonen J, Lauharanta J, Karkkaimen P: Alcohol intake: A risk factor for psoriasis in young and middle-aged men? Br Med J 1990;300: 780–783.

Title: Basic & Clinical Biostatistics, 4th Edition

observed estimate, such as the mean, is different from a norm: the size of the difference, the degree of variability, and the sample size.

Power is the complement of a type II, or β, error: it is the probability of concluding there is a difference when one does exist.

Power depends on several factors, including the sample size. It is truly a key concept in statistics because it is critical that researchers have a large enough sample to detect a difference if one exists.

The t distribution is similar to the z distribution, especially as sample sizes exceed 30, and t is generally used in medicine when asking questions about

The P value first assumes that the null hypothesis is true and then indicates the probability of obtaining a result as or more extreme than the one observed. In more straightforward language, the P value tells us how likely a result at least this extreme would be if chance alone were at work.

The logic behind statistical hypothesis tests is somewhat backwards, generally assuming there is no difference and hoping to show that a difference exists.

The z distribution, sometimes called the z approximation to the binomial, is used to form confidence intervals and test hypotheses about a proportion.

Several assumptions are required to use the t distribution for confidence intervals or hypothesis tests.

The width of confidence intervals (CI) depends on the confidence value. 99% CI are wider than 95% CI because 99% CI provide greater confidence.

Tests of hypothesis are another way to approach statistical inference; a somewhat rigid approach with six steps is recommended.

Paired, or before-and-after, studies are very useful for detecting changes that might otherwise be obscured by variation within subjects, because each subject is his or her own control.

Confidence intervals and statistical tests lead to the same conclusions, but confidence intervals actually provide more information and are being increasingly recommended as the best way to present results.

Paired studies are analyzed by evaluating the differences themselves. For numerical variables, the paired t test is appropriate.

In hypothesis testing, we err if we conclude there is a difference when none exists (type I, or α, error), as well as when we conclude there is no difference when one does exist (type II, or β, error).

The kappa (κ) statistic is used to compare the agreement between two independent judges or methods when observations are being categorized.

The McNemar test is the counterpart to the paired t test when observations are nominal instead of numerical.

To estimate the needed sample size for a study, we need to specify the level of significance (often 0.05), the desired level of power (generally 80%), the size of the difference considered clinically important, and an estimate of the standard deviation.
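These four inputs can be combined in one widely used formula for a two-sided test of a single mean, n = [(z_α/2 + z_β)σ/δ]². Whether the chapter uses exactly this form is an assumption, and the function name and example values below are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_mean(alpha, power, sigma, delta):
    """Approximate n for a two-sided z test of one mean (a sketch, not a rule)."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # critical value for the desired power
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical inputs: alpha = 0.05, power = 80%, SD = 10, difference = 5.
n_needed = sample_size_one_mean(alpha=0.05, power=0.80, sigma=10, delta=5)
print(n_needed)
```

Halving the difference to be detected roughly quadruples the required sample size, because δ enters the formula squared.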

The sign test can be used to test medians (instead of means) if the