Chapter Summary - Chapter 2. Comparing Two Proportions: Randomization Methods

Spreadsheets are an integral part of every statistical package. They display characteristics measured or observed on individuals. Each row represents an individual; thus the number of rows is the number of individuals in the data set. Each column represents a variable; thus the number of columns is the number of variables in the data set. We will use Fathom and PASW as statistical packages in this course.

Individuals are the people or objects we are interested in. When performing experiments, we often call them subjects. Variables are characteristics measured or observed on individuals.

Categorical variables place individuals into categories. Quantitative variables take numerical values on which arithmetic operations can be performed. In other words, you can add, subtract, multiply and divide the values and the answers make sense.

Often when exploring relationships between two variables you can identify both an explanatory variable and a response variable. The explanatory variable is the variable that we think is explaining the change in the response variable.

We looked at how to use scrambling and Fathom to compare percentages from two different groups. The null hypothesis is that the percentages are the same in both groups, and the alternative hypothesis is that the percentages are different in the two groups. If we assume the null hypothesis is true, then we can scramble the explanatory variable many, many times and, each time, find the difference in the percentages of the two groups. We can then see how unlikely the difference in percentages from our original sample is relative to the differences produced by scrambling (that is, when the null hypothesis is true).
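The scrambling procedure described above can also be sketched in a few lines of Python. This is only an illustration, not the Fathom workflow used in the course: the function name `scramble_test`, the 0/1 coding of both variables, the two-sided comparison, and the made-up data in the usage lines are all assumptions introduced here.

```python
import random


def scramble_test(groups, outcomes, n_scrambles=1000, seed=1):
    """Scrambling (randomization) test for a difference in two proportions.

    groups   -- explanatory variable, coded 0/1 (hypothetical coding)
    outcomes -- response variable, coded 0/1 (1 = "success")
    Returns the observed difference (group 1 minus group 0) and a
    two-sided p-value estimated from the scrambles.
    """
    rng = random.Random(seed)

    def diff_in_proportions(labels):
        ones = [y for g, y in zip(labels, outcomes) if g == 1]
        zeros = [y for g, y in zip(labels, outcomes) if g == 0]
        return sum(ones) / len(ones) - sum(zeros) / len(zeros)

    observed = diff_in_proportions(groups)
    scrambled = list(groups)
    at_least_as_extreme = 0
    for _ in range(n_scrambles):
        # Scramble the explanatory variable, as if the null hypothesis were true.
        rng.shuffle(scrambled)
        # Small tolerance guards against floating-point round-off in ties.
        if abs(diff_in_proportions(scrambled)) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_scrambles


# Made-up data: 20 individuals per group, 15 successes vs. 5 successes.
groups = [1] * 20 + [0] * 20
outcomes = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 15
observed, p_value = scramble_test(groups, outcomes)
print(observed, p_value)
```

The p-value is the fraction of scrambles whose difference is at least as far from zero as the observed difference. For a one-sided alternative, the comparison inside the loop would drop the absolute values.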

The way the sample is gathered and the type of study performed change the types of conclusions that are possible. In experiments, researchers randomly assign subjects to treatment groups, which, on average, balances lurking variables across the groups, so cause-and-effect conclusions are possible. In observational studies, lurking variables could be explaining the observed relationship between variables, so cause-and-effect conclusions are not possible. Furthermore, when a sample is randomly selected from the population using a probabilistic technique (e.g. simple random sampling), the sample is typically representative of the population. When a sample is representative of the population, conclusions about the sample can be generalized to the population. There is no such assurance with convenience sampling.

Exercises

1. Shown below is a dataset on Hope students. A “picture” of the spreadsheet of the data is shown on the next page and immediately below is a list of all variables in the survey. Questions about this dataset follow on the next page.

Variables:

Gender (0=female, 1=male)

Class (1=freshman, 2=sophomore, 3=junior, 4=senior)

Hometown (name of hometown)

Extra curricular (hours in last 4 weeks)

Religious view change (0=no, 1=yes)

Chapel satisfaction (0=no, 1=yes)

Chapel attendance (times attended in the last month)

West Michigan (Ottawa, Muskegon, Allegan or Kent counties; 0=no, 1=yes)

[Spreadsheet “Collection 1” with columns: gender, class, hometown, extracirricular, religious_views_change, chapel_satisfaction, chapel_attendance, west_MI]

a) How many variables are in this data set?

b) How many individuals are in this data set?

c) Which variables are categorical?

d) Which variables are quantitative?

2. Below is a cross-tabulation table from a survey of 314 Hope students in 2008 that compares their class-standing to their “religiosity” which is their answer to the question “How religious are you?”

Class   Not at all religious  Not very religious  Somewhat religious  Quite religious  Very religious  Total
Fresh                      4                  15                  22               31              13     85
Soph                       3                   8                  26               24              15     76
Junior                     4                   5                  28               22              14     73
Senior                     4                  10                  28               31               7     80
Total                     15                  38                 104              108              49    314

Based on the table above answer the following questions:

c) What percent of this sample of Hope students say they are not very religious?

d) What percent of this sample of Hope students are juniors?

e) What percent of juniors say that they are not very religious? What percent of seniors say they are not very religious?
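The questions above turn on the difference between a percent of the whole sample (a column or row total divided by 314) and a percent within one class (a cell divided by its row total). A small sketch of that distinction follows, using cells that are not asked about above. The names `percent_of_sample` and `percent_within_class` are made up for this illustration.

```python
# Counts copied from the cross-tabulation in question 2 (totals omitted).
responses = ["Not at all", "Not very", "Somewhat", "Quite", "Very"]
counts = {
    "Fresh":  [4, 15, 22, 31, 13],
    "Soph":   [3, 8, 26, 24, 15],
    "Junior": [4, 5, 28, 22, 14],
    "Senior": [4, 10, 28, 31, 7],
}


def percent_of_sample(response):
    """Percent of ALL students giving this response (column total / grand total)."""
    col = responses.index(response)
    column_total = sum(row[col] for row in counts.values())
    grand_total = sum(sum(row) for row in counts.values())
    return 100 * column_total / grand_total


def percent_within_class(class_name, response):
    """Percent of students IN ONE CLASS giving this response (cell / row total)."""
    row = counts[class_name]
    return 100 * row[responses.index(response)] / sum(row)


# Very religious: share of the whole sample vs. share of freshmen only.
print(round(percent_of_sample("Very"), 1))
print(round(percent_within_class("Fresh", "Very"), 1))
```

The two functions differ only in the denominator, which is the point to check whenever a question says "of this sample" versus "of juniors".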

Study Design

3. Imagine if you were to take a survey of all of the students in your section of Math 210 with a goal to learn about all Hope College students. This would be a convenience sample. Explain whether you think your section of Math 210 would be representative of all Hope College students for the following questions on the survey. Justify your answers.

a) What is your blood-type?

b) What is your class standing? (Fr, So, Jr, Sr)

c) What state do you live in?

d) What is your major (proposed major)?

e) What is your favorite pizza place?

f) What is your favorite NFL team?

g) What kind of toothpaste do you use?

4. In question #3 you looked at when your section of Math 210 students might be representative of all Hope College students. At times it will be representative and at times it won’t be.

a) What method is the best method to ensure you have a representative sample of Hope College students?

b) If you do use a convenience sample, why do you need it to be representative of the population in order to draw conclusions to the population?

5. Explain, in detail, how you would take a simple random sample of all Hope College students.

6. A stratified random sample is used in place of a simple random sample in some cases. A stratified random sample takes simple random samples of two (or more) groups in a population. For example, you could first take a simple random sample of 100 white students at Hope and then a simple random sample of 100 non-white students at Hope. Your overall sample size would be 200 students. White students would be one stratum and non-white students the other stratum.

a) How would the number of white and non-white students in a simple random sample differ from the stratified random sample explained above?

b) Based on your answer to (a), why might a stratified random sample be a good idea in some situations?

c) Explain the following statement: “Stratified random samples are not representative of the entire population. Each stratum in the sample, however, is representative of the corresponding stratum in the population.”

d) From the stratified random sample of 200 students (100 white and 100 non-white), someone concludes “In this sample of 200 students, there are 40 black students, thus 40/200=20% of Hope students are black.” Name the problem with this statement. How does it illustrate a problem/drawback of stratified random samples?

e) The sampling frame is the list of all individuals in the population and is needed in order to take a simple or stratified random sample. In addition to a person’s name and contact information, in order to take a stratified random sample you MUST have one other piece of information. What is it? Why must you have it?
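The mechanics of the two sampling schemes compared in question 6 can be sketched as follows. This is a rough illustration under stated assumptions: the sampling frame is invented (3000 students split into strata labeled "A" and "B"), and the function names are hypothetical rather than taken from Fathom or PASW.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every set of n individuals in the frame is equally likely to be chosen."""
    return rng.sample(frame, n)


def stratified_random_sample(frame, n_per_stratum, rng):
    """Take a separate simple random sample inside each stratum.

    frame         -- list of (student_id, stratum) pairs
    n_per_stratum -- dict mapping stratum name to the sample size wanted
    """
    sample = []
    for stratum, n in n_per_stratum.items():
        members = [person for person in frame if person[1] == stratum]
        sample.extend(rng.sample(members, n))
    return sample


def count_by_stratum(sample):
    """Tally how many sampled individuals fall in each stratum."""
    tally = {}
    for _, stratum in sample:
        tally[stratum] = tally.get(stratum, 0) + 1
    return tally


# Invented sampling frame: 2700 students in stratum "A" and 300 in stratum "B".
frame = [(i, "A") for i in range(2700)] + [(i, "B") for i in range(2700, 3000)]
rng = random.Random(2)

srs = simple_random_sample(frame, 200, rng)
strat = stratified_random_sample(frame, {"A": 100, "B": 100}, rng)

print(count_by_stratum(srs))    # varies from sample to sample
print(count_by_stratum(strat))  # fixed by design at 100 per stratum
```

Note that `stratified_random_sample` cannot run unless each entry in the frame already records which stratum the individual belongs to.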

7. Sometimes the sampling frame (see 6e) is unavailable. For example, in national political polls we would need to have a list of all likely voters in the United States. It is nearly impossible to create such a list and, as soon as it was created, it would be outdated (immigration, death, marriage, moving, etc.). In cases where the sampling frame is unavailable, cluster sampling is used. Cluster sampling involves breaking the population into groups and then taking a sample of the groups. This can be done multiple times (multi-stage cluster sample). Identify each of the following studies as a census, simple random sample, stratified random sample, cluster sample, or convenience sample:

a) For their Math 210 project, a student hands out surveys to students in the Pine Grove.

b) During an NFL football game, viewers are asked to log on to their computer and vote for their favorite player.

c) To estimate how many people in the United States think that the current President is “doing a good job” (Presidential Approval Rating), 50 counties across the United States are randomly selected from a list of all counties. Within each county, 1 voting district is randomly selected. A list of all members of a voting district is compiled and all members are contacted to participate in the survey.

d) A list of all residents with a land-line phone in the city of Holland is obtained from the phone book. 500 residents are randomly selected to participate in a survey on impact of the economy on day-to-day life.

e) A list of all Hope College students is obtained from the registrar’s office that contains information about ACT scores, GPA and class standing. These lists are used to report on the average ACT scores of Hope College students and to look at differences by GPA and class standing.

f) A list of all Hope College students is obtained from the registrar’s office that contains information about students’ gender. Based on this list, a random sample of 200 males is obtained and a random sample of 200 females is obtained and combined to make a sample of 400 students. These 400 students are surveyed about their perceptions of the Christian dimension of the college.
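The multi-stage cluster sampling idea introduced at the start of question 7 can be sketched as follows. The nested population, its group and subgroup names, and the function `multistage_cluster_sample` are all invented for this illustration; the sketch assumes one subgroup is chosen per selected group and that everyone in a chosen subgroup is surveyed.

```python
import random


def multistage_cluster_sample(population, n_groups, rng):
    """Two-stage cluster sample.

    population -- dict: group name -> dict: subgroup name -> list of members
    Stage 1: randomly select n_groups groups.
    Stage 2: randomly select one subgroup inside each chosen group.
    Everyone in a selected subgroup is included in the sample.
    """
    sample = []
    for group in rng.sample(sorted(population), n_groups):
        subgroup = rng.choice(sorted(population[group]))
        sample.extend(population[group][subgroup])
    return sample


# Invented population: 10 groups, each with 4 subgroups of 5 members.
population = {
    f"group{g}": {
        f"sub{s}": [f"person-{g}-{s}-{m}" for m in range(5)] for s in range(4)
    }
    for g in range(10)
}

sample = multistage_cluster_sample(population, n_groups=3, rng=random.Random(3))
print(len(sample))  # 3 groups x 1 subgroup x 5 members
```

Only a list of groups is needed at the first stage, and lists of members are needed only for the subgroups actually selected, which is why this design works when a full sampling frame is unavailable.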

8. We’ve talked about lurking and confounding variables in the context of observational studies, but these variables can appear in poorly designed experiments as well and, consequently, impact the ability to make cause-effect conclusions. Reconsider the swimming with dolphins experiment. What were the people who didn’t get to swim with dolphins doing while the others got to swim with dolphins? Were they sitting around, bored, in their hotel room wondering why some people got to swim with dolphins but they didn’t? Read the following excerpt from the journal article about what the non-dolphin swimmers were doing during the study:

“In the control group, participants were assigned to an outdoor nature programme featuring the same water activities as the animal care programme but in the absence of dolphins, to control for the influence of water and other, non-specific, environmental factors. In the outdoor nature programme, participants had to swim and snorkel in the barrier coral reef for one hour a day and had a similar degree of individualized human contact as in the animal care programme. Patients were informed of the marine ecosystem, the barrier coral reef, and water safety. Each session took about one hour a day. To avoid disappointment for the participants in the control group, which might have affected the results, they also had a day session with the dolphins at the end of the treatment and after the final evaluation. Both programmes were run simultaneously and lasted for a period of two weeks for each group. The treatments were given daily, Monday to Friday.”

a) Name the ways the researchers tried to make the control group (no swimming with dolphins) as similar to the treatment group (swim with dolphins) as possible.

b) Explain how not making the groups as similar as possible can impact the ability to make a cause-effect conclusion, even if the subjects are randomly assigned to the treatments.

9. In survey research there are a number of ways the results of the study can be biased. In this context, bias means the results of the study can be systematically different than the population in a way that will not be identified through the use of tests of statistical significance. Identify each of the following situations as an example of undercoverage, non-response or response bias and explain why.

a) A simple random sample of 200 Hope College students is taken based on a list of all Hope College students obtained from the registrar. Each of the 200 students is emailed a survey, 80 students reply with their answers to the survey.

b) An issue of current debate in the realm of political polling is the increasing numbers of individuals who do not have a landline phone (instead, only having a cell phone). Traditional political polls are administered by phone and typically only landline phones are called.

c) A question on a survey to high school students given by their teacher asked if they have ever used marijuana.

Tests of Significance

10. The Physicians’ Health Study (phase I) involved over 22,000 US physicians who were randomly assigned to receive aspirin or a placebo. The physicians were followed for four years, and a record was kept of which physicians had heart attacks, in order to see if aspirin reduced the risk of heart attacks.

a) Identify the research question and explanatory/response variables.

b) State the null and alternative hypotheses for the related test of significance.

c) Ultimately, 189 out of 11,034 placebo taking physicians had a heart attack and 104 out of 11,037 aspirin taking physicians had a heart attack. Find the difference in rates of heart attack.

d) The p-value for the hypothesis test on the difference in rates of heart attack was reported as 0.0000057. What is your conclusion for this test of significance?

e) Why do you think the p-value is so small?

f) The original study was scheduled to last 10 years (1985-1995). However, the study was halted after 4 years (1989). Give some of the moral/ethical and statistical reasons why you think the study was stopped after only 4 years.

g) How do you think the p-value would have changed had the alternative hypothesis been two-sided instead of one-sided? Would it have impacted your conclusions?

11. Covering a wart with a piece of duct tape may be as effective in getting rid of it as liquid nitrogen freezing, according to an article in the October 2002 issue of the Archives of Pediatrics & Adolescent Medicine.

Researchers from Madigan Army Medical Center in Tacoma, Washington, studied 51 patients ages 3 to 22 with common warts. Twenty-six patients were treated with duct tape and 25 were treated with liquid nitrogen, or cryotherapy.

Patients in the tape group, or their parents, were told to leave the tape in place for six days, and to replace it if it fell off. After six days, they were told to remove the tape, soak the area in water, and file the wart with an emery board or pumice stone. After 12 hours without the duct tape, they were told to put a new piece on the wart, and continue the cycle for two months or until the wart was gone. Patients in the cryotherapy group received a standard application of liquid nitrogen on the wart for 10 seconds. Patients, or their parents, were told to return to the clinic every two to three weeks to repeat the freeze for a maximum of six treatments or until the wart was gone.

The researchers found that the duct tape treatment completely removed warts in 22 of 26 patients, while the liquid nitrogen treatment removed warts in 15 of 25 patients. From these two sample proportions can we conclude that treating a wart with duct tape is better than cryotherapy?

a) State the research question.

b) State the null and alternative hypotheses for this study.

c) Identify the explanatory and response variables.

d) Is this study an experiment or an observational study? Why?

e) Find the percent of successful wart removal in each of the two treatment groups.

f) Use your answer to (e) to find the difference in proportions (duct tape minus liquid nitrogen).

g) We used Fathom to do 1000 scramblings and found the difference in proportions each time. Use your answer to (f) and the table on the right to find the p-value.

h) Based on your p-value what would be your conclusion for the study?

[Table: Measures from Scrambled Collection 1]

12. A study comparing the proportion of boys born to smoking parents with that of nonsmoking parents was reported on April 20, 2002 in The Lancet, a British medical journal. The article reported that couples who smoke around the time of conception are less likely to produce boys than those who do not. One of the statistics reported was that out of 565 births where both parents smoked more than a pack a day, 255 were boys. Another was that out of 3602 births where neither parent smoked, 1975 were boys. The p-value for the difference in proportions between these two groups is 0.000017.

a) What proportion of births resulted in a boy when both parents smoked more than a pack a day? What proportion of births resulted in a boy when both parents did not smoke?

b) State the null and alternative hypotheses and give your conclusion based on the p-value.

c) Identify the explanatory and response variables.

d) Is this study an observational study or an experiment? How does this impact your conclusions?

Case Study

Vitamin C: Does it improve your health?

In 1970 Linus Pauling, a well-known chemist and Nobel Prize winning scientist, published Vitamin C and the Common Cold, creating a great deal of public and scientific interest. In short, Pauling argued that taking Vitamin C would reduce one’s risk of the common cold. This book almost singlehandedly made Vitamin C one of the most widely used dietary supplements, a status it retains to this day (Nutritional Supplement Review, 2009). After the book was published, Pauling wrote a paper that appeared in the Proceedings of the National Academy of Sciences in 1971. In this paper he describes a study conducted by a physician in Basel, Switzerland in the early 1960s.

Here is an excerpt from the paper explaining the study design:

“The study was carried out in a ski resort with 279 skiers during two periods of 5-7 days. The conditions were such that the incidence of colds during these short periods was large enough (about 20%) to permit results with statistical significance to be obtained. The subjects were roughly of the same age and had similar nutrition during the period of study. The investigation was double-blind, with neither the participants nor the physicians having any knowledge about the distribution of the ascorbic-acid tablets (1000 mg) and the placebo tablets. The tablets were distributed every morning and taken by the subjects under observation, so that the possibility of interchange of tablets was eliminated. The subjects were examined daily for symptoms of colds and other infections. The records were largely on the basis of subjective symptoms, partially supported by objective observations (measurement of body temperature, inspection of the respiratory organs, auscultation of the lungs, and so on). Persons who showed cold symptoms on the first day were excluded from the investigation.

After the completion of the investigation, a completely independent group of professional people was provided with the identification numbers for the ascorbic-acid tablets and placebo tablets, and this group performed the statistical evaluation of the observations.”

The VitaminC.ftm file (available from the textbook website) is a Fathom file containing two columns of data: VitaminC (which indicates whether an individual received Vitamin C or the placebo) and Cold (which indicates whether or not the skier got a cold during the time of the experiment). There are 279 rows corresponding to the 279 skiers.

Note: It is expected that you will reference instructions earlier in the chapter about the specific commands in Fathom that you need to use in order to answer the following questions. Detailed Fathom instructions are not given.

1. Incidence is the term used to describe the percent of the sample who develop an illness over a certain time period. Find the disease incidence rates for individuals who received Vitamin C and those who received the placebo.

2. What is the difference in incidence rates in the two groups? Comment on whether you think this is evidence that Vitamin C prevents colds.

3. Is this study an experiment or an observational study?

4. State the null and alternative hypotheses for a test to see if this experiment provides statistically significant evidence that Vitamin C prevented colds in the skiers. Use a
