Part 1. For each of the following questions, fill in the blanks. Each question is worth 2 points.

1. The bell-shaped frequency curve is so common that if a population has this shape, the measurements are said to follow a __________ distribution.

**ANSWER: NORMAL**

2. With a frequency curve, to figure out what percentage or proportion of the population falls into a certain range, you have to figure out the __________ under the curve over that range.

**ANSWER: AREA**

3. A(n) __________ represents the number of standard deviations the observed value or score falls above or below the mean.

**ANSWER: STANDARD SCORE (OR Z-SCORE)**

4. For any normal curve, almost all of the values will fall within __________ of the mean.

**ANSWER: THREE STANDARD DEVIATIONS**

5. A(n) __________ is useful for displaying the relationship between two measurement variables.

**ANSWER: SCATTERPLOT**

6. A __________ can be used to represent two or three categorical variables simultaneously.

**ANSWER: BAR GRAPH**

7. The __________ between two measurement variables is an indicator of how closely their values fall to a straight line.

**ANSWER: CORRELATION**

8. If there is no linear relationship between two measurement variables, the correlation is __________.

**ANSWER: ZERO**

9. A data point that is far removed from the rest of the data is called a(n) __________.

**ANSWER: OUTLIER**

10. It is very difficult to establish a causal connection between two variables without the use of a __________.

**ANSWER: RANDOMIZED EXPERIMENT**

11. A table that displays the number of individuals who fall into each combination of categorical variables is called a(n) __________ table.

**ANSWER: CONTINGENCY**

12. When omitting a third variable masks the relationship between two categorical variables, this phenomenon is called __________.

**ANSWER: SIMPSON’S PARADOX**

Part 2. For each of the following questions circle the correct response. Each question is worth 2 points.

13. Suppose you are on a jury in a trial someday. How could you encounter Simpson’s Paradox?

a. You could see data that were collected from two different studies, giving you two different results.

b. One side could present the data using two variables, and the other side could break the same data down by a third variable that reverses the direction of the results.

c. One side could use counts to summarize the data, and the other side could use percentages or rates, reversing the direction of the relationship.

d. All of the above.

**ANSWER:** **B**

14. In which case(s) should you be suspicious of a correlation that is presented?

a. When the data is likely to contain outliers.

b. When the sample size is small.

c. When removing one point in the data set actually reverses the direction of the trend.

d. All of the above.

**ANSWER:** **D**

15. Assuming there is a statistical relationship between height and weight for adult females, which of the following statements is true?

a. If we knew a woman’s height, we could predict her weight.

b. If we knew a woman’s height, we could determine the exact weight for all women with that same height.

c. If we knew a woman’s height, we could predict the average weight for all women with that same height.

d. All of the above are true.

**ANSWER:** **C**

16. Most researchers are willing to declare that a relationship is statistically significant if the chances of observing the relationship in the sample when actually nothing is going on in the population are less than what percent?

a. 5%

b. 50%

c. 95%

d. None of the above.

**ANSWER:** **A**

17. Which of the following describes a strong statistical correlation?

a. The value of one measurement variable is always equal to the square of the value of another measurement variable.

b. One measurement variable has a cause and effect relationship with another measurement variable.

c. Two measurement variables have a strong linear relationship.

d. All of the above.

**ANSWER:** **C**

18. Suppose the correlation between two measurement variables is −1. Which of the following statements is not true?

a. As one of the variables increases, the other decreases.

b. The data looks the same as when two variables have a deterministic linear relationship.

c. The correlation between the variables is very weak.

d. All of the above statements are true.

**ANSWER:** **C**

19. Which of the following is not a type of picture for organizing categorical data?

a. A pie chart.

b. A bar graph.

c. A pictogram.

d. A histogram.

**ANSWER:** **D**

20. Which of the following describes the entire area underneath a frequency curve?

a. The entire area is 1 or 100%.

b. The entire area is equal to the total number of individuals in the population.

c. The entire area is equal to the total percentage of individuals in the population with the measurement being studied.

d. None of the above.

**ANSWER:** **A**

21. Suppose your score on the GRE (Graduate Record Examination) was at the 90th percentile. What does that mean?

a. You got 90% of the questions right.

b. 90% of the other students scored lower than you did.

c. 10% of the other students scored lower than you did.

d. None of the above.

**ANSWER:** **B**

22. Suppose one individual in a certain population had a z-score of −2. Which of the following is true?

a. This is a good thing because the individual is above average.

b. This individual’s measurement is 2 standard deviations below the mean.

c. This individual’s original measurement was a negative number.

d. All of the above are true.

**ANSWER:** **B**

Part 3. For each of the following questions give a short answer. Use complete sentences. Each question is worth 2 points.

23. Suppose you took a standardized test and the scores had a bell-shaped distribution. You only need three pieces of information in order to find your percentile in the population of test scores. What are those three pieces of information?

**ANSWER: 1) YOUR TEST SCORE; 2) THE MEAN OF THE POPULATION OF TEST**
**SCORES; AND 3) THE STANDARD DEVIATION OF THE POPULATION OF TEST**
**SCORES.**
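
Note: as an illustration of how these three pieces combine, here is a minimal Python sketch. The score, mean, and standard deviation below are made-up values chosen only for illustration (they are not from this exam), and the sketch assumes the scores follow a normal curve exactly.

```python
from statistics import NormalDist


def percentile_from_score(score, mean, sd):
    """Return the percentile (0 to 100) of a score under a normal curve."""
    z = (score - mean) / sd               # standard score (z-score)
    return 100 * NormalDist().cdf(z)      # area to the left of z, as a percent


# Hypothetical values, chosen only for illustration.
print(round(percentile_from_score(600, 500, 100), 1))
```

A score one standard deviation above the mean lands at roughly the 84th percentile, which matches the Empirical Rule (50% below the mean plus half of 68%).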

24. The Empirical Rule says that for a normal curve, approximately 68% of the values fall within 1 standard deviation of the mean in either direction, while 95% of the values fall within 2 standard deviations of the mean in either direction. Explain why you don’t have twice as many values within 2 standard deviations as you do within 1 standard deviation.

**ANSWER: BECAUSE OF THE NORMAL, OR BELL-SHAPED, CURVE. THE MAJORITY (68%) FALL CLOSE TO THE MEAN, WHERE THE “BELL” PART OF THE CURVE IS. AS YOU MOVE AWAY, YOU GET INTO THE TAILS OF THE CURVE, WHICH CONTAIN LESS AREA.**
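
Note: the point can be checked numerically. The following Python sketch, which assumes an exact standard normal curve, computes the area within 1 and within 2 standard deviations and shows how little the second standard deviation adds.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Proportion of a normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within_1 = area_within(1)
within_2 = area_within(2)
added_band = within_2 - within_1  # area between 1 and 2 SDs, both sides combined

print(f"within 1 SD: {within_1:.3f}")
print(f"within 2 SD: {within_2:.3f}")
print(f"added by the second SD: {added_band:.3f}")
```

The band between 1 and 2 standard deviations is just as wide as the central band but holds far less area, because the curve is much lower there.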

25. Name three types of statistical pictures that are used to represent measurement data.

**ANSWER: ANY 3 OF THE FOLLOWING ARE OK: 1) HISTOGRAM; 2) STEMPLOT; 3) LINE GRAPH; 4) SCATTERPLOT; OR 5) BOXPLOT.**

26. Name a situation in which a scatterplot is most useful for displaying measurement data.

**ANSWER: FOR DISPLAYING THE RELATIONSHIP BETWEEN TWO MEASUREMENT VARIABLES.**

27. Determine whether or not the following statement could be statistically correct. If not, explain why not. “The correlation between tree diameter and weight of fruit harvested was found to be 2.3.”

**ANSWER: NO. CORRELATION MUST BE BETWEEN -1 AND +1.**

28. Determine whether or not the following statement could be statistically correct. If not, explain why not. “We found a strong correlation between gender and political party.”

**ANSWER: NO. CORRELATION REFERS TO TWO MEASUREMENT VARIABLES, AND GENDER AND POLITICAL PARTY ARE BOTH CATEGORICAL VARIABLES.**

29. A number of anomalies can cause misleading correlations. Name two problems that can cause distortion with correlations.

**ANSWER: 1) OUTLIERS CAN SUBSTANTIALLY INFLATE OR DEFLATE THEM; 2)**
**GROUPS COMBINED INAPPROPRIATELY MAY MASK RELATIONSHIPS.**
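
Note: the first problem (outliers) can be illustrated with a small Python sketch. The data are invented for illustration only, and `pearson_r` is a helper written for this note, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with no linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 3, 1, 2]
r_without = pearson_r(xs, ys)

# The same data plus one point far from the rest.
r_with = pearson_r(xs + [20], ys + [20])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

A single extreme point turns a correlation of about zero into one close to +1, even though the other five points show no relationship.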

30. Give an example where a randomized experiment cannot be done, even though we know that is the best way to try to establish a causal connection between two measurement variables.

**ANSWER: ANY REASONABLE ANSWER OK. EXAMPLE: DOES SMOKING CAUSE LUNG CANCER?**

Part 4. Make sure to show all work in the following questions!

31. GRE scores are normally distributed with a mean of 497 and standard deviation of 115.

a) (4 points) Draw a picture of the GRE scores showing the cutoff values for the middle 99.7% of scores.

**Answer: Picture should show a bell-shaped curve centered at 497, with the left and right ends marked 152 and 842 (99.7% of the area is within 3 standard deviations of the mean).**

b) (4 points) A student had a GRE score of 687. Find and interpret the standard score for this student.

$z = \frac{687 - 497}{115} = \frac{190}{115} \approx 1.65$

**The student scored about 1.65 standard deviations above the mean.**

c) (4 points) Use the Empirical Rule to approximate the percentage of students with GRE scores below 382.

$z = \frac{382 - 497}{115} = -1$

**The Empirical Rule states that about 68% of all GRE scores fall within 1 standard deviation of the mean. That leaves about 32% for the two tails, so by the symmetry of the normal curve about 16% of all scores are below 382.**
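
Note: the arithmetic in parts a) through c) can be checked with a short Python sketch. It uses only the mean and standard deviation given in the question; the function and variable names are chosen for this note.

```python
MEAN, SD = 497, 115  # values given in question 31


def z_score(x, mean=MEAN, sd=SD):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# a) Cutoffs for the middle 99.7%: three standard deviations either side.
low_cutoff = MEAN - 3 * SD
high_cutoff = MEAN + 3 * SD
print(low_cutoff, high_cutoff)      # 152 842

# b) Standard score for a GRE score of 687.
print(round(z_score(687), 2))       # 1.65

# c) 382 is exactly one SD below the mean; the Empirical Rule puts about 68%
#    within one SD, and the remaining 32% splits equally between the two tails.
print(z_score(382))                 # -1.0
pct_below_382 = (100 - 68) / 2
print(pct_below_382)                # 16.0
```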

32. A regression equation relating study time (X, in hours) and exam score (Y, out of 100 points) is: Y = 21 + 4.5X

a) (2 points) What is the predicted score for 2 hours of study time?

**Y=21+4.5(2)=30 points**

b) (2 points) How many hours of study are required to get 93 points?

**93=21+4.5X so X=(93-21)/4.5= 16 hours**

c) (4 points) Explain clearly what the slope of 4.5 means in this situation.

**For every additional hour of study time, the predicted exam score increases by 4.5 points.**

d) (4 points) Would the correlation between study time and exam score be positive or negative? Explain.

**Positive, since Y increases as X increases (slope is >0)**
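
Note: the answers to question 32 can be reproduced with a few lines of Python. The intercept and slope come from the regression equation in the question; the function names are chosen for this note.

```python
INTERCEPT, SLOPE = 21.0, 4.5  # from Y = 21 + 4.5 X


def predicted_score(hours):
    """Predicted exam score for a given number of study hours."""
    return INTERCEPT + SLOPE * hours


def hours_for_score(score):
    """Study hours at which the regression line reaches a given score."""
    return (score - INTERCEPT) / SLOPE


print(predicted_score(2))                        # 30.0
print(hours_for_score(93))                       # 16.0
# One extra hour of study changes the prediction by exactly the slope.
print(predicted_score(3) - predicted_score(2))   # 4.5
```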

33. A study examined whether giving children in developing countries large doses of vitamin A will prevent night blindness and subsequently reduce the mortality rate resulting from night blindness. A total of 25,200 children participated in the study, with the following results: out of 12,991 who received Vitamin A, 101 were dead after 1 year, and out of 12,209 who received Placebo, 130 were dead after 1 year. The rest survived.

a. (2 points) Which of the variables is the explanatory variable and which is the response variable?

**Survival (Dead or Alive) is response variable, Treatment (Vitamin A or Placebo) is**
**explanatory variable.**

b. (4 points) Construct a contingency table for the data.

**Rows are TREATMENT; columns are SURVIVAL.**

| Treatment | Dead | Alive | Total |
| --- | --- | --- | --- |
| **Vitamin A** | **101** | **12890** | **12991** |
| **Placebo** | **130** | **12079** | **12209** |
| **Total** | **231** | **24969** | **25200** |

c. (3 points) Compute the risk of death for each of the two treatment groups (Vitamin A and Placebo). Keep results to 4 decimal places. Interpret the result.

**VIT. A: 101/12991 = 0.0078 (about 0.78%)**
**PLACEBO: 130/12209 = 0.0106 (about 1.06%)**

**The risk of death is smaller for the Vitamin A group, but not by very much: roughly 8 versus 11 deaths per 1,000 children.**

d. (3 points) Compute the odds of surviving for each of the treatment groups (Vitamin A and Placebo).

**VIT. A: 12890/101 ≈ 127.6, which gives about 128 to 1 (equivalently, (1 − 0.007775)/0.007775).**

**Placebo: 12079/130 ≈ 92.9, which gives about 93 to 1 (equivalently, (1 − 0.010648)/0.010648).**

**Rounding the risks before dividing shifts these values noticeably (for example, .989/.011 = 89.9 for Placebo), so work from the counts or the unrounded risks.**

e. (3 points) Compute the relative risk of dying for the Placebo group versus the Vitamin A group. Interpret the result.

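**(130/12209)/(101/12991) ≈ 0.010648/0.007775 ≈ 1.37. The risk of dying is about 1.37 times as large for the Placebo group as for the Vitamin A group. (Using the rounded risks .011 and .0078 gives 1.41.)**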
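
Note: the risk, odds, and relative risk in question 33 can be checked with the Python sketch below. The counts come from the question; the dictionary and function names are chosen for this note. Because it works from the raw counts rather than rounded intermediate values, its figures can differ slightly in the last digits from hand calculations.

```python
# Counts from question 33.
deaths = {"Vitamin A": 101, "Placebo": 130}
totals = {"Vitamin A": 12991, "Placebo": 12209}


def risk(group):
    """Proportion of the group that died within one year."""
    return deaths[group] / totals[group]


def odds_of_surviving(group):
    """Survivors per death in the group."""
    return (totals[group] - deaths[group]) / deaths[group]


# Relative risk of dying: Placebo compared with Vitamin A.
relative_risk = risk("Placebo") / risk("Vitamin A")

for group in deaths:
    print(f"{group}: risk = {risk(group):.4f}, "
          f"odds of surviving = {odds_of_surviving(group):.1f} to 1")
print(f"relative risk (Placebo vs. Vitamin A) = {relative_risk:.2f}")
```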