Statistical Methods for the Social Sciences Fourth Edition Agresti Solutions


INSTRUCTOR’S MANUAL
to accompany
STATISTICAL METHODS FOR THE SOCIAL SCIENCES, Fourth Edition
Alan Agresti and Barbara Finlay
published by Pearson Education
Manual prepared by: Jackie Miller, 404 Cockins Hall, Department of Statistics, The Ohio State University, Columbus, OH 43210
Instructors: Please notify Alan Agresti of any errors in this manual or the text so they can be corrected for future printings. Please send e-mail to [email protected].
Copyright © 2012 Pearson Education, Inc. Publishing as Prentice Hall.

Table of Contents
1. Introduction
2. Sampling and Measurement
3. Descriptive Statistics
4. Probability Distributions
5. Statistical Inference: Estimation
6. Statistical Inference: Significance Tests
7. Comparison of Two Groups
8. Analyzing Association Between Categorical Variables
9. Linear Regression and Correlation
10. Introduction to Multivariate Relationships
11. Multiple Regression and Correlation
12. Comparing Groups: Analysis of Variance (ANOVA) Methods
13. Combining Regression and ANOVA: Quantitative and Categorical Predictors
14. Model Building with Multiple Regression
15. Logistic Regression: Modeling Categorical Responses
16. An Introduction to Advanced Methodology

Chapter 1
1.1. (a) An individual Prius (automobile). (b) All Prius automobiles used in the EPA tests. (c) All Prius automobiles that are or may be manufactured.
1.2. (a) All 7 million voters. (b) A statistic is the 56.5% who voted for Schwarzenegger from the exit poll sample of size 2705; a parameter is the 55.9% who actually voted for Schwarzenegger.
1.3. (a) All students at the University of Wisconsin. (b) A statistic, since it’s calculated only for the 100 sampled students.
1.4. A statistic, since it is based on the approximately 1200 Floridians in the sample.
1.5. (a) All adult Americans. (b) Proportion of all adult Americans who would answer definitely or probably true. (c) The sample proportion 0.523 estimates the population proportion. (d) No, it is a prediction of the population value but will not equal it exactly, because the sample is only a very small subset of the population.
1.6. (a) The most common response was 2 hours per day. (b) This is a descriptive statistic because it describes the results of a sample.
1.7. (a) A total of 85.7% said “yes, definitely” or “yes, probably.” (b) In 1998, a total of 85.8% said “yes, definitely” or “yes, probably.” (c) A total of 74.4% said “yes, definitely” or “yes, probably.” The percentages of yes responses were higher for HEAVEN than for HELL.
1.8. (a) Statistics, since they’re based on a sample of 60,000 households, rather than all households. (b) Inferential, predicting for a population using sample information.
1.9. (a)
1.10.
Race      Age  Sentence  Felony?  Prior Arrests  Prior Convictions
white     19   2         no       2              1
black     23   1         no       0              0
white     38   10        yes      8              3
Hispanic  20   2         no       1              1
white     41   5         yes      5              4
1.14. (a) A statistic is a numerical summary of the sample data, while a parameter is a numerical summary of the population. For example, consider an exit poll of voters on election day. The proportion voting for a particular candidate is a statistic.
Once all of the votes have been counted, the proportion of voters who voted for that candidate would be known (and is the parameter). (b) Description deals with describing the available data (sample or population), whereas inference deals with making predictions about a population using information in the sample. For example,

consider a sample of voters on election day. One could use descriptive statistics to describe the voters in terms of gender, race, party, etc., and inferential statistics to predict the winner of the election.
1.15. If you have a census, you do not need to use the information from a sample to describe the population since you have information from the population as a whole.
1.16. (a) The descriptive part of this example is that the average age in the sample is 24.1 years. (b) The inferential part of this example is that the sociologist estimates the average age of brides at marriage for the population to be between 23.5 and 24.7 years. (c) The population of interest is women in New England in the early eighteenth century.
1.17. (a) A statistic is the 45% of the sample of subjects interviewed in the UK who said yes. (b) A parameter is the true percent of the 48 million adults in the UK who would say yes. (c) A descriptive analysis is that the percentage of yes responses in the survey varied from 10% (in Bulgaria) to 60% (in Luxembourg). (d) An inferential analysis is that the percentage of adults in the UK who would say yes falls between 41% and 49%.
Chapter 2
2.1. (a) Discrete variables take a finite set of values (or possibly all nonnegative integers), and we can enumerate them all. Continuous variables take an infinite continuum of values. (b) Categorical variables have a scale that is a set of categories; for quantitative variables, the measurement scale has numerical values that represent different magnitudes of the variable. (c) Nominal variables have a scale of unordered categories, whereas ordinal variables have a scale of ordered categories. The distinctions among types of variables are important in determining the appropriate descriptive and inferential procedures for a statistical analysis.
2.2. (a) Quantitative (b) Categorical (c) Categorical (d) Quantitative (e) Categorical (f) Quantitative (g) Categorical (h) Quantitative (i) Categorical
2.3.
(a) Nominal (b) Nominal (c) Interval (d) Nominal (e) Nominal (f) Ordinal (g) Interval (h) Ordinal (i) Nominal (j) Interval (k) Nominal
2.4. (a) Nominal (b) Nominal (c) Ordinal (d) Interval (e) Interval (f) Interval (g) Ordinal (h) Interval (i) Nominal (j) Interval
2.5. (a) Interval (b) Ordinal (c) Nominal
2.6. (a) State of residence. (b) Number of siblings. (c) Social class (high, medium, low). (d) Student status (full time, part time). (e) Number of cars owned. (f) Time (in minutes) needed to complete an exam. (g) Number of siblings.

2.7. (a) Ordinal, since there is a sense of order to the categories. (b) Discrete. (c) These values are statistics since they come from a sample.
2.8. Ordinal.
2.9. (b), (c), (d)
2.10. (a), (c), (e), (f)
2.11. Students numbered 10, 22, 24.
2.12. Number names 00001 to 52000. First five that are selected are 15011, 46573, 48360, 39975, 06907.
2.13. (a) Observational study (b) Experiment (c) Observational study (d) Experiment
2.14. (a) Experimental study, since the researchers are assigning subjects to treatments. (b) An observational study could look at those who grew up in nonsmoking or smoking environments and examine incidence of lung cancer.
2.15. (a) Sample-to-sample variability causes the results to vary. (b) The sampling error for the Gallup poll is –2.4% for Gore, 0.1% for Bush, and 1.3% for Nader.
2.16. (a) This is a volunteer sample because viewers chose whether to call in. (b) Randomly sample the population.
2.17. The first question is confusing in its wording. The second question has clearer wording.
2.18. (a) Skip number is k = 52,000/5 = 10,400. Randomly select one of the first 10,400 names and then skip 10,400 names to get each of the next names. For example, if the first name picked is 01536, the other four names are 01536 + 10400 = 11936, 11936 + 10400 = 22336, 22336 + 10400 = 32736, 32736 + 10400 = 43136. (b) We could treat the pages as clusters. We would select a random sample of pages, and then sample every name on the pages selected. The advantage of cluster sampling is that it is much easier to select the sample than it is with random sampling. A disadvantage is as follows: Suppose there are 100 “Martinez” listings in the directory, all falling on the same page. Then with cluster sampling, either all or none of the Martinez families would end up in the sample. If they are all sampled, certain traits which they might have in common (perhaps, e.g., religious affiliation) might be over-represented in the sample.
2.19.
Draw a systematic sample from the student directory, using skip number k = 5000/100 = 50.
2.20. (a) This is not a simple random sample since the sample will necessarily have 40 women and 40 men. A simple random sample may or may not have exactly 40 men and 40 women. (b) This is stratified random sampling. You ensure that neither men nor women are over-sampled.
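The skip-number arithmetic in 2.18(a) and 2.19 can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and its parameters are hypothetical, and the starting position 01536 is the one used in the answer to 2.18(a).

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based list positions chosen by a systematic sample.

    The skip number k is population_size // sample_size. If no start is
    given, one of the first k positions is picked at random.
    """
    k = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]


# Exercise 2.18(a): 52,000 names, sample of 5, first name picked is 01536.
picks = systematic_sample(52_000, 5, start=1536)
print([f"{p:05d}" for p in picks])

# Exercise 2.19: 5000 students, sample of 100, so the skip number is 50.
directory_picks = systematic_sample(5000, 100, seed=0)
```

Because every selected position is determined by the random start, two adjacent names can never both be chosen, which is the point made in 2.29.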

2.21. (a) The clusters. (b) The subjects within every stratum. (c) The main difference is that a stratified random sample uses every stratum, and we want to compare the strata. By contrast, we have a sample of clusters, and not all clusters are represented; the goal is not to compare the clusters but to use them to obtain a sample.
2.22. (a) Categorical are GE, VE, AB, PI, PA, RE, LD, AA; quantitative are AG, HI, CO, DH, DR, NE, TV, SP, AH. (b) Nominal are GE, VE, AB, PA, LD, AA; ordinal are PI and RE; interval are AG, HI, CO, DH, DR, NE, TV, SP, AH.
2.24. (a) Draw a systematic sample from the student directory, using skip number k = N/100, where N = number of students on the campus. (b) High school GPA on a 4-point scale, treated as quantitative, interval, continuous; math and verbal SAT on a 200 to 800 scale, treated as quantitative, interval, continuous; whether work to support study (yes, no), treated as categorical, nominal, discrete; time spent studying in average day, on scale (none, less than 2 hours, 2-4 hours, more than 4 hours), treated as quantitative, ordinal, discrete.
2.25. This is nonprobability sampling; certain segments may be over- or under-represented, depending on where the interviewer stands, time of day, etc. Quota sampling fails to incorporate randomization into the selection method.
2.26. Responses can be highly dependent on nonsampling errors such as question wording.
2.27. (a) This is a volunteer sample, so results are unreliable; e.g., there is no way of judging how close 93% is to the actual population who believe that benefits should be reduced. (b) This is a volunteer sample; perhaps an organization opposing gun control laws has encouraged members to send letters, resulting in a distorted picture for the congresswoman. The results are completely unreliable as a guide to views of the overall population. She should take a probability sample of her constituents to get a less biased reaction to the issue.
(c) The physical science majors who take the course might tend to be different from the entire population of physical science majors (perhaps more liberal minded on sexual attitudes, for example). Thus, it would be better to take random samples of students of the two majors from the population of all social science majors and all physical science majors at the college. (d) There would probably be a tendency for students within a given class to be more similar than students in the school as a whole. For example, if the chosen first period class consists of college-bound seniors, the members of the class will probably tend to be less opposed to the test than would be a class of lower achievement students planning to terminate their studies with high school. The design could be improved by taking a simple random sample of students, or a larger random sample of classes with a random sample of students then being selected from each of those classes (a two-stage random sample).
2.28. A systematic sample with a skip number of 7 (or a multiple of 7) would be problematic since the sampled editions would all be from the same day of the week (e.g., Friday). The day of the week may be related to the percentage of newspaper space devoted to news about entertainment.
2.29. Because of skipping names, two subjects listed next to each other on the list cannot both be in the sample, so not all samples are equally likely.

2.30. If we do not take a disproportional stratified random sample, we might not have enough Native Americans in our sample to compare their views to those of other Americans.
2.31. If a subject is in one of the clusters that is not chosen, then this subject can never be in the sample. Not all samples are equally likely.
2.33. The nursing homes can be regarded as clusters. A systematic random sample is taken of the clusters, and then a simple random sample is taken of residents from within the selected clusters.
2.34. (b)
2.35. (c)
2.36. (c)
2.37. (a)
2.38. False. This is a convenience sample.
2.39. False. This is a voluntary response sample.
2.40. An annual income of $40,000 is twice the annual income of $20,000. However, 70 degrees Fahrenheit is not twice as hot as 35 degrees Fahrenheit. (Note that income has a meaningful zero and temperature does not.) IQ is not a ratio-scale variable.
Chapter 3
3.1. (a)
Place of Birth     Relative Frequency
Europe             13.7%
Asia               25.4%
Caribbean           9.6%
Central America    37.6%
South America       6.1%
Other               7.6%

(b) [Bar graph of percent by place of birth, with bars for Central America, Asia, Europe, Caribbean, Other, and South America.]
(c) “Place of birth” is categorical. (d) The mode is Central America.
3.2. (a)
Religion        Relative Frequency
Christianity    41.2%
Islam           25.5%
Hinduism        17.6%
Confucianism     7.8%
Buddhism         7.8%

(b) [Bar graph of relative frequency by religion, with bars for Christianity, Islam, Hinduism, Buddhism, and Confucianism.]
(c) The mode of these five religions is Christianity. Christianity is also the mode of all religions.
3.3. (a) There are 33 students. The minimum score is 65, and the maximum score is 98. (b) [Histogram of Midterm_Score: mean = 82.88, std. dev. = 8.947, N = 33.]

3.4. (a)
Number of Persons   Relative Frequency
1                   27.1%
2                   33.3%
3                   16.0%
4                   13.8%
5 or more            9.8%
(b) [Bar graph of relative frequency by number of persons: 1, 2, 3, 4, 5 or more.]
(c) The median household size is 2 persons, and the mode is also 2 persons.
3.5. (a)
Value   Frequency   Percent   Cumulative Percent
1       3            6.0        6.0
2       8           16.0       22.0
3       9           18.0       40.0
4       2            4.0       44.0
5       8           16.0       60.0
6       9           18.0       78.0
7       5           10.0       88.0
8       2            4.0       92.0
9       2            4.0       96.0
10      1            2.0       98.0
13      1            2.0      100.0
Total   50         100.0
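As a rough check on the frequency table in 3.5(a), the percent and cumulative percent columns can be rebuilt from the frequencies alone. The sketch below uses only the Python standard library; the variable names are illustrative, and the frequencies are the ones listed in the table.

```python
from itertools import accumulate
from statistics import mean, stdev

# Frequencies from the table in 3.5(a): value -> count.
freq = {1: 3, 2: 8, 3: 9, 4: 2, 5: 8, 6: 9, 7: 5, 8: 2, 9: 2, 10: 1, 13: 1}

n = sum(freq.values())
percent = {value: 100 * count / n for value, count in freq.items()}
cumulative = dict(zip(freq, accumulate(percent.values())))

for value in freq:
    print(f"{value:>5} {freq[value]:>9} {percent[value]:>8.1f} {cumulative[value]:>10.1f}")

# Expand the table back into raw observations to get the summary statistics.
data = [value for value, count in freq.items() for _ in range(count)]
print(round(mean(data), 1), round(stdev(data), 3))
```

The mean and sample standard deviation computed this way agree with the values annotated on the histogram for part (b) (4.8 and 2.571).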

(b) [Histogram of MU_noDC: mean = 4.8, std. dev. = 2.571, N = 50.] The distribution appears to be bimodal and skewed to the right.
(c)
Stem   Leaves
 1     000
 2     00000000
 3     000000000
 4     00
 5     00000000
 6     000000000
 7     00000
 8     00
 9     00
10     0
11
12
13     0
The stem-and-leaf plot shows the same bimodality and right skew that the histogram does.
3.6. (a) GDP is rounded to the nearest thousand.
Stem (10 thousands): 2 2 3 3 4 4 5 5 6 6 7
Leaves (thousands): 023 58899 00011122233 8 0

(b) [Histogram of RoundGDP: mean = 32.00, std. dev. = 9.615, N = 23.]
(c) The outlier in each plot is Luxembourg.
3.7. (a) The mean is (26 + 17 + 236 + 2 + 6)/5 = 287/5 = 57.4 abortions per 1000 women 15 to 41 years of age. (b) The median is 17 abortions per 1000 women 15 to 41 years of age. The mean and median are so different because California is an extreme outlier in this small data set.
3.8. (a) The mean is (0.3 + 1.8 + 2.3 + 1.2 + 1.4 + 0.7 + 9.9 + 20.1)/8 = 37.7/8 = 4.7 metric tons per person. The median is 1.6 metric tons per person. (b) The United States appears to be an outlier, since it is far greater than any other data value. (Without the United States, the mean is 2.5 and the median is 1.4.)
3.9. (a) The response “not far enough” is the mode. (b) We cannot compute a mean or median with these data since they are categorical.
3.10. (a)
Stem   Leaves
0      4679
1      133
2      0
3      9
4      4
(b) The mean is 16.6 days, and the median is 12 days.
(c)
Leaves (25 years ago)   Stem   Leaves (now)
                        0      4679
                  875   1      133
                  440   2      0
                   21   3      9
                    0   4      4
                    5   5
For the data from 25 years ago, the mean was 27.6 days, and the median was 24 days. The mean has decreased by 11 days, and the median has decreased by 12 days since 25 years ago. (d) Of the

11 observations, the median is 13 days. We cannot calculate the mean, but substituting 40 for the censored observation gives a mean of 18.7 days.
3.11. (a)
TV Hours   Frequency   Relative Frequency (%)
0            79          4.0
1           422         21.2
2           577         29.0
3           337         17.0
4           226         11.4
5           136          6.8
6            99          5.0
7            23          1.2
8            34          1.7
9             4          0.2
10           23          1.2
12           14          0.7
13            1          0.1
14            7          0.4
15            2          0.1
18            2          0.1
24            1          0.1
Total      1987        100.0
(b) The distribution is unimodal and right skewed. (c) The median is the 994th data value, which is 2. (d) The mean is larger than 2 because the data are skewed to the right by a few high values.
3.12. Central America leaves: 8540 85210 82. Stems: 4 5 6 7 8 9. Western Europe leaves: 488 003678 1268 567 0.
Female economic activity seems greater, on average, in Western Europe than in Central America. Most of the values in Western Europe exceed the highest value in Central America. There appear to be more women in the labor force (per 100 men) in Western Europe than in Central America.
3.13. Since the mean is much greater than the median, the distribution of 2000 household income in Canada is most likely skewed to the right.
3.14. (a) The median is “2 or 3 times a month.” The mode is “not at all.” The data are centered around the respondents having sex about 2 or 3 times a month in the past 12 months. The most frequent answer to the question is “not at all.” (b) The sample mean is 4.1, which means that, on average, the respondents had sex about 4 times a month in the past 12 months.
3.15. (a) The mode is “every day.” The median is “a few times a week.” (b) The mean is 3.7 times per week, which is lower than the 4.4 times a week in 1994.
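The reasoning in 3.11(c) and (d), locating the middle observation in a frequency table and comparing it with the mean, can be sketched as follows. The helper name `value_at_position` is hypothetical; the frequencies are those in the table for 3.11(a).

```python
# Frequencies from the table in 3.11(a): TV hours -> number of respondents.
tv_hours = {0: 79, 1: 422, 2: 577, 3: 337, 4: 226, 5: 136, 6: 99, 7: 23,
            8: 34, 9: 4, 10: 23, 12: 14, 13: 1, 14: 7, 15: 2, 18: 2, 24: 1}


def value_at_position(freq, position):
    """Return the value of the observation at a 1-based position in sorted order."""
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if running >= position:
            return value
    raise ValueError("position exceeds the number of observations")


n = sum(tv_hours.values())

# For odd n both positions coincide; for even n they are the two middle values.
low_pos, high_pos = (n + 1) // 2, (n + 2) // 2
median = (value_at_position(tv_hours, low_pos) + value_at_position(tv_hours, high_pos)) / 2

mean = sum(value * count for value, count in tv_hours.items()) / n

print(n, low_pos, median, round(mean, 2))
```

With 1987 observations the middle position is the 994th, which falls in the "2 hours" row, while the few large values pull the mean above the median.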

3.16. (a) For each gender, the distribution of earnings is skewed to the right, since each mean is greater than its respective median. (b) The overall mean income is ($39,890×73.8 + $56,724×83.4)/(73.8 + 83.4) = $7674663.6/157.2 = $48,821.
3.17. (a) The response variable is median family income, and the explanatory variable is race. (b) We cannot find the median income for the combined groups since we do not know how many families are in each group. (c) We would need to know how many families were in each group.
3.18. (a) The distribution is skewed to the right. (b) The Empirical Rule only applies to bell-shaped distributions, so it does not apply here. (c) The median is 0. If the 500 observations were to shift from 0 to 6, the median would remain zero, since half of the data values fall below 0 and half fall above 0. This illustrates the resistance of the median to skewness and extreme values.
3.19. (a) Median: $10.13; mean: $10.18; range: $0.46; standard deviation: $0.22. (b) Median: $10.01; mean: $9.17; range: $5.31; standard deviation: $2.26. The median is resistant to outliers, but the mean, range, and standard deviation are highly impacted by outliers.
3.20. (a) Mean: 30; standard deviation: 9.0. (b) Minimum: 13; lower quartile: 25.5; median: 31; upper quartile: 36; maximum: 42.
3.21. The mean is 28.7, and the standard deviation is 12.5. The 2006 HDI ratings for the top 10 nations vary greatly.
3.22. (a) The life expectancies in Africa vary more than the life expectancies in Western Europe, because the life expectancies for the African countries are more spread out than those for the Western European countries. (b) The standard deviation is 1.1 for the Western European nations and 7.1 for the African nations.
3.23. (a) (i) $40,000 to $60,000; (ii) $30,000 to $70,000; (iii) $20,000 to $80,000. (b) A salary of $100,000 would be unusual because it is 5 standard deviations above the mean.
3.24.
(a) Approximately 68% of the values are contained in the interval 32 to 38 days; approximately 95% of the values are contained in the interval 29 to 41 days; all or nearly all of the values are contained in the interval 26 to 44 days. (b) (i) The mean would decrease if the observation for the U.S. was included. (ii) The standard deviation would increase if the observation for the U.S. was included. (c) The U.S. observation is 5.3 standard deviations below the mean.
3.25. (a) 88.8% of the observations fall within one standard deviation of the mean. (b) The Empirical Rule is not appropriate for this variable, since the data are highly skewed to the right.
3.26. 10 is realistic; –20 is impossible since the standard deviation cannot be negative; 0 implies that every student scored 76 on the exam, which is highly improbable; 50 is too large (it is half of the possible range of scores).
3.27. (a) The most realistic value is 0.4, because the range is 5 times the length of this value. (b) The value of –10.0 is impossible since the standard deviation cannot be negative.
3.28. (d).
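Two of the calculations in this stretch of answers, the weighted overall mean in 3.16(b) and the Empirical Rule intervals in 3.24(a), can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the mean of 35 days and standard deviation of 3 days are inferred from the intervals quoted in 3.24(a) rather than stated in the answer.

```python
def weighted_mean(values, weights):
    """Mean of group means, weighting each group by its size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def empirical_rule_intervals(mean, sd):
    """Intervals within 1, 2, and 3 standard deviations of the mean."""
    return [(mean - k * sd, mean + k * sd) for k in (1, 2, 3)]


# 3.16(b): group mean incomes weighted by the group sizes given in the answer.
overall = weighted_mean([39_890, 56_724], [73.8, 83.4])
print(round(overall))

# 3.24(a): mean 35 and standard deviation 3 are inferred from the stated intervals.
intervals = empirical_rule_intervals(35, 3)
print(intervals)
```

The three intervals correspond to roughly 68%, 95%, and nearly all of a bell-shaped distribution, which is why the rule is not applied to the skewed variables in 3.18 and 3.25.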

3.29. (a) Since the range is 43.5 standard deviations above the mean, the distribution is most likely skewed to the right. (b) The distribution probably has outliers (take the maximum usage, for example).
3.30. The distribution is most likely skewed to the right since the minimum water consumption (0 thousands of gallons) is less than one standard deviation below the mean.
3.31. (a) The range is $28,700, which is the difference between the mean salary for secondary school teachers in Illinois (highest mean) and in South Dakota (lowest mean). (b) The interquartile range is $9600 and represents the spread of the mean salaries for the middle 50% of the states.
3.32. (a) [Box plot of Salary.] (b) The box plot suggests that the data are skewed to the right. (c) 7000 is the most plausible standard deviation, since the range of the data is about 4 standard deviations. The values 100 and 1000 are too small for the spread that we see, and 25,000 is just slightly over the value for the range.
3.33. The mean, standard deviation, maximum, and range all decrease, because the observation for D.C. was a high outlier. Note that these statistics are not resistant to outliers. On the other hand, the median, Q3, Q1, the interquartile range, and the mode remain the same, as these are all resistant to outliers. The minimum remains the same since D.C. was a high outlier and not a low outlier.
3.34. (a) The Empirical Rule does not apply to this distribution because the standard deviation is much larger than the mean, suggesting a right-skewed distribution. (b) The five-number summary confirms that the distribution is skewed to the right, since the distance between Q3 and the median is larger than the distance between the median and Q1 and the maximum is so large. (c) IQR = Q3 – Q1 = 1105 – 256 = 849. Low outliers would be observations less than Q1 – 1.5(IQR) = 256 – 1.5(849) = –1017.5. There are no values that are low outliers.
High outliers would be observations greater than Q3 + 1.5(IQR) = 1105 + 1.5(849) = 2378.5. At least the maximum is a high outlier.
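The 1.5(IQR) criterion used in 3.34(c) can be written as a small helper. This is a minimal sketch; the function name `outlier_fences` is hypothetical, and the quartiles are the ones quoted in the answer.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return (lower fence, upper fence) for the 1.5(IQR) outlier criterion."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


# 3.34(c): Q1 = 256 and Q3 = 1105, so IQR = 849.
lower, upper = outlier_fences(256, 1105)
print(lower, upper)
```

Any observation below the lower fence or above the upper fence is flagged; because the lower fence here is negative, no low outliers are possible for a variable that cannot take negative values.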

3.35. (a) The sketch should show a right-skewed distribution. (b) The sketch should show a right-skewed distribution. (c) The sketch should show a left-skewed distribution. (d) The sketch should show a right-skewed distribution. (e) The sketch should show a left-skewed distribution.
3.36. (a) Skewed to the left (b) Bell shaped (c) Skewed to the right (d) Skewed to the left (e) Skewed to the left (f) Skewed to the right (g) U shaped
3.38. A box plot using only the five-number summary follows:
[Box plot of EU unemployment.]
This box plot shows us that the maximum is an outlier. Since the mean and the median are the same, the distribution may be slightly more symmetric than the box plot implies. It is important to note that only the five-number summary was used to produce this box plot.
3.39. (a) Minimum = 0, Q1 = 2.0, median = 3.0, Q3 = 5.0, maximum = 14. (b) Same as part (a). (c) The observations with values 12 and 14 are outliers. (d) The standard deviation is 3. The value 0.3 is too small for the distribution, the value 13 is almost equal to the range, and the value 23 exceeds the range.
3.40. Side-by-side box plots are not very informative as shown below:

The infant mortality rates in Africa are much higher than the infant mortality rates in Western Europe. In addition, since Q1, the median, and Q3 are all equal for Western Europe, we do not actually see a “box” in the box plot. Infant mortality rates in Africa are skewed to the right, while infant mortality rates in Western Europe are symmetric.
3.41. (a) [Box plot of the percentage with no health insurance.] (b) The distribution appears to be skewed to the right.
3.42. (a) The range is 92.3 – 78.3 = 14. The interquartile range is 88.8 – 83.6 = 5.2. (b) Low outliers would be observations less than Q1 – 1.5(IQR) = 83.6 – 1.5(5.2) = 75.8. There are no values that are low outliers. High outliers would be observations greater than Q3 + 1.5(IQR) = 88.8 + 1.5(5.2) = 96.6. There are no values that are high outliers.
3.43. (a) Minimum = 1, Q1 = 3, median = 5, Q3 = 6, maximum = 13. (b) [Box plot of MU_noDC.] Louisiana appears to be a mild outlier. (c) Minimum = 1, Q1 = 3, median = 5, Q3 = 6, maximum = 44.

[Box plot of MU.]
Louisiana is still a mild outlier, and D.C. is an extreme outlier. Only the maximum changes in the five-number summary when the observation for D.C. is added to the data set.
3.44. The IQR is most likely 350. The value of 1500 is the range. The value of –10 is impossible for an IQR, and the 0 would only occur if all data values were the same. Given the minimum, median, and maximum, the value 10 is too small to be the IQR.
3.45. (a) Luxembourg’s observation is 3.9 standard deviations above the mean. (b) Sweden’s observation is 0.8 standard deviations below the mean. (c) (i) Canada’s observation is 2.5 standard deviations above the mean of the EU countries. (ii) The U.S. observation is 3.6 standard deviations above the mean of the EU countries (but not as high as Luxembourg).
3.46. (a) Italy’s observation is 0.4 standard deviations below the mean. (b) The U.S. observation is 3.4 standard deviations above the mean of the EU countries. (c) We expect almost all of the values in a bell-shaped distribution to be within 3 standard deviations of the mean. Thus, the U.S. observation would be considered a high outlier.
3.47. (a) Response variable: opinion about national health insurance (favor, oppose); explanatory variable: political party (Democrat, Republican). (b) The data could be summarized in a contingency table with political party as the rows and opinion about national health insurance as the columns.
3.48. (a) Response variable: happiness; explanatory variable: religious attendance. (b) For those who attend religious services nearly every week or more, 44.5% reported being very happy. For those who attend religious services never or less than once a year, 23.2% reported being very happy. (c) There appears to be an association between happiness and religious attendance since the percentages that reported being very happy differed greatly by attendance at religious services.
3.49.
(a) United States: predicted fertility = 3.2 – 0.04(50) = 1.2; Yemen: predicted fertility = 3.2 – 0.04(0) = 3.2. (b) The negative value implies that the fertility rate decreases as Internet use increases.
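The prediction equation in 3.49(a) is a straight line with intercept 3.2 and slope –0.04. A minimal sketch, with the hypothetical function name `predicted_fertility` and the two Internet-use values taken from the answer:

```python
def predicted_fertility(internet_use, intercept=3.2, slope=-0.04):
    """Predicted fertility from the line: fertility = 3.2 - 0.04 * internet_use."""
    return intercept + slope * internet_use


# Internet-use values from 3.49(a): 50 for the United States, 0 for Yemen.
us = predicted_fertility(50)
yemen = predicted_fertility(0)
print(round(us, 1), round(yemen, 1))
```

Each additional unit of Internet use lowers the predicted fertility by 0.04, which is the sense in which the slope is negative in part (b).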

3.50. (a) Points in a scatterplot for these data should have a negative association and be fairly tightly clustered in a linear pattern. (b) Contraceptive use is more strongly associated with fertility than is Internet use because –0.89 is a stronger linear association than is –0.55.
3.51. (a) Based on the scatterplot, the correlation should be positive, since higher values of GDP tend to go with higher values of CO2 (and vice versa). (b) Luxembourg has a GDP of 69,961 and CO2 of 22.0, both of which are extreme values.
[Scatterplot of CO2 (vertical axis) against GDP (horizontal axis).]
3.52. The number of physicians is more strongly correlated with carbon dioxide emissions than is female economic activity, since the absolute value of the correlation for number of physicians and CO2 emissions is closer to 1 than is the correlation for female economic activity and CO2 emissions.
3.53. (a) ȳ is a sample statistic (sample mean) used to estimate the population mean µ. (b) s is a sample statistic (sample standard deviation) used to estimate the population standard deviation σ.
3.54. (a) The mean is 1232.2 miles, with standard deviation 1681.7 miles. The median is 640 miles. The minimum distance from home is 0 miles, while the maximum distance is 8000 miles. The histogram shows that the distribution of the distance from home is skewed to the right.

[Histogram of DHome: mean = 1232.2, std. dev. = 1681.748, N = 60.]
(b) The mean is 7.3 hours, with standard deviation 6.7 hours. The median is 6 hours. The minimum is 0 hours, while the maximum is 37 hours. The histogram shows that the distribution of the hours of watching television is skewed to the right.
[Histogram of TVhours: mean = 7.27, std. dev. = 6.717, N = 60.]
3.56. Report should include graphical displays and summary statistics. The summary statistics are: mean = 2.7, standard deviation = 2.1, minimum = 0.1, Q1 = 1.5, median = 2.1, Q3 = 3.6, maximum = 9.4. The U.S. is an outlier with 9.4 gun deaths per 100,000 people.
3.57. Report should state that the explanatory variable is the percentage with income below the poverty level and the response variable is the violent crime rate. The correlation coefficient is 0.496. D.C. appears to be an outlier.

[Scatterplot of Violent Crime Rate (vertical axis) against Poverty Level (horizontal axis).]
3.59. The distributions of cost for New York and Boston are similar, and both cities have high and low outliers. The distributions for all three cities are roughly symmetric. The distribution for cost in London is higher than the distributions in both New York and Boston, with 75% of the costs in London being higher than all costs in Boston and almost all costs in New York.
3.60. All associations are positive. Quality of food rating and service rating have the highest correlation of 0.81, which is fairly strong. Quality of food rating is moderately positively associated with décor rating (0.61) and with cost rating (0.53).
3.62. The mean salary is $7,095,078. Salaries are typically right-skewed. There will be a few very high paid players and more “modestly” paid players.
3.63. The median overall net worth is $86,100. The distribution of overall net worth will be skewed to the right, so the median will be less than the mean.
3.64. The median is not impacted by gains made by the wealthiest Americans because the wealthiest Americans are at the high end of net worth, and the median is calculated from values at the center of the data.
3.65. When comparing countries to each other, the mean number of children makes sense. For example, the fertility rate in Ireland is similar to the fertility rate in the U.S. The fertility rate of Mexico is almost twice that of Italy or Spain. We are looking at rates across the countries and not the number of children each individual woman in the country has.
3.66. (a) The median male height is between 69 and 70 inches. (b) A rough approximation for the standard deviation can be found by dividing the range by 6, since almost all of the data in a bell-shaped distribution falls within 3 standard deviations of the mean. Thus, the standard deviation is approximately 20/6 = 3.3 inches.
3.67.
One example is the type of pet that people prefer. The mode might be ―dog,‖ and it makes no sense to talk about the mean or the median. 19 Copyright © 2012 Pearson Education, Inc. Publishing as Prentice Hall.

3.68. (a) Heights (in inches) of adult women. (b) Incomes in a large city. (c) Scores on an easy exam. (d) Heights (in inches) of adults in the U.S. (e) Number of cigarettes smoked in a week.
3.69. (a) The median is preferred over the mean when the data are skewed and/or there are outliers that will affect the mean. One example is incomes in a large city. (b) The mean is preferred over the median when the distribution is very highly discrete, such as the number of times you have been married.
3.70. (a) The standard deviation s is generally preferred over the range because it is calculated from all of the data and will not be impacted as much as the range when there are outliers. (b) The IQR is preferred to the standard deviation s when the distribution is very highly skewed, because the IQR is more robust to skewness than s is.
3.71. (a) False (b) False (c) True (d) True
3.72. (c)
3.73. (c)
3.74. (a)
3.75. The standard deviation is incorrectly recorded. It exceeds the range of the data.
3.76. Florida has the larger overall mean income ($35,100) compared to Alabama ($29,600).
3.77. Population sizes vary by state; the overall rate gives more weight to states with larger population sizes, whereas the mean of the 50 measurements gives the same weight to each state.
3.78. (a) The mean is now 77, while the standard deviation stays at 20. (b) The mean is £200,000, and the standard deviation is £60,000.
3.79. \( \sum (y_i - \bar{y}) = \sum y_i - \sum \bar{y} = \sum y_i - n\left(\frac{\sum y_i}{n}\right) = \sum y_i - \sum y_i = 0 \).
3.80. (a) 1/4, 1/9, 1/100. (b) 25% maximum versus 5% in bell-shaped case. Most distributions have much less than 25% of the distribution falling more than two standard deviations from the mean.
3.81. \( f'(c) = -2\sum (y_i - c) = 0 \) implies that \( \sum y_i = nc \), or \( c = \frac{\sum y_i}{n} = \bar{y} \).

Chapter 4
4.1. 907/1127 = 0.805
4.2. (a) 1 – 0.85 = 0.15 (b) (0.85)(0.84) = 0.714
4.3. (a) 96 is the total number of members of an environmental group, and 1117 is the total number of subjects who answered both questions. (b) (i) 30/96 = 0.312 (ii) 88/1021 = 0.086. (c) (i) 30/1117 = 0.027 (ii) (0.086)(0.312) = 0.027. (d) (30 + 933)/1117 = 963/1117 = 0.862.

4.4. (a) The number of languages in which a person is fluent (y) is discrete, since it must take on integer values. (b)
y            0     1     2
Probability  0.02  0.81  0.17
(c) 0.02 + 0.81 = 0.83. (d) \( \mu = \sum y P(y) = 0(0.02) + 1(0.81) + 2(0.17) = 1.15 \).
4.5. (a) The probabilities of each y value need to be taken into account (the mean is a weighted average). (b) \( \mu = \sum y P(y) = 0(0.91) + 1(0.06) + 2(0.02) + 3(0.01) = 0.13 \).
4.6.
y            0          1,000,000
Probability  0.9999999  0.0000001
\( \mu = \sum y P(y) = 0(0.9999999) + 1{,}000{,}000(0.0000001) = 0.10 \), or $0.10.
4.7. (a)
y            0     1     2     3     4     5     6     7     8     9
Probability  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10
(b) \( \mu = \sum y P(y) = 0(0.10) + 1(0.10) + \cdots + 9(0.10) = 4.5 \). (c) The standard deviation \( \sigma \) is 2.9, since 0.4 is too small to be the standard deviation, 7.0 is too large to be the standard deviation (almost equal to the range), and 12.0 is an impossible value for the standard deviation (exceeds the range).
4.8. (a) \( P(Z \ge 1) = 1 - P(Z < 1) = 1 - 0.8413 = 0.1587 \). (b) \( P(Z \le -1) = 0.1587 \). (c) \( P(Z \ge 0.67) = 1 - P(Z < 0.67) = 1 - 0.7486 = 0.2514 \).
4.9. (a) \( P(\mu - \sigma < X < \mu + \sigma) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826 \). (b) \( P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = P(-1.96 < Z < 1.96) = 0.9750 - 0.0250 = 0.95 \). (c) \( P(\mu - 3\sigma < X < \mu + 3\sigma) = P(-3 < Z < 3) = 0.9987 - 0.0013 = 0.9974 \). (d) \( P(\mu - 0.67\sigma < X < \mu + 0.67\sigma) = P(-0.67 < Z < 0.67) = 0.7486 - 0.2514 = 0.4972 \).
4.10. (a) 2.33 (b) 1.96 (c) 1.64 (d) 1.28 (e) 0.67 (f) 0
4.11. (a) 0.67 (b) 1.64 (c) 1.96 (d) 2.33 (e) 2.58
4.12. (a) 1.28 (b) 1.64 (c) 2.06 (d) 2.33
4.13. If the interval \( \mu - z\sigma \) to \( \mu + z\sigma \) contains 90% of a normal distribution, there is 5% below \( \mu - z\sigma \) and 5% above \( \mu + z\sigma \). Thus, \( \mu + z\sigma \) equals the 95th percentile.
4.14. (a) (i) 75th percentile (ii) 25th percentile. (b) z = 0.67. (c) Plug z = 0.67 into the expressions in part (a) to get that \( \mu + 0.67\sigma \) is the upper quartile of a normal distribution and \( \mu - 0.67\sigma \) is the lower quartile of a normal distribution.
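The table lookups in Exercises 4.8 to 4.12 can also be reproduced numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names (`right_tail`, `central_area`, `z_for_right_tail`) are illustrative and not part of the text. Results can differ from the table-based answers in the last digit because tables round z to two decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def right_tail(z):
    """Return P(Z >= z) for a standard normal variable."""
    return 1.0 - std_normal.cdf(z)


def central_area(z):
    """Return P(-z < Z < z) for a standard normal variable."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def z_for_right_tail(p):
    """Return the z-score whose right-tail probability is p."""
    return std_normal.inv_cdf(1.0 - p)


print(round(right_tail(1.0), 4))          # compare with Exercise 4.8(a)
print(round(central_area(1.96), 4))       # compare with Exercise 4.9(b)
print(round(z_for_right_tail(0.05), 3))   # compare with Exercise 4.10(c)
```

The same three helpers cover the percentile questions in 4.13 and 4.14, since the upper quartile is the z-score with right-tail probability 0.25.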

4.15. (a) 0.0179 (b) 0.0179 (c) 0.9821 (d) 0.9642
4.16. Right-tail probability of 0.01 has z = 2.33.
4.17. (a) The 98th percentile is 2.05 standard deviations above the mean. (b) The IQ score for the 98th percentile is 100 + 2.05(16) = 132.8, or about 133.
4.18. 40 hours per week has z = (40 – 45)/15 = –0.33, and is 0.33 standard deviations below the mean. The proportion of self-employed individuals who averaged more than 40 hours per week is 0.6293.
4.19. (a) An MDI of 120 has z = (120 – 100)/16 = 1.25, and is 1.25 standard deviations above the mean. The proportion of children with an MDI of 120 or more is 0.1056. (b) The MDI score that is the 90th percentile is 1.28 standard deviations above the mean, so this score is 100 + 1.28(16) = 120.48, or 120. (c) The lower quartile is 0.67 standard deviations below the mean, which gives a lower quartile of 100 – 0.67(16) = 89.28, or 89. Similarly, the upper quartile is 0.67 standard deviations above the mean, which gives an upper quartile of 100 + 0.67(16) = 110.72, or 111. Since the MDI scores are approximately normal, the median will be equal to the mean of 100.
4.20. (a) 258 days has z = (258 – 281.9)/11.4 = –2.10, and is 2.10 standard deviations below the mean. The proportion of babies that would be born prematurely is 0.0179. (b) Since 0.036 (the actual proportion of premature babies) is twice the proportion we would expect if gestation times were normally distributed, the actual distribution is probably skewed to the left, so the left-tail probability more than 2.1 standard deviations below the mean is larger than the right-tail probability more than 2.1 standard deviations above the mean.
4.21. (a) 20 gallons per week has z = (20 – 16)/5 = 0.8, and is 0.8 standard deviations above the mean. The proportion of adults who use more than 20 gallons per week is 0.2119. (b) The 95th percentile is 1.645 standard deviations above the mean, so for 20 gallons to be the 95th percentile we need to solve 1.645 = (20 – μ)/5 for the mean μ. This gives μ = 20 – 1.645(5) = 11.775. So, the mean would need to be about 11.8 gallons so that only 5% of adults use more than 20 gallons per week. (With the current mean of 16 gallons, the 95th percentile of use is 16 + 1.645(5) = 24.225 gallons.) (c) If the distribution of gasoline use is not actually normal, we should expect it to be right-skewed, since a few adults with very high gasoline use would stretch out the right tail.
4.22. 80 has z = (80 – 83)/5 = –0.6, and is 0.6 standard deviations below the mean. 90 has z = (90 – 83)/5 = 1.4, and is 1.4 standard deviations above the mean. The proportion of students who earn a B is approximately 0.9192 – 0.2743 = 0.6449.
4.23. An SAT score of 600 is (600 – 500)/100 = 1.0 standard deviations above the mean. An ACT score of 29 is (29 – 21)/4.7 = 1.70 standard deviations above the mean. Relatively speaking, an ACT score of 29 is higher than an SAT score of 600.
4.24. (a) z = (5000 – 2500)/1500 = 1.67. (b) About 0.0475 (4.75%) of the property taxes exceed $5000. (c) If the true distribution is not normal, it is probably skewed to the right. There will be a few very expensive homes that have high property taxes that will cause the distribution to be right-skewed.
4.25. (a) 1000 kWh is z = (1000 – 673)/556 = 0.59 standard deviations above the mean. If the distribution were normal, about 27.76% of the households had use above 1000 kWh. (b) The distribution is probably right-skewed, due to a few very large homes that have high electricity usage.
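The normal-distribution arithmetic of Exercise 4.21 can be checked with a short Python sketch. It assumes the mean of 16 gallons and standard deviation of 5 gallons stated in the answer; the variable names are illustrative, and reading part (b) as "which mean leaves only 5% of adults above 20 gallons" is an assumption about the exercise wording.

```python
from statistics import NormalDist

sigma = 5.0          # standard deviation of weekly gasoline use (gallons)
current_mean = 16.0  # current mean weekly use (gallons)
cutoff = 20.0        # usage level of interest (gallons)

usage = NormalDist(current_mean, sigma)

# (a) Proportion of adults using more than 20 gallons under the current mean.
prop_above = 1.0 - usage.cdf(cutoff)

# 95th percentile of weekly use under the current mean.
pct95 = usage.inv_cdf(0.95)

# (b) Mean for which only 5% exceed the cutoff: cutoff = mean + z * sigma.
z95 = NormalDist().inv_cdf(0.95)
required_mean = cutoff - z95 * sigma

print(round(prop_above, 4), round(pct95, 1), round(required_mean, 1))
```

The two quantities `pct95` and `required_mean` answer different questions: the first keeps the mean fixed and moves the cutoff, the second keeps the cutoff at 20 gallons and moves the mean.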

4.26. (a) The probability distribution is
y            0     1     2
Probability  0.30  0.60  0.10
(b) The sampling distribution of the sample proportion p of the students selected who are female is
p            0     0.5   1
Probability  0.30  0.60  0.10
4.27. (a) The sampling distribution of the sample proportion of heads for flipping a balanced coin once is
p            0     1
Probability  0.50  0.50
(b) The sampling distribution of the sample proportion of heads for flipping a balanced coin twice is
p            0     0.5   1
Probability  0.25  0.50  0.25
(c) The sampling distribution of the sample proportion of heads for flipping a balanced coin three times is
p            0      1/3    2/3    1
Probability  0.125  0.375  0.375  0.125
(d) The sampling distribution of the sample proportion of heads for flipping a balanced coin four times is
p            0       0.25  0.50   0.75  1
Probability  0.0625  0.25  0.375  0.25  0.0625
(e) As the number of flips increases, the sampling distribution of the sample proportion of heads seems to be getting more normal, with the probabilities concentrating more around 0.50.
4.28. (a) The 36 possible pairs are (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6). (b) The sampling distribution for the sample mean \( \bar{y} \) is
\( \bar{y} \)   1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
Probability     1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
(c) (i) The histogram of the probability distribution for each roll is uniform. (ii) The shape is triangular, but starting to approach a bell shape, compared to the uniform distribution for Y. (d) The mean of the probability distribution for each roll is
\( \mu = \sum Y P(Y) = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = 3.5 \).
The mean of the probability distribution for \( \bar{y} \) is
\( \mu = \sum \bar{y} P(\bar{y}) = 1\left(\frac{1}{36}\right) + 1.5\left(\frac{2}{36}\right) + 2\left(\frac{3}{36}\right) + \cdots + 5.5\left(\frac{2}{36}\right) + 6\left(\frac{1}{36}\right) = 3.5 \).
The means of the two distributions are the same because they are both symmetric about the mean of 3.5. (e) There are more \( (y_1, y_2) \) pairs that have a sample mean closer to the true mean for the average of two rolls than for the value of one roll. The spread of the probability distribution for \( \bar{y} \) is less than that for the probability distribution of a single roll.
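The sampling distribution in Exercise 4.28 can be built by direct enumeration of the 36 equally likely pairs. The sketch below is one way to do this in Python with exact fractions; the names `sampling_dist` and `mean_and_variance` are illustrative. Note that `Fraction` prints probabilities in lowest terms (for example 3/36 appears as 1/12).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 equally likely (y1, y2) pairs give each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
sampling_dist = {mean: Fraction(c, 36) for mean, c in sorted(counts.items())}


def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mu = sum(y * p for y, p in dist.items())
    var = sum((y - mu) ** 2 * p for y, p in dist.items())
    return mu, var


single_roll = {Fraction(y): Fraction(1, 6) for y in faces}
mu_roll, var_roll = mean_and_variance(single_roll)
mu_mean, var_mean = mean_and_variance(sampling_dist)

for mean, prob in sampling_dist.items():
    print(float(mean), prob)
print("single roll:", mu_roll, var_roll)
print("mean of two rolls:", mu_mean, var_mean)
```

Both distributions have mean 7/2, and the variance of the sample mean is half the variance of a single roll, which is the reduced spread described in part (e).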

4.29. (a) \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{2293}} = 0.0104 \). (b) If actually 50% of the population voted for DeWine, it would be surprising to obtain 44% in this exit poll, since 44% is 6% lower than 50%, and the standard error for the sampling distribution is 1.04%; that is, the sample proportion of 0.44 is nearly 6 standard errors below 0.50. (c) Based on the information from the exit poll, I would be willing to predict that Sherrod Brown would win the Senatorial election.
4.30. (a) The mean is 13.6, and the standard error is \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{3.0}{\sqrt{9}} = 1.0 \). (b) The mean is 13.6, and the standard error is \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{3.0}{\sqrt{36}} = 0.5 \). (c) The mean is 13.6, and the standard error is \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{3.0}{\sqrt{100}} = 0.3 \). As n increases, the standard error decreases in proportion to \( 1/\sqrt{n} \).
4.31. (a) The mean is 0.10, and the standard error is \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{316.23}{\sqrt{1{,}000{,}000}} = 0.32 \). (b) It is very unlikely to come out ahead. The z-score for $1 is (1 – 0.10)/0.32 = 2.81. The probability of winning more than $1 is therefore about 0.0025, which is 0.25%.
4.32. (a) y does not have a normal distribution since the standard deviation is the same as the mean. This implies that y has a distribution that is skewed to the right. (b) The sampling distribution of \( \bar{y} \) is approximately normal with mean 1.1 and standard error \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{2400}} = 0.0225 \). (c) The sample mean would almost surely fall within 3 standard errors of the population mean, which is the interval 1.03 to 1.17.
4.33. (a) The probability that PDI is below 90 is
\( P(Y < 90) = P\left(Z < \frac{90 - 100}{15}\right) = P(Z < -0.67) = 0.2514 \).
(b) The probability that the sample mean PDI is below 90 is
\( P(\bar{Y} < 90) = P\left(Z < \frac{90 - 100}{15/\sqrt{25}}\right) = P(Z < -3.33) = 0.0004 \).
(c) An individual PDI of 90 is not surprising, since the probability is 0.2514. However, a sample mean PDI of 90 would be surprising, since a value this low would almost never occur. (d) The sketch of the sampling distribution should be less spread out and have a taller peak and thinner tails than the sketch of the population distribution.
4.34. (a) \( P(\mu - 10 < \bar{Y} < \mu + 10) = P\left(\frac{-10}{200/\sqrt{100}} < Z < \frac{10}{200/\sqrt{100}}\right) = P(-0.5 < Z < 0.5) \). The probability is then 0.6915 – 0.3085 = 0.3830. (b) If the actual population standard deviation is larger than 200, the probability would be smaller than the probability found in part (a).
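The contrast in Exercise 4.33 between one observation and a sample mean comes down to dividing the standard deviation by the square root of n before standardizing. A minimal Python sketch, assuming the PDI mean of 100 and standard deviation of 15 used in the answer (the function name `prob_below` is illustrative):

```python
from math import sqrt
from statistics import NormalDist


def prob_below(value, mu, sigma, n=1):
    """P(mean of n observations < value) under a normal sampling distribution.

    With n = 1 this is the probability for a single observation.
    """
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(value)


p_individual = prob_below(90, mu=100, sigma=15)          # one child
p_sample_mean = prob_below(90, mu=100, sigma=15, n=25)   # mean of 25 children

print(round(p_individual, 4), round(p_sample_mean, 5))
```

The computed values use unrounded z-scores, so they can differ slightly from answers read from a table with z rounded to two decimals.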

4.35. (a) The variable y is the number of people in a household in the U.S. (b) The center of the population distribution is 2.6 people, with standard deviation 1.5 people. (c) The center of the sample data distribution is 2.4 people, with standard deviation 1.4 people. (d) The center of the sampling distribution of the sample mean for 225 homes is 2.6 people, with standard error \( 1.5/\sqrt{225} = 0.1 \). This distribution describes the theoretical distribution for the sample mean.
4.36. (a) The population distribution is skewed to the right with mean 5.2 and standard deviation 3.0. (b) The sample data distribution is based on the sample of 36 families and is skewed to the right with mean 4.6 and standard deviation 3.2. (c) The sampling distribution of \( \bar{y} \) is approximately normal with mean 5.2 and standard error \( 3.0/\sqrt{36} = 0.5 \). This distribution describes the theoretical distribution for the sample mean.
4.37. (a) \( P(\mu - 0.5 < \bar{Y} < \mu + 0.5) = P\left(\frac{-0.5}{3.0/\sqrt{36}} < Z < \frac{0.5}{3.0/\sqrt{36}}\right) = P(-1 < Z < 1) \). This probability is 0.8413 – 0.1587 = 0.6826. (b) \( P(\mu - 0.5 < \bar{Y} < \mu + 0.5) = P\left(\frac{-0.5}{3.0/\sqrt{100}} < Z < \frac{0.5}{3.0/\sqrt{100}}\right) = P(-1.67 < Z < 1.67) \). This probability is 0.9525 – 0.0475 = 0.9050. The probability is larger than in part (a), because the standard error is smaller (since the sample size is larger). (c) If the sample were truly random, then the probability that \( \bar{y} \) would be 4.0 or less is
\( P(\bar{Y} \le 4.0) = P\left(Z \le \frac{4.0 - 5.2}{3.0/\sqrt{100}}\right) = P(Z \le -4) = 0.0000317 \).
This would be a surprising result.
4.38. (a) Let y = 1 if the student is female and y = 0 if the student is male. (b) The population distribution of gender at this university has \( P(Y = 1) = 0.60 \) and \( P(Y = 0) = 0.40 \). (c) The sample data distribution of gender has \( P(Y = 1) = 0.52 \) and \( P(Y = 0) = 0.48 \). (d) In a random sample of size 50, we expect the sampling distribution of the sample proportion of females in the sample to be approximately normal with mean 0.60 and standard error 0.07.
4.39. (a) The population distribution is skewed to the left with mean 60 years and standard deviation 16 years. (b) The sample data distribution is skewed to the left with mean 58.3 years and standard deviation 15.0 years. (c) The sampling distribution of \( \bar{y} \) is approximately normal with mean 60 years and standard error 1.6 years. This distribution specifies probabilities for the possible values of \( \bar{y} \) for all the possible samples. (d) The probability of observing a person of age 40 or younger in Sunshine City is
\( P(Y \le 40) = P\left(Z \le \frac{40 - 60}{16}\right) = P(Z \le -1.25) = 0.1056 \).
The probability of observing a sample mean of 40 or less for a random sample of size 100 is

\( P(\bar{y} \le 40) = P\left(Z \le \frac{40 - 60}{16/\sqrt{100}}\right) = P(Z \le -12.5) \approx 0 \).
Thus, it is not unusual to observe an individual of age 40 in Sunshine City, but it is very unusual to observe a random sample of size 100 in Sunshine City with an average age of 40.
4.40. (a) The sampling distribution of \( \bar{y} \) for a random sample of size n = 1 is exactly the same as the population distribution. (b) If you sample all 50,000 residents in Sunshine City, there will not be a sampling distribution, and you will know that the population mean is 60 years and the population standard deviation is 16 years.
4.41. (b) Even though the population distribution is not normal (there are only two possible values), the sample proportions for the 1000 samples of size 100 each should have a histogram with an approximately bell shape.
4.42. (a) The population distribution is skewed, but the empirical distribution of sample means probably has a bell shape, reflecting the Central Limit Theorem. (b) The Central Limit Theorem applies to relatively large random samples, but here n = 2 for each sample.
4.44. (a) A stem-and-leaf plot of the population is
2|3
2|578899
3|0123
3|6677899
4|011233
4|556789
5|0014
5|677
6|02234
6|6778
7|01
7|6
8|1
(c) The mean of the \( \bar{y} \)-values in a long run of repeated samples of size 9 should be approximately 47.18. (c) The standard deviation of the \( \bar{y} \)-values in a long run of repeated samples of size 9 should be approximately 4.9.
4.45. (a) The probability distribution is
y            0     1
Probability  0.50  0.50
The mean is 0.5. (d) (i) The mean should be 0.5. (ii) The standard deviation should be 0.16.
4.46. (a) The sample data distribution tends to resemble the population distribution more closely than the sampling distribution. A random sample of data from a population should be

representative of the population, and its distribution should be similar to the population distribution. (b) The sample data distribution is the distribution of data that we actually observe. The sampling distribution of \( \bar{y} \) is the probability distribution for the possible values of the sample statistic \( \bar{y} \).
4.47. (a) A lower bound for the mean is
\( \mu = \sum y P(y) = 1(0.01) + 2(0.10) + 3(0.09) + 4(0.31) + 5(0.19) + 6(0.29) = 4.41 \).
(b) Since we know the category of ideal number of children that falls at the 50% point, we can find the median. The median is 4 children.
4.48. (a) \( P(y \le \mu + 0.67\sigma) = P(z \le 0.67) = 0.7486 \), or approximately 0.75. Thus, the upper quartile equals \( \mu + 0.67\sigma \). (b) The IQR for a normal distribution is \( (\mu + 0.67\sigma) - (\mu - 0.67\sigma) = 1.34\sigma \). This gives us 1.5(IQR) = \( 1.5(1.34\sigma) = 2.01\sigma \). The 1.5(IQR) criterion would tell us that an outlier lies above \( \mu + 0.67\sigma + 2.01\sigma = \mu + 2.68\sigma \), which is about 2.7 standard deviations above the mean. The probability that data from a normal distribution would fall above this point is \( P(y > \mu + 2.7\sigma) = P(z > 2.7) = 0.0035 \). Note that outliers will also fall below \( \mu - 2.7\sigma \), which has area 0.0035 (by symmetry). Therefore, only 0.7% of data are outliers using the 1.5(IQR) criterion.
4.49. The standard error for the poll, assuming that the true proportion is 0.5, is \( \sigma_{\bar{y}} = \sqrt{\frac{(0.5)(0.5)}{n}} = \sqrt{\frac{(0.5)(0.5)}{1336}} = 0.014 \), or 1.4%. Since the 67% statistic from the exit poll is more than 12 standard errors above the expected mean, I would be willing to predict that Hillary Clinton would be the winner of the 2006 Senatorial election in New York based on this exit poll.
4.50. When n = 100, \( \sigma_{\bar{y}} = \sqrt{\frac{(0.5)(0.5)}{n}} = \frac{0.5}{\sqrt{100}} = 0.05 \). The interval 0.35 to 0.65 is the interval within which the sample proportion is almost certain to fall. When n = 1000, \( \sigma_{\bar{y}} = \frac{0.5}{\sqrt{1000}} = 0.016 \). The interval 0.453 to 0.547 is the interval within which the sample proportion is almost certain to fall. When n = 10,000, \( \sigma_{\bar{y}} = \frac{0.5}{\sqrt{10{,}000}} = 0.005 \). The interval 0.485 to 0.515 is the interval within which the sample proportion is almost certain to fall.
4.51. (a)
4.52. (c)
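The three intervals in Exercise 4.50 all follow one recipe: compute the standard error of a proportion and go 3 standard errors either side of 0.5. A small Python sketch of that recipe is below; the function names are illustrative, and taking "almost certain" to mean within 3 standard errors is the assumption that reproduces the intervals quoted in the answer.

```python
from math import sqrt


def proportion_standard_error(pi, n):
    """Standard error of a sample proportion for population proportion pi."""
    return sqrt(pi * (1 - pi) / n)


def almost_certain_interval(pi, n, z=3.0):
    """Interval pi +/- z standard errors (z = 3 by assumption)."""
    se = proportion_standard_error(pi, n)
    return pi - z * se, pi + z * se


for n in (100, 1000, 10_000):
    low, high = almost_certain_interval(0.5, n)
    se = proportion_standard_error(0.5, n)
    print(n, round(se, 3), round(low, 3), round(high, 3))
```

Each tenfold increase in n shrinks the interval width by a factor of about 3.16, the square root of 10.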

4.53. False. As the sample size increases, the standard error of the sampling distribution of \( \bar{y} \) decreases, since \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} \) decreases as n increases.
4.54. (a) Group A: \( P(y < 400) = P\left(z < \frac{400 - 500}{100}\right) = P(z < -1) = 0.1587 \). Almost 16% of students from Group A are not admitted to Lake Wobegon Junior College. Group B: \( P(y < 400) = P\left(z < \frac{400 - 450}{100}\right) = P(z < -0.5) = 0.3085 \). Almost 31% of students from Group B are not admitted to Lake Wobegon Junior College. (b) Of the students who are not admitted, 0.3085/(0.3085 + 0.1587) = 0.3085/0.4672 = 0.6603, or about 66%, are from Group B. (c) If the new policy is implemented, the proportion of students from Group A that are not admitted would be 0.0228, while the proportion of students from Group B that are not admitted would be 0.0668. In this case, about 75% of the students who are not admitted would be from Group B. Relatively speaking, this policy would hurt students from Group B more than the current policy.
4.55. (a) \( \sigma = \sqrt{\sum (y - \mu)^2 P(y)} = \sqrt{(0 - 0.5)^2(0.5) + (1 - 0.5)^2(0.5)} = \sqrt{0.25} = 0.5 \). (b) \( \mu = \sum y P(y) = 0(1 - \pi) + 1(\pi) = \pi \); \( \sigma = \sqrt{(0 - \pi)^2(1 - \pi) + (1 - \pi)^2\pi} = \sqrt{\pi^2 - \pi^3 + \pi - 2\pi^2 + \pi^3} = \sqrt{\pi - \pi^2} = \sqrt{\pi(1 - \pi)} \). (c) The standard error for a sample proportion for a random sample of size n is \( \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{\pi(1 - \pi)}}{\sqrt{n}} = \sqrt{\frac{\pi(1 - \pi)}{n}} \).
4.56. Here \( \pi \) in the normalizing constant denotes the number 3.14159... We have
\( f(\mu + c) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(\mu + c - \mu)^2/(2\sigma^2)} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-c^2/(2\sigma^2)} \) and \( f(\mu - c) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(\mu - c - \mu)^2/(2\sigma^2)} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-c^2/(2\sigma^2)} \),
which are equal. Thus, \( f(\mu + c) = f(\mu - c) \) and the curve is symmetric.
4.57. (a) The finite population correction is \( \sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{30{,}000 - 300}{30{,}000 - 1}} = \sqrt{0.99} = 0.995 \). (b) If n = N, the finite population correction is \( \sqrt{\frac{N - N}{N - 1}} = 0 \), so \( \sigma_{\bar{y}} = 0 \). (c) When n = 1, the finite population correction is \( \sqrt{\frac{N - 1}{N - 1}} = 1 \), so \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}(1) = \frac{\sigma}{\sqrt{1}} = \sigma \). Thus, the sampling distribution of \( \bar{y} \) and its standard error are identical to the population distribution and its standard deviation.
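The three cases of Exercise 4.57 can be verified with a few lines of Python. This is only a sketch of the finite population correction formula quoted in the answer; the function names are illustrative, and the values 16 and 50,000 in the last call are borrowed from the Sunshine City example purely as sample inputs.

```python
from math import sqrt


def finite_population_correction(N, n):
    """Return sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    return sqrt((N - n) / (N - 1))


def standard_error(sigma, N, n):
    """Standard error of the sample mean with the finite population correction."""
    return sigma / sqrt(n) * finite_population_correction(N, n)


print(round(finite_population_correction(30_000, 300), 3))  # part (a)
print(finite_population_correction(30_000, 30_000))         # part (b): n = N
print(finite_population_correction(30_000, 1))              # part (c): n = 1
print(standard_error(16.0, 50_000, 1))                      # equals sigma when n = 1
```

When n is a small fraction of N the correction is close to 1, which is why the uncorrected formula is adequate in most of the other exercises.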

Chapter 5
5.1. \( \hat{\pi} = 412{,}878/577{,}006 = 0.716 \).
5.2. (a) \( \bar{y} = \frac{2 + 3 + 2 + 1 + 0 + 1 + 4 + 3}{8} = \frac{16}{8} = 2.0 \) hours per day spent watching TV. (b) The margin of error is \( t\frac{s}{\sqrt{n}} = 2.365\frac{1.309}{\sqrt{8}} = 1.10 \). This represents the amount that is added to and subtracted from the point estimate to form a confidence interval.
5.3. The estimated standard error is \( \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = \sqrt{\frac{0.74(0.26)}{644}} = 0.017 \).
5.4. The estimated standard error is \( \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = \sqrt{\frac{0.54(0.46)}{2003}} = 0.011 \).
5.5. The margin of error is \( z\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = 1.96\sqrt{\frac{0.51(0.49)}{1008}} = 0.031 \), or 3.1%.
5.6. (a) Democrat: \( \hat{\pi} = 90/142 = 0.634 \); Republican: \( \hat{\pi} = 26/102 = 0.255 \). (b) We are 95% confident that the population proportion of yes responses falls in the interval 0.55 to 0.71 for Democrats. We are also 95% confident that the population proportion of yes responses falls in the interval 0.17 to 0.34 for Republicans. It appears that more Democrats feel that the government is responsible for reducing the income differences between the rich and the poor.
5.7. (a) The estimated standard error in 2004 is \( \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = \sqrt{\frac{0.36(0.64)}{833}} = 0.016 \). (b) The margin of error is \( z\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = 1.96(0.016) = 0.03 \), or 3%. (c) The 95% confidence interval is 0.36 – 0.03 = 0.33 to 0.36 + 0.03 = 0.39. We are 95% confident that the population proportion of people agreeing that it is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family falls in the interval 0.33 to 0.39.
5.8. \( \hat{\pi} = 366/598 = 0.612 \). The 99% confidence interval is \( \hat{\pi} \pm z\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = 0.612 \pm 2.576\sqrt{\frac{0.612(0.388)}{598}} \), or 0.56 to 0.66.
5.9. (a) \( \hat{\pi} = 229/1200 = 0.191 \). The 95% confidence interval is

\( \hat{\pi} \pm z\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = 0.191 \pm 1.96\sqrt{\frac{0.191(0.809)}{1200}} \), or 0.17 to 0.21.
We are 95% confident that the interval 0.17 to 0.21 contains the population proportion of those who believe that environmental regulations are too strict. (b) The 99% confidence interval is \( 0.191 \pm 2.576\sqrt{\frac{0.191(0.809)}{1200}} \), or 0.16 to 0.22. We are 99% confident that the interval 16% to 22% contains the population proportion of those who believe that environmental regulations are too strict. Note that the 99% confidence interval is wider than the 95% confidence interval.
5.10. A 99% confidence interval would be wider because we need to have more possible values in the interval in order to be more confident.
5.11. (a) 2.326 (b) 1.645 (c) 0.67 (d) 3.0
5.12. (a) "Sample prop" = 1885/2815 = 0.6696. (b) Since we are 95% confident that the interval 0.652 to 0.687 contains the population proportion of American adults who are in favor of the death penalty and the entire interval exceeds 50%, it is reasonable to conclude that more than half of all American adults are in favor of the death penalty. (c) A 95% confidence interval for the proportion of American adults who opposed the death penalty is 0.313 to 0.348.
5.13. (a) The proportion that said legal is 0.364; the proportion that said not legal is 0.636. (b) The 95% confidence interval is \( 0.364 \pm 1.96\sqrt{\frac{0.364(0.636)}{802}} \), or 0.331 to 0.397. We are 95% confident that the interval 0.331 to 0.397 contains the population proportion that thinks marijuana should be made legal. Since this interval is entirely below 50%, we can conclude that a minority of Americans felt this way. (c) The proportion that said marijuana should be legal dropped until 1990 and has increased each year since.
5.14. The 99% confidence interval is \( 0.538 \pm 2.576\sqrt{\frac{0.538(0.462)}{1095}} \), or 0.499 to 0.577. We are 99% confident that the interval 49.9% to 57.7% contains the population proportion of those who think that it is probably or definitely not true that human beings developed from earlier species of animals. Since this interval dips slightly below 50%, we cannot conclude that a majority of Americans felt this way.
5.15. The 99% confidence interval is \( 0.255 \pm 2.576\sqrt{\frac{0.255(0.745)}{42{,}000}} \), or 0.250 to 0.260. We are 99% confident that the interval 0.25 to 0.26 contains the population proportion of smokers.
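The confidence intervals for proportions in Exercises 5.8 to 5.15 all use the same large-sample formula, so one small function can check them. The sketch below is written in Python with the standard library only; the function name `proportion_ci` is illustrative, and it obtains the z-score from `NormalDist` rather than from a table, so the last reported digit can occasionally differ.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample confidence interval p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.364, 802)               # Exercise 5.13(b)
low99, high99 = proportion_ci(0.538, 1095, 0.99)    # Exercise 5.14

print(round(low, 3), round(high, 3))
print(round(low99, 3), round(high99, 3))
```

Because the 5.14 interval starts just below 0.50, it is a useful reminder that the conclusion about a majority depends on the confidence level chosen.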

5.16. The 95% confidence interval is \( 0.565 \pm 1.96\sqrt{\frac{0.565(0.435)}{2705}} \), or 0.546 to 0.584. We are 95% confident that the interval 0.546 to 0.584 contains the population proportion of those who voted for Schwarzenegger. Assuming that the sample used for the exit poll is representative of the population of voters, there is enough evidence to conclude that Schwarzenegger would win the election, since the entire confidence interval exceeds 0.50.
5.17. (a) The 99% confidence interval is \( 0.40 \pm 2.576\sqrt{\frac{0.40(0.60)}{400}} \), or 0.337 to 0.463. Since the entire confidence interval is below 50%, it would appear that Jones would not win the election. (b) The 99% confidence interval is \( 0.40 \pm 2.576\sqrt{\frac{0.40(0.60)}{40}} \), or 0.20 to 0.60. Since the confidence interval contains 0.50, there appears to be no clear winner. The smaller sample size makes the confidence interval so wide that it is almost useless. (Note that a sample size that is too big will make a confidence interval so narrow that it is almost useless as well; see, for example, Exercise 5.15.)
5.18. If the sample size had been one-fourth as large, the confidence interval would be twice as wide and would be 0.23 to 0.31.
5.19. (a) 2.776 (b) 2.145 (c) 2.064 (d) 2.060 (e) 2.787
5.20. (a) The 95% confidence interval is \( \bar{y} \pm t\frac{s}{\sqrt{n}} = 70 \pm 2.776\frac{10}{\sqrt{5}} \), or 57.6 to 82.4. We are 95% confident that the interval 57.6 to 82.4 contains the population mean. (b) The 95% confidence interval is \( \bar{y} \pm t\frac{s}{\sqrt{n}} = 70 \pm 2.093\frac{10}{\sqrt{20}} \), or 65.3 to 74.7. We are 95% confident that the interval 65.3 to 74.7 contains the population mean.
5.21. (a) The standard error is \( \frac{s}{\sqrt{n}} = \frac{52.554}{\sqrt{1007}} = 1.656 \). (b) We are 95% confident that the interval 21.5 to 28.0 contains the population mean number of female partners males have had sex with since their eighteenth birthday. (c) The mean is quite high compared to the median and the mode, which means that there were a few male respondents with a very large number of female sex partners. In addition, the standard deviation is more than twice the mean, confirming the right skew of the distribution of the number of female sex partners. A confidence interval based on the mean does not seem to be the best idea.
5.22. (a) The point estimate is 3.02 children. (b) The standard error is \( \frac{s}{\sqrt{n}} = \frac{1.81}{\sqrt{497}} = 0.081 \). (c) We are 95% confident that the interval 2.9 to 3.2 contains the population mean ideal number of

children for a family to have. (d) Since the confidence interval is entirely above 2.0, it does not seem plausible that the population mean equals 2.0 children.
5.23. (a) The standard error is \( \frac{s}{\sqrt{n}} = \frac{1.77}{\sqrt{397}} = 0.089 \). (b) The 95% confidence interval is \( \bar{y} \pm t\frac{s}{\sqrt{n}} = 2.89 \pm 1.96\frac{1.77}{\sqrt{397}} \), or 2.7 to 3.1. We are 95% confident that the interval 2.7 to 3.1 contains the population mean ideal number of children that males think a family should have. The "95% confidence" means that the interval was constructed by a method for which, in the long run, 95% of the resulting confidence intervals would contain the true population mean.
5.24. (a) \( \bar{y} = \frac{11 + 11 + 6 + 9 + 14 - 3 + 0 + 7 + 22 - 5 - 4 + 13 + 13 + 9 + 4 + 6 + 11}{17} = \frac{124}{17} = 7.29 \); \( s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{51.596} = 7.18 \). (b) The standard error is \( \frac{s}{\sqrt{n}} = \frac{7.18}{\sqrt{17}} = 1.74 \). (c) The t-score that is in the df = 16 row and \( t_{0.025} \) column is 2.120. (d) The 95% confidence interval is \( \bar{y} \pm t\frac{s}{\sqrt{n}} = 7.29 \pm 2.120\frac{7.18}{\sqrt{17}} \), or 3.6 to 11.0. We are 95% confident that the interval 3.6 to 11.0 pounds contains the population mean change in weight for this therapy.
5.25. A confidence interval is not about any one subject or about 95% of the subjects; it is an interval estimate for our population parameter. The correct interpretation is that we are 95% confident that the interval 2.60 to 2.93 hours contains the population mean number of hours of TV watched on the average day.
5.26. (a) \( \bar{y} = 1.93 \), s = 1.53, se = \( \frac{1.53}{\sqrt{15}} = 0.396 \). (b) The 95% confidence interval is \( 1.93 \pm 2.145(0.396) \), or 1.1 to 2.8. We are 95% confident that the interval 1.1 to 2.8 hours contains the population mean daily hours watching TV spent by Buddhists. (c) Since the standard deviation is almost as large as the mean, it appears that the population distribution is not normal. The t procedures are robust against violations of normality, but we should still be cautious about the validity of the results since the sample size is so small.
5.27. (a) The population distribution is probably not normal, since the standard deviation is almost equal to the mean (which implies that the population distribution is skewed to the right). (b) Because our sample size is so large, the t procedures will be robust against this violation of normality. The 99% confidence interval is \( 20.3 \pm 2.576\frac{18.2}{\sqrt{1415}} \), or 19.1 to 21.5. We are 99%

(35) confident that the interval 19.1 to 21.5 contains the population mean length that residents have lived in the city, town, or community where they live now. 5.28. (a) The 95% confidence interval is 1.81  1.96. 1.98  1.67 to 1.95 . We are 95% 816. confident that the interval 1.67 to 1.95 contains the population mean number of days in the past 7 days that women have felt sad. (b) Since the standard deviation is larger than the mean, the variable is most likely skewed to the right. Since t procedures are robust against violations of normality and our sample size is large, our findings in part (a) are probably okay, unless there are extreme outliers. 5.29. (a) We are 95% confident that the interval 1.09 to 1.18 contains the population mean number of sex partners that people had in the previous 12 months. (b) The standard deviation is almost as large as the mean, indicating that the distribution is probably skewed to the right. Since t procedures are robust against violations of normality and our sample size is large, our findings in part (a) are probably okay, unless there are extreme outliers. 5.30. (a) We are 95% confident that the interval 3.32 to 4.88 contains the mean number of times a week University of Florida students read a newspaper. (b) Since the standard deviation is almost as large as the mean, the distribution is probably skewed to the right. (c) ―Robust‖ means that the t procedures are not affected by violations against normality where other procedures may be. This means that our calculations and interpretations are probably okay, unless there are extreme outliers. 5.31. (a) The 99% confidence interval is 4.23  2.576. 1.39  4.13 to 4.33 . (b) (i) A 95% 1294. confidence interval would be narrower, since we do not need to include as many possible values in the confidence interval when we are less confident. 
(ii) A 99% confidence interval for only the strong Democrats would be wider, since the standard deviation is larger and the sample size is smaller (thus causing the standard error to be larger). (c) When we use the sample mean and standard deviation, we are assuming that the measurement scale for political ideology is an interval scale.

5.32. (a) \(\bar{y} = 1.5\) days. (b) The 95% confidence interval is \(1.5 \pm 1.96\left(\frac{2.21}{\sqrt{1450}}\right)\), or 1.4 to 1.6. We are 95% confident that the interval 1.4 to 1.6 contains the population mean number of days in the past 7 days that people have felt lonely.

5.33. (a) Based on the stem-and-leaf plot, the population distribution might be skewed to the right.

     5 | 69
     6 | 04
     7 | 003789
     8 | 33456
     9 | 022346
    10 | 008
    11 | 12
    12 | 024
    13 | 9

(b) \(\bar{y} = 90\), \(s = 20.67\). (c) The 95% confidence interval is \(90 \pm 2.045\left(\frac{20.67}{\sqrt{30}}\right)\), or 82.3 to 97.7. We are 95% confident that the interval $8230 to $9770 contains the population mean annual family income for families living in public housing in Chicago.

5.34. (a) The confidence interval is 4.3 to 6.3. We are 95% confident that the interval 4.3 to 6.3 days contains the population mean length of stay for all inpatients in that hospital. (b) If the administrator wants the confidence interval to be half as wide, she needs to take a random sample of 400 records.

5.35. \(n = \pi(1-\pi)\left(\frac{z}{M}\right)^2 = 0.30(0.70)\left(\frac{1.64}{0.06}\right)^2 = 156.89\). The necessary sample size is 157.

5.36. (a) \(n = \pi(1-\pi)\left(\frac{z}{M}\right)^2 = 0.50(0.50)\left(\frac{1.96}{0.04}\right)^2 = 600.25\). The necessary sample size is 600. (b) To have a margin of error of 0.02, they should sample 4(600.25) = 2401 people.

5.37. (a) \(n = \pi(1-\pi)\left(\frac{z}{M}\right)^2 = 0.10(0.90)\left(\frac{1.96}{0.02}\right)^2 = 864.36\), or 864. (b) Using \(\pi = 0.50\), n = 2401. This sample size is a little less than three times the sample size in part (a). If we can make an educated guess about the value of π, we can collect a smaller sample; the closer π is to 0 or 1, the smaller the sample size needed.

5.38. \(n = 0.48(0.52)\left(\frac{1.96}{0.025}\right)^2 = 1534.2\). The sample size was about 1534.

5.39. \(n = 0.83(0.17)\left(\frac{1.96}{0.03}\right)^2 = 602.3\). The sample size was about 602.
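The sample-size arithmetic in 5.36 to 5.39 can be rechecked with a few lines of Python. This is only a sketch of the formula \(n = \pi(1-\pi)(z/M)^2\) used in those solutions; the function name `sample_size_for_proportion` is hypothetical, the z-score is passed in by hand rather than looked up, and the function returns the unrounded value so that rounding is left to the reader.

```python
def sample_size_for_proportion(pi_guess, margin, z=1.96):
    """Unrounded n = pi(1 - pi)(z / M)^2 for estimating a proportion.

    pi_guess: guessed population proportion (0.50 is the conservative choice)
    margin:   desired margin of error M
    z:        z-score for the confidence level (1.96 for 95%), supplied by hand
    """
    return pi_guess * (1 - pi_guess) * (z / margin) ** 2


if __name__ == "__main__":
    # Inputs taken from solutions 5.36 to 5.39.
    cases = {
        "5.36(a)": (0.50, 0.04),
        "5.37(a)": (0.10, 0.02),
        "5.37(b)": (0.50, 0.02),
        "5.38": (0.48, 0.025),
        "5.39": (0.83, 0.03),
    }
    for label, (pi_guess, margin) in cases.items():
        n = sample_size_for_proportion(pi_guess, margin)
        print(f"{label}: n = {n:.2f}")
```

Halving the margin of error from 0.04 to 0.02 with the same guess for π quadruples the required n, which is the relationship used in 5.36(b).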

5.40. (a) \(n = \sigma^2\left(\frac{z}{M}\right)^2 = 500^2\left(\frac{1.96}{100}\right)^2 = 96.04\). The sample size should be about 96 farms. (b) The margin of error is actually \(z\,\frac{s}{\sqrt{n}} = 1.96\left(\frac{300}{\sqrt{96}}\right) = 60.0\).

5.41. We estimate the standard deviation to be (18 - 0)/6 = 3. The sample size calculation is \(n = \sigma^2\left(\frac{z}{M}\right)^2 = 3^2\left(\frac{1.96}{1}\right)^2 = 34.57\), so a sample of size 35 is needed.

5.42. We estimate the standard deviation to be (50,000 - 6,000)/4 = 11,000. The sample size calculation is \(n = \sigma^2\left(\frac{z}{M}\right)^2 = 11{,}000^2\left(\frac{2.576}{1000}\right)^2 = 802.9\), so a sample of size 803 is needed.

5.43. We cannot use the ordinary large-sample formula because one category has only 3 observations, and 3 < 15. An appropriate confidence interval, formed after adding two observations to each category, uses \(\hat{\pi} = 5/34 = 0.147\), and the 95% confidence interval is \(0.147 \pm 1.96\sqrt{\frac{0.147(0.853)}{34}}\), or 0.028 to 0.266. We are 95% confident that the interval 0.028 to 0.266 contains the population proportion of children who died before reaching adulthood.

5.44. (a) \(\hat{\pi} = 0/5 = 0\), \(se = \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = \sqrt{\frac{0(1-0)}{5}} = 0\). (b) Since the number in each category (0 like tofu and 5 do not like tofu) is less than 15, we cannot use the large-sample formula for a confidence interval. An appropriate confidence interval uses \(\hat{\pi} = 2/9 = 0.222\), and the 95% confidence interval is \(0.222 \pm 1.96\sqrt{\frac{0.222(0.778)}{9}}\), or 0.0 to 0.49. We are 95% confident that the interval 0 to 0.49 contains the population proportion of students who like tofu.

5.45. For n = 30, the endpoints of a 95% confidence interval have indices \(\frac{n+1}{2} \pm \sqrt{n} = \frac{30+1}{2} \pm \sqrt{30} = 15.5 \pm 5.5\), or 10 to 21. The confidence interval consists of the 10th smallest and 21st smallest values. The 95% confidence interval is 79 to 96. We are 95% confident that the interval $7900 to $9600 contains the population median income of the public housing residents.

5.46. For n = 54, the endpoints of a 95% confidence interval have indices \(\frac{n+1}{2} \pm \sqrt{n} = \frac{54+1}{2} \pm \sqrt{54} = 27.5 \pm 7.3\), or 20.2 to 34.8.
The confidence interval consists of the 20th smallest and 35th smallest values. The 95% confidence interval is 2 to 6. We are 95% confident that the interval 2 to 6 years contains the population median time since a book was last checked out.

5.49. (a) SPSS gives us the following output:

                                                Statistic   Std. Error
    Mean                                          7.267       .8672
    95% Confidence Interval     Lower Bound       5.531
    for Mean                    Upper Bound       9.002

We are 95% confident that the mean weekly number of hours spent watching TV is between 5.5 and 9.0 hours. (b) SPSS gives us the following output:

                                                            Statistic   Std. Error
    AfterLife   Proportion                                    0.52        0.065
                95% Confidence Interval     Lower Bound       0.39
                for Proportion              Upper Bound       0.65

We are 95% confident that the proportion of students who believe in life after death is between 0.39 and 0.65.

5.52. The report should include the following elements. The explanatory variable is political ideology, and the response variable is whether a person lived with her/his husband/wife before they got married. About 33% of politically liberal respondents lived with their spouse prior to marriage, compared to about 16% of politically conservative respondents. A 95% confidence interval for politically liberal respondents is 26.1% to 40.2%. A 95% confidence interval for politically conservative respondents is 11.6% to 20.1%. Note that the confidence interval for politically conservative respondents is entirely below the confidence interval for politically liberal respondents.

5.53. The report should include the following elements. About 15% of subjects agree that women should take care of running their homes and leave running the country up to men (the 95% confidence interval is 13.4% to 16.7%). About 34% of subjects agree that it is better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and the family (the 95% confidence interval is 32% to 36.3%). About 42.4% of subjects agree that a preschool child is likely to suffer if her mother works (the 95% confidence interval is 40.1% to 44.7%).

5.54. A 95% confidence interval for the population mean is 0 to 30.4 (actually, the lower bound is -10.43, but we report this as 0). Outliers have a tremendous impact on confidence intervals for means, since they affect both the mean and the standard deviation.
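Several of the interval calculations in these solutions can be rechecked by hand or with a short script. The Python sketch below is not the software used in the manual (which shows SPSS output); it simply recomputes the mean interval of 5.33, the median interval of 5.45, and the adjusted-proportion intervals of 5.43 and 5.44. The function names are hypothetical, the t-score 2.045 is copied from solution 5.33 rather than looked up, the counts 3 of 30 in 5.43 are inferred from \(\hat{\pi} = 5/34\), and rounding the median indices outward (down for the lower index, up for the upper) is an assumption chosen to match the indices reported in 5.45 and 5.46.

```python
import math
import statistics

# Incomes (in hundreds of dollars) read off the stem-and-leaf plot in 5.33.
incomes = [56, 59, 60, 64, 70, 70, 73, 77, 78, 79,
           83, 83, 84, 85, 86, 90, 92, 92, 93, 94,
           96, 100, 100, 108, 111, 112, 120, 122, 124, 139]


def mean_interval(data, t_score):
    """Interval ybar +/- t * s / sqrt(n); t_score is supplied by hand."""
    n = len(data)
    ybar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return ybar - t_score * se, ybar + t_score * se


def median_interval(data):
    """Order statistics at indices (n + 1)/2 +/- sqrt(n), rounded outward.

    Assumes n is large enough that the lower index is at least 1.
    """
    n = len(data)
    ordered = sorted(data)
    lower_idx = math.floor((n + 1) / 2 - math.sqrt(n))
    upper_idx = math.ceil((n + 1) / 2 + math.sqrt(n))
    return ordered[lower_idx - 1], ordered[upper_idx - 1]  # 1-based to 0-based


def adjusted_proportion_interval(successes, n, z=1.96):
    """Small-sample interval: add 2 successes and 2 failures, clip to [0, 1]."""
    p = (successes + 2) / (n + 4)
    se = math.sqrt(p * (1 - p) / (n + 4))
    return max(0.0, p - z * se), min(1.0, p + z * se)


if __name__ == "__main__":
    print("5.33 mean interval:", mean_interval(incomes, 2.045))
    print("5.45 median interval:", median_interval(incomes))
    print("5.43 proportion interval:", adjusted_proportion_interval(3, 30))
    print("5.44 proportion interval:", adjusted_proportion_interval(0, 5))
```

Clipping at 0 in `adjusted_proportion_interval` mirrors the way 5.44 and 5.54 report a negative lower bound as 0.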
