SAS Statistics Quizzes


Academic year: 2021


Quiz, Lesson 1: Introduction to Statistics

Select the best answer for each question. When you are finished, click Submit Quiz.

1. For an asymmetric (or skewed) distribution, which of the following statistics is a good measure for the middle of the data?

a. mean
b. median
c. either mean or median

2. Which of the following code examples correctly calculates descriptive statistics of popcorn yield (Yield) for each level of the class variable (Type) in the data set Statdata.Popcorn, as well as statistics for all levels combined?

The output should include the following statistics: sample size, mean, median, standard deviation, variance, range, and interquartile range.

a. proc means data=statdata.popcorn maxdec=2 fw=10
        n mean median std var range qrange;
      class Type;
      var Yield;
   run;

b. proc means data=statdata.popcorn maxdec=2 fw=10
        printalltypes
        n mean median std var range qrange;
      class Yield;
      var Class;
   run;

c. proc means data=statdata.popcorn maxdec=2 fw=10
        printalltypes
        n mean median std var range qrange;
      class Type;
      var Yield;
   run;

d. proc means data=statdata.popcorn maxdec=2 fw=10
        printalltypes
        n mean median std range IQR;
      class Type;
      var Yield;
   run;

3. Read the following statement about the central limit theorem and choose the answer that contains the correct values for all of the missing fields:

The central limit theorem states that the distribution of sample __(1)__ is approximately __(2)__, regardless of the distribution of the population data, as long as the sample size is at least n = __(3)__.

a. means, skewed, 20
b. variance, equal, 30
c. means, normal, 30
d. proportions, equal, 10

4. Psychologists at a college want to know if students are sleeping more or less than the recommended average of 8 hours a day.

Which of the following code choices correctly tests the null hypothesis?

a. proc univariate data=statdata.sleep mu0<>8;
      var hours;
   run;

b. proc univariate data=statdata.sleep;
      var hours / mu0=8;
   run;

c. proc univariate data=statdata.sleep;
      var hours / mu0<>8;
   run;

d. proc univariate data=statdata.sleep mu0=8;
      var hours;
   run;

5. How do you define the term power?

a. the measure of the ability of the statistical hypothesis test to reject the null hypothesis when it is actually false
b. the probability of committing a Type I error
c. the probability of failing to reject the null hypothesis when it is actually false

6. Select the choice that lists only continuous variables.

a. body temperature, number of children, gender, beverage size
b. age, body temperature, gas mileage, income
c. number of children, gender, gas mileage, income
d. gender, gas mileage, beverage size, income

7. Which of the following code choices creates a histogram for the variable Speed from the data set SpeedTest with a normal curve overlay and a box with the skewness and kurtosis statistics printed in the northeast corner?

a. proc univariate data=statdata.speedtest;
      histogram Speed / normal(mu=est sigma=est);
      inset skewness kurtosis;
   run;

b. proc univariate data=statdata.speedtest;
      histogram Speed / normal;
      inset skewness kurtosis / position=ne;
   run;

c. proc univariate data=statdata.speedtest;
      histogram Speed / normal(mu=est sigma=est);
      inset skewness kurtosis / position=ne;
   run;

d. proc univariate data=statdata.speedtest;
      histogram Speed / normal(skewness kurtosis);
   run;

8. Select the statement below that incorrectly interprets a 95% confidence interval (15.02, 15.04) for the population mean, if the sample mean is 15.03 ounces of cereal.

a. You are 95% confident that the true average weight for a box of cereal is between 15.02 and 15.04 ounces.
b. The probability is .95 that the true average weight is between 15.02 and 15.04 ounces.
c. In the long run, approximately 95% of the intervals calculated with this procedure will capture the true average weight.

.

9. The shape of a normal distribution depends on the value of which two parameters?

a. the mean x̄ and the standard deviation s
b. the standard deviation σ and the variance σ²
c. the mean µ and the standard deviation σ
d. none of the above

10. The standard error of the mean is

a. used to calculate confidence intervals of the mean.
b. always normally distributed.
c. sometimes less than 0.
d. none of the above

Quiz Feedback, Lesson 1: Introduction to Statistics

Your Score:

90%

Congratulations! Your score of 90% indicates that you've mastered the topics in this lesson. If you'd like, check the feedback below and select Review links for any sections containing topics you want to review.

When you're ready to start the next lesson, select the lesson in the Course Menu. Then click a topic in the Lesson Menu to begin.

1. For an asymmetric (or skewed) distribution, which of the following statistics is a good measure for the middle of the data?

a. mean
b. median
c. either mean or median

Your answer: b Correct answer: b

The median is not affected by outliers and is less affected by the skewness. The mean, on the other hand, averages in any outliers that might be in your data.

Review: Measures of Location
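As an illustration of the point above, here is a minimal Python sketch (not part of the SAS course; the data values are made up) that compares how the mean and the median react to a single large value in a skewed sample.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, one is very large.
values = [2, 3, 3, 4, 5, 6, 40]

print(mean(values))    # pulled upward by the large value
print(median(values))  # stays near the bulk of the data

# Replace the large value with a typical one to see how much each statistic moves.
trimmed = [2, 3, 3, 4, 5, 6, 7]
print(mean(trimmed), median(trimmed))
```

In this made-up sample the median is the same with and without the extreme value, while the mean changes noticeably.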

2. Which of the following code examples correctly calculates descriptive statistics of popcorn yield (Yield) for each level of the class variable (Type) in the data set Statdata.Popcorn, as well as statistics for all levels combined?

The output should include the following statistics: sample size, mean, median, standard deviation, variance, range, and interquartile range.

a. proc means data=statdata.popcorn maxdec=2 fw=10
        n mean median std var range qrange;
      class Type;
      var Yield;
   run;

b. proc means data=statdata.popcorn maxdec=2 fw=10
        printalltypes
        n mean median std var range qrange;
      class Yield;
      var Class;
   run;

c. proc means data=statdata.popcorn maxdec=2 fw=10
        printalltypes
        n mean median std var range qrange;
      class Type;
      var Yield;
   run;

d. proc means data=statdata.popcorn maxdec=2 fw=10
        printalltypes
        n mean median std range IQR;
      class Type;
      var Yield;
   run;

Your answer: c Correct answer: c

The PROC MEANS statement must include the option PRINTALLTYPES in order for SAS to display statistics for all requested combinations of class variables – that is, for each level or occurrence of the variable and for all occurrences combined. The statistics specified on the second line must include the keywords N MEAN MEDIAN STD VAR RANGE QRANGE. The code must specify Type as the class variable and Yield as the analysis variable.

Review: The MEANS Procedure

3. Read the following statement about the central limit theorem and choose the answer that contains the correct values for all of the missing fields:

The central limit theorem states that the distribution of sample __(1)__ is approximately __(2)__, regardless of the distribution of the population data, as long as the sample size is at least n = __(3)__.

a. means, skewed, 20
b. variance, equal, 30
c. means, normal, 30
d. proportions, equal, 10

Your answer: c Correct answer: c

The central limit theorem states that the distribution of sample means is approximately normal, regardless of the distribution of the population data, as long as the sample size is at least n = 30.

Review: Normality and the Central Limit Theorem
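The statement above can be checked informally with a simulation. The following Python sketch is an illustration only (it is not part of the SAS course): it draws many samples of size 30 from a strongly skewed exponential population, which is an assumed choice, and summarizes the resulting sample means. The helper name sample_means and the number of repeats are hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n from a skewed (exponential) population
    and return the mean of each sample."""
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]

means_30 = sample_means(30)

# The population has mean 1 and standard deviation 1, so the sample means
# should center near 1 with spread near 1 / sqrt(30).
print(round(mean(means_30), 3), round(stdev(means_30), 3))
```

A histogram of means_30 would look far more symmetric than the exponential population itself. The threshold n = 30 is the rule of thumb used in this course, not a hard boundary.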

4. Psychologists at a college want to know if students are sleeping more or less than the recommended average of 8 hours a day.

Which of the following code choices correctly tests the null hypothesis?

a. proc univariate data=statdata.sleep mu0<>8;
      var hours;
   run;

b. proc univariate data=statdata.sleep;
      var hours / mu0=8;
   run;

c. proc univariate data=statdata.sleep;
      var hours / mu0<>8;
   run;

d. proc univariate data=statdata.sleep mu0=8;
      var hours;
   run;

Your answer: d Correct answer: d

You specify the MU0= option as part of the PROC UNIVARIATE statement to indicate the test value of the null hypothesis. The alternative hypothesis is that μ is not equal to 8 hours, but this does not need to be specified in the PROC UNIVARIATE code.

Review: Using PROC UNIVARIATE to Generate a t Statistic

5. How do you define the term power?

a. the measure of the ability of the statistical hypothesis test to reject the null hypothesis when it is actually false
b. the probability of committing a Type I error
c. the probability of failing to reject the null hypothesis when it is actually false

Your answer: a Correct answer: a

Power is the ability of the statistical test to detect a true difference, or the ability to successfully reject a false null hypothesis. The probability of committing a Type I error is α. Failing to reject the null hypothesis when it is actually false is a Type II error.

Review: Types of Errors and Power
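To make the relationship between power and the two error types concrete, here is a minimal Python sketch. It is an illustration only and makes simplifying assumptions that are not in the course material: a two-sided one-sample z-test with a known standard deviation (the course uses t-tests), and made-up values for the true mean, sigma, and sample size. The function name z_test_power is hypothetical.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (sigma assumed known)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # How far the true mean sits from the null value, in standard-error units.
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(-z_crit - shift) + (1 - z.cdf(z_crit - shift))

power = z_test_power(mu0=8, mu_true=7.5, sigma=1.5, n=50)
beta = 1 - power  # probability of a Type II error
print(round(power, 3), round(beta, 3))
```

When the true mean equals the null value, the function returns alpha, the Type I error rate. Increasing n raises the power for a fixed true difference.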

6. Select the choice that lists only continuous variables.

a. body temperature, number of children, gender, beverage size
b. age, body temperature, gas mileage, income
c. number of children, gender, gas mileage, income
d. gender, gas mileage, beverage size, income

Your answer: b Correct answer: b

The continuous variables are age, body temperature, gas mileage, and income.

Review: Types of Variables: Quantitative and Categorical

7. Which of the following code choices creates a histogram for the variable Speed from the data set SpeedTest with a normal curve overlay and a box with the skewness and kurtosis statistics printed in the northeast corner?

a. proc univariate data=statdata.speedtest;
      histogram Speed / normal(mu=est sigma=est);
      inset skewness kurtosis;
   run;

b. proc univariate data=statdata.speedtest;
      histogram Speed / normal;
      inset skewness kurtosis / position=ne;
   run;

c. proc univariate data=statdata.speedtest;
      histogram Speed / normal(mu=est sigma=est);
      inset skewness kurtosis / position=ne;
   run;

d. proc univariate data=statdata.speedtest;
      histogram Speed / normal(skewness kurtosis);
   run;

Your answer: c Correct answer: c

In the HISTOGRAM statement, you specify the Speed variable and the NORMAL option using estimates of the population mean and the population standard deviation. In the INSET statement, you specify the keywords SKEWNESS and KURTOSIS, as well as the POSITION=NE option.

Review: The UNIVARIATE Procedure

8. Select the statement below that incorrectly interprets a 95% confidence interval (15.02, 15.04) for the population mean, if the sample mean is 15.03 ounces of cereal.

a. You are 95% confident that the true average weight for a box of cereal is between 15.02 and 15.04 ounces.

b. The probability is .95 that the true average weight is between 15.02 and 15.04 ounces.

c. In the long run, approximately 95% of the intervals calculated with this procedure will capture the true average weight.

.

Your answer: c Correct answer: b

A 95% confidence interval means that you are 95% confident that the interval contains the true population mean. If you sample repeatedly and calculate a confidence interval for each sample mean, 95% of the time your confidence interval will contain the true population mean. A confidence interval is not a probability. When a confidence interval is calculated, the true mean is in the interval or it is not. There is no probability associated with it.

Review: Confidence Intervals
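The long-run interpretation described above can be illustrated with a simulation. This Python sketch is not part of the SAS course; the population mean, standard deviation, sample size, and number of trials are all assumed values, and a known-sigma interval is used to keep the example short.

```python
import random
from statistics import NormalDist, mean

random.seed(7)  # fixed seed so the sketch is repeatable

TRUE_MEAN, SIGMA, N = 15.03, 0.05, 25   # hypothetical population and sample size
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * SIGMA / N ** 0.5   # known-sigma interval for simplicity

def interval_covers_truth():
    """Draw one sample, build its 95% interval, and report whether it
    contains the true mean."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    center = mean(sample)
    return center - half_width <= TRUE_MEAN <= center + half_width

trials = 4000
coverage = sum(interval_covers_truth() for _ in range(trials)) / trials
print(coverage)
```

Each individual interval either contains the true mean or does not. The 95% figure describes the proportion of intervals that capture it over many repetitions of the procedure.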

9. The shape of a normal distribution depends on the value of which two parameters?

a. the mean x̄ and the standard deviation s
b. the standard deviation σ and the variance σ²
c. the mean µ and the standard deviation σ
d. none of the above

Your answer: c Correct answer: c

The shape of a normal distribution depends on the value of two parameters, the mean µ and the standard deviation σ.

Review: Normal Distribution

10. The standard error of the mean is

a. used to calculate confidence intervals of the mean. b. always normally distributed.

c. sometimes less than 0. d. none of the above

Your answer: a Correct answer: a

The standard error of the mean is part of the equation used to calculate a confidence interval of the mean. It is not normally distributed, and it is never less than 0.

Review: Point Estimators, Variability, and Standard Error, Interval Estimators
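As a small worked example of how the standard error enters a confidence interval, the following Python sketch uses made-up cereal box weights. It is an illustration only and not part of the SAS course. The t critical value for 7 degrees of freedom is taken from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical cereal box weights in ounces.
weights = [15.01, 15.04, 15.02, 15.05, 15.03, 15.02, 15.04, 15.03]

n = len(weights)
std_err = stdev(weights) / sqrt(n)   # standard error of the mean; never negative

# Approximate 95% interval using a t critical value for n - 1 = 7 degrees
# of freedom (2.365, from a t table).
T_CRIT_7DF = 2.365
center = mean(weights)
ci = (center - T_CRIT_7DF * std_err, center + T_CRIT_7DF * std_err)
print(round(std_err, 4), tuple(round(x, 3) for x in ci))
```

Because the standard error is a standard deviation divided by a positive square root, it cannot be less than 0, and it shrinks as the sample size grows.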

Quiz, Lesson 2: Analysis of Variance (ANOVA)

Select the best answer for each question. When you are finished, click Submit Quiz.

1. Given this SAS output, is there sufficient evidence to reject the assumption of equal variances?


a. yes b. no

2. Given this SAS output, is there sufficient evidence to reject the hypothesis of equal means?

a. yes
b. no

3. The manufacturer of a cereal company uses two different processes to package boxes of cereal. He wants to be sure the two processes are putting the same amount of cereal in each box. He plans to perform a two-sample t-test to determine whether the mean weight of cereal is significantly different between the two processes. What type of test should he run?

a. an upper-tailed t-test
b. a two-sided t-test
c. a lower-tailed t-test

4. Which of the following is not an assumption you make when including a blocking factor in an ANOVA randomized block design?

a. The treatments are randomly assigned within each block.
b. The errors are normally distributed.
c. The effects of the treatment factor are constant across the levels of the blocking variable.
d. The observations are dependent.

5. When you perform ANOVA for a randomized block design, where do you indicate your blocking variable to SAS?

a. PROC GLM statement
b. MODEL statement only
c. CLASS statement and MODEL statement
d. LSMEANS statement

6. If your blocking variable has a very small F-value in the ANOVA report, what would be a valid next step?

a. Remove it from the MODEL statement and re-run the analysis.
b. Test an interaction term.
c. Report the F-value and plan a new study.

7. The Dunnett method compares all possible pairs of means, so it can only be used when you make pairwise comparisons.

a. true
b. false

8. You can examine Levene's Test for Homogeneity to more formally test which of the following assumptions?

a. the assumption of errors being normally distributed
b. the assumption of independent observations
c. the assumption of equal variances
d. the assumption of treatments being randomly assigned

9. When you perform a two-way ANOVA in SAS, which of the following statements correctly defines the model that includes the interaction between the two main effect variables?

a. class Drug*Disease;
b. class Drug=Disease;
c. model Drug*Disease;
d. model Health=Drug Disease Drug*Disease;

10. This table shows output from a post hoc pairwise comparison in which you tested the significance of a drug on patients' health for three different diseases. What conclusion can you make based on this output?

a. The drug effect is significant when used in patients with disease Z.
b. The drug effect is significant when used in patients with diseases Y and Z.
c. The drug effect is not significant when used in patients with disease Z.

Quiz, Lesson 3: Regression

Select the best answer for each question. When you are finished, click Submit Quiz.

1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

               Performance    RunTime      Age
Performance    1.00000       -0.82049     -0.71257
                              <.0001       <.0001
RunTime       -0.82049        1.00000      0.19523
               <.0001                      0.2926
Age           -0.71257        0.19523      1.00000
               <.0001         0.2926

a. a fairly strong, positive linear relationship
b. a fairly strong, negative linear relationship
c. a fairly weak, positive linear relationship
d. a fairly weak, negative linear relationship

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse, and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

a. For each year older, the predicted value of oxygen consumption is 2.78 greater.
b. For each year older, the predicted value of oxygen consumption is 2.78 lower.
c. For every 2.78 years older, oxygen consumption doubles.
d. For every 2.78 years younger, oxygen consumption doubles.

3. Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?

Step  Variable  Variable  Number   Partial   Model     C(p)    F Value  Pr > F
      Entered   Removed   Vars In  R-Square  R-Square
1     RunTime             1        0.7434    0.7434    3.3432  84.00    <.0001
2     Age                 2        0.0213    0.7647    2.8192  2.54     0.1222

a. FORWARD
b. BACKWARD
c. STEPWISE
d. can't tell from the information given

4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value?

Output Statistics

Obs  Name   Performance  Dependent  Predicted  Std Error     95% CL Mean        95% CL Predict     Residual
                         Variable   Value      Mean Predict
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319   37.4190  52.3861   -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112   37.5570  53.2016   -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678   36.1680  52.3021    0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446   37.3608  53.9700    0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261   35.3003  52.5977    0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429   37.5763  52.8009    2.7314

a. 44.7500 and 44.2351
b. 40.1023 and 48.3678
c. 36.1680 and 52.3021
d. can't tell from the information given

5. Which of the following statements describes a positive linear relationship between two variables?

1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.

a. 1 only
b. 1 and 2
c. 2 only
d. 2 and 3
e. 3 only

6. What output does this program produce?

proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

a. individual correlation plots and simple descriptive statistics
b. a scatter plot matrix only, with histograms along its diagonal
c. a table of correlations and a scatter plot matrix with histograms along its diagonal
d. can't tell from the information given

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number in  C(p)    R-Square  Variables in Model
Index  Model
1      7          5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8          5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8          6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9          6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7          6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

a. 0
b. 1
c. 3
d. 5

8. In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations?

proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;

a. the DATA= option
b. the SCORE= option
c. the OUT= option

9. According to these parameter estimates, are any of the variables in the model statistically significant in predicting or explaining the percentage of body fat?

Parameter Estimates

Variable   DF  Parameter  Standard  t Value  Pr > |t|
               Estimate   Error
Intercept  1   -20.98714  5.55433   -3.78    0.0002
Age        1     0.01226  0.02836    0.43    0.6658
Hip        1    -0.40163  0.09994   -4.02    <.0001
Abdomen    1     0.86123  0.06814   12.64    <.0001

a. no
b. yes, Age
c. yes, Hip and Abdomen
d. yes, Age, Hip, and Abdomen

10. What output does this program produce?

proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
         Neck Chest Abdomen Hip Thigh Knee
         Ankle Biceps Forearm Wrist
         / selection=cp best=15;
run;
quit;

a. only models that meet both Mallows' and Hocking's Cp criteria for model selection
b. the best 15 models that meet the criteria for the forward, backward, and stepwise selection methods
c. a set of the best 15 candidate models according to the Cp statistic generated using the all-possible regressions technique
d. can't tell from the information given

Quiz Feedback, Lesson 2: Analysis of Variance (ANOVA)

Your Score:

90%

Congratulations! Your score of 90% indicates that you've mastered the topics in this lesson. If you'd like, check the feedback below and select Review links for any sections containing topics you want to review.


When you're ready to start the next lesson, select the lesson in the Course Menu. Then click a topic in the Lesson Menu to begin.

1. Given this SAS output, is there sufficient evidence to reject the assumption of equal variances?

a. yes b. no

Your answer: b Correct answer: b

The p-value of 0.2942 is greater than 0.05, so you fail to reject the null hypothesis and conclude that the variances are equal.

Review: The GLM Procedure

2. Given this SAS output, is there sufficient evidence to reject the hypothesis of equal means?

a. yes
b. no

Your answer: a Correct answer: a

The p-value of <.001 is less than 0.05, so you would reject the null hypothesis and conclude that the means between the two groups are significantly different.

Review: Examining the Equal Variance t-Test and p-Values

3. The manufacturer of a cereal company uses two different processes to package boxes of cereal. He wants to be sure the two processes are putting the same amount of cereal in each box. He plans to perform a two-sample t-test to determine whether the mean weight of cereal is significantly different between the two processes. What type of test should he run?

a. an upper-tailed t-test
b. a two-sided t-test
c. a lower-tailed t-test

Your answer: x Correct answer: b

Because the cereal manufacturer is interested in determining whether the two processes produce a different mean cereal weight, he needs to perform a two-sided t-test.

Review: Scenario: Comparing Group Means, Scenario: Testing for Differences on One Side

4. Which of the following is not an assumption you make when including a blocking factor in an ANOVA randomized block design?

a. The treatments are randomly assigned within each block.
b. The errors are normally distributed.
c. The effects of the treatment factor are constant across the levels of the blocking variable.
d. The observations are dependent.

Your answer: d Correct answer: d

In an ANOVA model, you assume that the errors are normally distributed for each treatment, the errors have equal variances across treatments, and the observations are independent. When you add a blocking factor to your ANOVA model, you also assume that the treatments are randomly assigned within each block and that the effects of the treatment are the same within each block.

5. When you perform ANOVA for a randomized block design, where do you indicate your blocking variable to SAS?

a. PROC GLM statement
b. MODEL statement only
c. CLASS statement and MODEL statement
d. LSMEANS statement

Your answer: c Correct answer: c

You list the blocking variable in the CLASS statement. You also specify the variables as indicated in the ANOVA model, so you list the blocking variable in the MODEL statement.

Review: Performing ANOVA with Blocking

6. If your blocking variable has a very small F-value in the ANOVA report, what would be a valid next step?

a. Remove it from the MODEL statement and re-run the analysis.
b. Test an interaction term.
c. Report the F-value and plan a new study.

Your answer: c Correct answer: c

If the F-value for the blocking variable is small, that is, if it's less than 1, then adding the blocking factor was not helpful in your analysis. Because you collect the data based on the blocking factor, your only choice is to report the F-value and plan a new study. The blocking factor must be included in all ANOVA models that you calculate with the sample that you've already collected.

Review: Performing ANOVA with Blocking

7. The Dunnett method compares all possible pairs of means, so it can only be used when you make pairwise comparisons.

a. true
b. false

Your answer: b Correct answer: b

all possible pairs of means, so they can only be used when you make pairwise comparisons. The Dunnett method compares all categories to a control group.

Review: Dunnett's Multiple Comparison Method, Tukey's Multiple Comparison Method

8. You can examine Levene's Test for Homogeneity to more formally test which of the following assumptions?

a. the assumption of errors being normally distributed
b. the assumption of independent observations
c. the assumption of equal variances
d. the assumption of treatments being randomly assigned

Your answer: c Correct answer: c

You use Levene's Test for Homogeneity in PROC GLM to verify the assumption of equal variances in a one-way ANOVA model.

Review: The GLM Procedure

9. When you perform a two-way ANOVA in SAS, which of the following statements correctly defines the model that includes the interaction between the two main effect variables?

a. class Drug*Disease;
b. class Drug=Disease;
c. model Drug*Disease;
d. model Health=Drug Disease Drug*Disease;

Your answer: d Correct answer: d

In the MODEL statement, you first specify the main effect variables as they exist in the two-way ANOVA model. You then define the interaction term by separating the two main effect variables with an asterisk in the MODEL statement.

Review: Performing Two-Way ANOVA with Interactions, Applying the Two-Way ANOVA Model

10. This table shows output from a post hoc pairwise comparison in which you tested the significance of a drug on patients' health for three different diseases. What conclusion can you make based on this output?

a. The drug effect is significant when used in patients with disease Z.
b. The drug effect is significant when used in patients with diseases Y and Z.
c. The drug effect is not significant when used in patients with disease Z.

Your answer: c Correct answer: c

The p-value for disease Z is 0.7815. Because this p-value is greater than your alpha of 0.05, you fail to reject the null hypothesis and conclude that there is no significant effect of Drug on Health for patients with disease Z.

Review: Performing a Post Hoc Pairwise Comparison

Quiz Feedback, Lesson 3: Regression

Your Score:

60%

Your score of 60% indicates that you would benefit from reviewing topics in this lesson. Check the feedback below and select Review links for questions you missed. When you're ready, take the quiz again.

1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

               Performance    RunTime      Age
Performance    1.00000       -0.82049     -0.71257
                              <.0001       <.0001
RunTime       -0.82049        1.00000      0.19523
               <.0001                      0.2926
Age           -0.71257        0.19523      1.00000
               <.0001         0.2926

a. a fairly strong, positive linear relationship
b. a fairly strong, negative linear relationship
c. a fairly weak, positive linear relationship
d. a fairly weak, negative linear relationship

Your answer: b Correct answer: b

The correlation coefficient for the relationship between Performance and RunTime is -0.82049, which is negative. Its absolute value is close to 1, making it a relatively strong relationship.

Review: Using Correlation to Measure Relationships between Continuous Variables

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse, and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

a. For each year older, the predicted value of oxygen consumption is 2.78 greater.
b. For each year older, the predicted value of oxygen consumption is 2.78 lower.
c. For every 2.78 years older, oxygen consumption doubles.
d. For every 2.78 years younger, oxygen consumption doubles.

Your answer: d Correct answer: b

The parameter estimate for Age is the average change in Oxygen_Consumption for a 1-unit change in Age, with the other predictors held constant. In this case, the parameter estimate is negative. So, for each year older (a 1-unit change in Age), predicted oxygen consumption decreases by 2.78 units.

Review: The Simple Linear Regression Model

3. Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?

Step  Variable  Variable  Number   Partial   Model     C(p)    F Value  Pr > F
      Entered   Removed   Vars In  R-Square  R-Square
1     RunTime             1        0.7434    0.7434    3.3432    84.00  <.0001
2     Age                 2        0.0213    0.7647    2.8192     2.54  0.1222

a. FORWARD
b. BACKWARD
c. STEPWISE
d. can't tell from the information given

Your answer: d Correct answer: c

The summary table contains both Variable Entered and Variable Removed columns. Of the three types of stepwise selection (forward, backward, and stepwise), only stepwise selection can both enter and remove variables. Therefore, STEPWISE must have been specified in the PROC REG step.

Review: The Stepwise Selection Approach to Model Building, Specifying Stepwise Selection Methods in SAS, The REG Procedure: Performing Stepwise Regression
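As a rough sketch of how the selection method is requested, the step below asks PROC REG for stepwise selection. It reuses the statdata.bodyfat2 data set and variable names from question 10 of this quiz rather than the fitness data behind the table above, so its summary table will differ. The SLENTRY= and SLSTAY= values are written out only to make them visible; they are believed to match PROC REG's defaults for STEPWISE.

```sas
proc reg data=statdata.bodyfat2;
   /* STEPWISE can both enter and remove variables at each step */
   model PctBodyFat2 = Age Weight Height
                       Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist
                       / selection=stepwise
                         slentry=0.15    /* significance level to enter */
                         slstay=0.15;    /* significance level to stay  */
run;
quit;
```

Changing SELECTION=STEPWISE to FORWARD or BACKWARD requests the other two methods. An entry level of 0.15 would be consistent with Age entering the model above with a p-value of 0.1222.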


4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value?

Output Statistics

Obs  Name   Performance  Dependent  Predicted  Std Error     95% CL Mean        95% CL Predict     Residual
                         Variable   Value      Mean Predict
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319   37.4190  52.3861   -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112   37.5570  53.2016   -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678   36.1680  52.3021    0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446   37.3608  53.9700    0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261   35.3003  52.5977    0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429   37.5763  52.8009    2.7314

a. 44.7500 and 44.2351
b. 40.1023 and 48.3678
c. 36.1680 and 52.3021
d. can't tell from the information given

Your answer: d Correct answer: c

The CLI option, which displays the 95% CL Predict column in the Output Statistics table, produces confidence limits for an individual predicted value. In this table, the third observation, for Kate, contains the value 55 for Performance. Therefore, the values in her 95% CL Predict column are the lower and upper confidence limits for a new individual value at the same value of Performance. In contrast, the CLM option displays the values in the 95% CL Mean column, which are the lower and upper confidence limits for a mean predicted value for each observation.

Review: Specifying Confidence and Prediction Intervals in SAS, Viewing and Printing Confidence Intervals and Prediction Intervals, The REG Procedure: Producing Predicted Values
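The CLM and CLI options discussed above can be sketched as follows. The data set name work.sample6 and the variable name Response are assumptions, because the quiz does not name the dependent variable. The six rows are copied from the Output Statistics table, so the limits printed here will not match that table, which presumably came from a model fit to the full course data.

```sas
/* Six rows copied from the Output Statistics table; Response is a hypothetical name */
data work.sample6;
   input Name $ Performance Response;
   datalines;
Jack 48 40.84
Annie 43 45.12
Kate 55 44.75
Carl 40 46.08
Don 58 44.61
Effie 45 47.92
;
run;

proc reg data=work.sample6;
   /* CLM: limits for the mean predicted value (95% CL Mean)         */
   /* CLI: limits for an individual predicted value (95% CL Predict) */
   model Response = Performance / clm cli;
   id Name Performance;   /* label each row of the Output Statistics table */
run;
quit;
```

For any given observation, the CLI limits are wider than the CLM limits because they also account for the variability of a single new value around the mean.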


5. Which of the following statements describes a positive linear relationship between two variables?

1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.

a. 1 only
b. 1 and 2
c. 2 only
d. 2 and 3
e. 3 only

Your answer: c Correct answer: c

In statement 2, the amount of salty snacks eaten and thirst have a positive linear relationship. As the values of one variable (amount of salty snacks eaten) increase, the values of the other variable (thirst) increase as well.

Review: Using Scatter Plots to Describe Relationships between Continuous Variables, Using Correlation to Measure Relationships between Continuous Variables

6. What output does this program produce?

proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

a. individual correlation plots and simple descriptive statistics
b. a scatter plot matrix only, with histograms along its diagonal
c. a table of correlations and a scatter plot matrix with histograms along its diagonal
d. can't tell from the information given

Your answer: b Correct answer: c

By default, PROC CORR produces a table of correlations (which can be a correlation matrix, depending on your program). The NOSIMPLE option suppresses printing of the simple descriptive statistics for each variable, and PLOTS=MATRIX requests a scatter plot matrix instead of individual scatter plots. The HISTOGRAM option displays histograms of the variables in the VAR statement along the diagonal of the scatter plot matrix.

Review: Producing a Correlation Matrix and a Scatter Plot Matrix

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number in  C(p)    R-Square  Variables in Model
Index  Model
1      7          5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8          5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8          6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9          6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7          6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

a. 0
b. 1
c. 3
d. 5

Your answer: d Correct answer: d

In Mallows' Cp criterion, p equals the number of variables in the model plus 1 for the intercept. Therefore, for these models, p equals 8, 9, or 10, depending on the number of terms in the model. All the C(p) values are less than their respective p values, so all five models meet Mallows' Cp criterion.

Review: Evaluating Models Using Mallows' Cp Statistic, Viewing Mallows' Cp Statistic in PROC REG, The REG Procedure: Using the All-Possible Regressions Technique, The REG Procedure: Using Automatic Model Selection
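The check described above can be sketched as a short DATA step. The values are copied from the table in this question; the data set name work.cp_check and the variable names ModelIndex, NumInModel, Cp, p, and MeetsMallows are assumptions chosen for illustration.

```sas
data work.cp_check;
   input ModelIndex NumInModel Cp;
   p = NumInModel + 1;          /* variables in the model plus 1 for the intercept */
   MeetsMallows = (Cp <= p);    /* 1 if the criterion is met, 0 otherwise */
   datalines;
1 7 5.8653
2 8 5.8986
3 8 6.4929
4 9 6.7834
5 7 6.9017
;
run;

proc print data=work.cp_check noobs;
run;
```

Each of the five rows should show MeetsMallows = 1, which agrees with the answer of 5.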

8. In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations?

proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;

a. the DATA= option
b. the SCORE= option
c. the OUT= option

Your answer: b Correct answer: b

The SCORE= option specifies the data set that contains the parameter estimates. PROC SCORE reads the parameter estimates from this data set, scores the observations in the data set that the DATA= option specifies, and writes the scored observations to the data set that the OUT= option specifies.

Review: The SCORE Procedure: Scoring Predicted Values Using Parameter Estimates
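One common way to produce the parameter estimates that SCORE= reads is the OUTEST= option of PROC REG. The sketch below illustrates that flow under stated assumptions: all data set names (work.train, work.estimates, work.to_score, work.scored) and the variable name Response are hypothetical, the training rows are copied from the Output Statistics table in question 4, and the two Performance values to score are made up.

```sas
/* Training rows copied from question 4; Response is a hypothetical name */
data work.train;
   input Name $ Performance Response;
   datalines;
Jack 48 40.84
Annie 43 45.12
Kate 55 44.75
Carl 40 46.08
Don 58 44.61
Effie 45 47.92
;
run;

/* OUTEST= writes the parameter estimates to a data set */
proc reg data=work.train outest=work.estimates noprint;
   model Response = Performance;
run;
quit;

/* Made-up Performance values to score */
data work.to_score;
   input Performance;
   datalines;
50
60
;
run;

/* DATA= observations to score, SCORE= parameter estimates, OUT= scored results */
proc score data=work.to_score score=work.estimates
           out=work.scored type=parms;
   var Performance;
run;
```

With no label on the MODEL statement, the predicted values in work.scored typically appear in a variable named MODEL1.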


9. According to these parameter estimates, are any of the variables in the model statistically significant in predicting or explaining the percentage of body fat?

Parameter Estimates

Variable   DF  Parameter  Standard  t Value  Pr > |t|
               Estimate   Error
Intercept  1   -20.98714  5.55433     -3.78    0.0002
Age        1     0.01226  0.02836      0.43    0.6658
Hip        1    -0.40163  0.09994     -4.02    <.0001
Abdomen    1     0.86123  0.06814     12.64    <.0001

a. no
b. yes, Age
c. yes, Hip and Abdomen
d. yes, Age, Hip, and Abdomen

Your answer: c Correct answer: c

Hip and Abdomen both have p-values lower than .05, so they are statistically significant in predicting or explaining the variability of the percentage of body fat.

Review: Performing Simple Linear Regression, Analysis versus Prediction in Multiple Regression, The REG Procedure: Performing Multiple Linear Regression

10. What output does this program produce?

proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
                       Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist
                       / selection=cp best=15;
run;
quit;

a. only models that meet both Mallows' and Hocking's Cp criteria for model selection
b. the best 15 models that meet the criteria for the forward, backward, and stepwise selection methods
c. a set of the best 15 candidate models according to the Cp statistic generated using the all-possible regressions technique
d. can't tell from the information given

Your answer: c Correct answer: c

When you use the all-possible regressions technique, you specify RSQUARE, ADJRSQ, or CP in the SELECTION= option to rank models. The BEST= option selects the specified number of best models based on the SELECTION= statistic. If more than one statistic is specified, the first statistic listed determines the ranking.

Quiz, Lesson 3: Regression

Select the best answer for each question. When you are finished, click Submit Quiz.

1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

               Performance    RunTime        Age
Performance        1.00000   -0.82049   -0.71257
                               <.0001     <.0001
RunTime           -0.82049    1.00000    0.19523
                    <.0001                0.2926
Age               -0.71257    0.19523    1.00000
                    <.0001     0.2926

a. a fairly strong, positive linear relationship
b. a fairly strong, negative linear relationship
c. a fairly weak, positive linear relationship
d. a fairly weak, negative linear relationship

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse, and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

a. For each year older, the predicted value of oxygen consumption is 2.78 greater.
b. For each year older, the predicted value of oxygen consumption is 2.78 lower.
c. For every 2.78 years older, oxygen consumption doubles.
d. For every 2.78 years younger, oxygen consumption doubles.

3. Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?

Step  Variable  Variable  Number   Partial   Model     C(p)    F Value  Pr > F
      Entered   Removed   Vars In  R-Square  R-Square
1     RunTime             1        0.7434    0.7434    3.3432    84.00  <.0001
2     Age                 2        0.0213    0.7647    2.8192     2.54  0.1222

a. FORWARD
b. BACKWARD
c. STEPWISE
d. can't tell from the information given

4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value?

Output Statistics

Obs  Name   Performance  Dependent  Predicted  Std Error     95% CL Mean        95% CL Predict     Residual
                         Variable   Value      Mean Predict
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319   37.4190  52.3861   -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112   37.5570  53.2016   -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678   36.1680  52.3021    0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446   37.3608  53.9700    0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261   35.3003  52.5977    0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429   37.5763  52.8009    2.7314

a. 44.7500 and 44.2351
b. 40.1023 and 48.3678
c. 36.1680 and 52.3021
d. can't tell from the information given

5. Which of the following statements describes a positive linear relationship between two variables?

1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.

a. 1 only
b. 1 and 2
c. 2 only
d. 2 and 3
e. 3 only

6. What output does this program produce?

proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

a. individual correlation plots and simple descriptive statistics
b. a scatter plot matrix only, with histograms along its diagonal
c. a table of correlations and a scatter plot matrix with histograms along its diagonal
d. can't tell from the information given

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number in  C(p)    R-Square  Variables in Model
Index  Model
1      7          5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8          5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8          6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9          6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7          6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

a. 0
b. 1
c. 3
d. 5

8. In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations?

proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;

a. the DATA= option
b. the SCORE= option
c. the OUT= option

9. According to these parameter estimates, are any of the variables in the model statistically significant in predicting or explaining the percentage of body fat?

Parameter Estimates

Variable   DF  Parameter  Standard  t Value  Pr > |t|
               Estimate   Error
Intercept  1   -20.98714  5.55433     -3.78    0.0002
Age        1     0.01226  0.02836      0.43    0.6658
Hip        1    -0.40163  0.09994     -4.02    <.0001
Abdomen    1     0.86123  0.06814     12.64    <.0001

a. no
b. yes, Age
c. yes, Hip and Abdomen
d. yes, Age, Hip, and Abdomen

10. What output does this program produce?

proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
                       Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist
                       / selection=cp best=15;
run;
quit;

a. only models that meet both Mallows' and Hocking's Cp criteria for model selection
b. the best 15 models that meet the criteria for the forward, backward, and stepwise selection methods
c. a set of the best 15 candidate models according to the Cp statistic generated using the all-possible regressions technique
d. can't tell from the information given
