Chapter II. Relationships Between Two or More Variables with SPSS


Crosstabulation Tables

A technique for organizing data by groups, categories, or classes, thus facilitating comparisons; a joint frequency distribution of observations on two or more sets of variables.

Cross-tabs display “the joint distribution of values of the dependent and independent variables by listing the categories for one variable along one side and the categories for the other variable across the top. Each case is then placed in the cell of the table that represents the combination of values that corresponds to its scores on the variables”

Contingency table. The results of a cross-tabulation of two variables, such as survey questions. It classifies data on two dimensions:

• Rows classify according to the first dimension

• Columns classify according to a second dimension

Percentages

• One way to investigate relationships is to compute row and column percentages

• Compute row percentages by dividing each cell's frequency by its row total and expressing the result as a percentage

• Compute column percentages by dividing each cell's frequency by its column total

Types of Variables

• Can use two qualitative variables

• Can use a quantitative variable versus a qualitative variable, or two quantitative variables

• With quantitative variables, categories are often defined first

Graphs and Tables for Two Variables (Bivariate Data)

Two Categorical Variables:

• Contingency Table (also called cross-classification table or two-way table)

• Side-by-Side Bar Chart

Two Numerical Variables:

• Time Series Plot

• Scatter Diagram

Contingency table. A multicolumn table that presents the count or percentage of responses for two categorical variables. In a two-way table, the categories of one of the variables form the rows of the table, while the categories of the second variable form the columns. Statisticians use two-way tables and segmented (stacked) bar charts to examine the relationship between two categorical variables.

Entries in the cells of a two-way table can be displayed as frequency counts or as relative frequencies (just like a one-way table); or they can be displayed graphically as a segmented bar chart.

Two-Way Frequency Tables: Example

Table 2.1. Preferences for leisure activities in adults by gender

Gender   Dance   Sports   TV   Total
Men          2       10    8      20
Women       16        6    8      30
Total       18       16   16      50

Interpretation: The two-way table shows the favorite leisure activities for 50 adults: 20 men and 30 women. Because entries in the table are frequency counts, the table is a frequency table.

If we looked only at the marginal frequencies in the Total row, we might conclude that the three activities had roughly equal appeal. Yet the joint frequencies show a strong preference for dance among women and little interest in dance among men.

Entries in the "Total" row and "Total" column are called marginal frequencies or the marginal distribution. Entries in the body of the table are called joint frequencies.

Relative Frequency of Table

We can also display relative frequencies in two-way tables. Tables below show preferences for leisure activities in the form of relative frequencies. The relative frequencies in the body of the table are called conditional frequencies or the conditional distribution.

Two-way tables can show relative frequencies for the whole table, for rows, or for columns. The first table below shows relative frequencies for rows; the second shows relative frequencies for columns.

Row percentage preferences for leisure activities in adults by gender

         Dance%   Sports%   TV%   Total%
Men          10        50    40      100
Women        53        20    27      100
Total        36        32    32      100

Column percentage preferences for leisure activities in adults by gender

         Dance%   Sports%   TV%   Total%
Men          11        62    50       40
Women        89        38    50       60
Total       100       100   100      100

Each type of relative frequency table makes a different contribution to understanding the relationship between gender and preferences for leisure activities. For example, the "Relative Frequency for Rows" table most clearly shows the probability that each gender will prefer a particular leisure activity. It is easy to see that the probability that a man will prefer dance is 10%; the probability that a woman will prefer dance is 53%; the probability that a man will prefer sports is 50%; and so on.
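The row and column percentages can be checked with a short Python sketch. The cell counts used here are an assumption: they are reconstructed from the stated totals (20 men, 30 women) and the published row percentages, and the function names are illustrative only.

```python
# Counts reconstructed from the stated totals (20 men, 30 women) and the
# row percentages in the text; treat them as an assumption, not source data.
counts = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}

def row_percentages(table):
    """Divide each cell by its row total and express it as a percentage."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

def column_percentages(table):
    """Divide each cell by its column total and express it as a percentage."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    return {
        row: {c: 100 * n / col_totals[c] for c, n in cells.items()}
        for row, cells in table.items()
    }

row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
for gender in counts:
    print(gender, "row %:", {k: round(v, 1) for k, v in row_pct[gender].items()})
    print(gender, "column %:", {k: round(v, 1) for k, v in col_pct[gender].items()})
```

Row percentages answer "what do men (or women) prefer?", while column percentages answer "who makes up the group that prefers each activity?".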

Clustered Bar Charts

Figure 2.1. Preferences for leisure activities in adults by gender (%)

Segmented Bar or Stacked Bar

Such relationships are often easier to detect when they are displayed graphically in a segmented bar chart. A segmented bar chart has one bar for each level of a categorical variable. Each bar is divided into "segments", such that the length of each segment indicates proportion or percentage of observations in a second variable.


Figure 2.2. Preferences for leisure activities in adults by gender

Interpretation: Each inner cell represents the count or percentage of a pairing, or cross-classifying, of categories from each variable.

The graphs show that women have a strong preference for dance; while men seldom make dance their first choice. Men are most likely to prefer sports, but the degree of preference for sports over TV is not great.

Example

A public opinion survey explored the relationship between age and support for increasing the minimum wage. The results are summarized in the two-way table below.

Table 2.2 Opinion for support for increasing the minimum wage by age

Age       For   Against   No opinion   Total
21 - 40    25        20            5      50
41 - 60    20        35           20      75
Over 60    55        15            5      75
Total     100        70           30     200

In the “21 to 40” age group, what percentage supports increasing the minimum wage?

A total of 50 people in the “21 to 40” age group were surveyed. Of those, 25 were for increasing the minimum wage. Thus, half of the respondents in the “21 to 40” age group (50%) supported increasing the minimum wage.
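The same calculation can be repeated for every age group. The sketch below uses the counts from Table 2.2; the helper name `percent_in_category` is hypothetical.

```python
# Table 2.2: opinion on increasing the minimum wage, by age group.
table = {
    "21 - 40": {"For": 25, "Against": 20, "No opinion": 5},
    "41 - 60": {"For": 20, "Against": 35, "No opinion": 20},
    "Over 60": {"For": 55, "Against": 15, "No opinion": 5},
}

def percent_in_category(table, group, category):
    """Share of one row (age group) that falls in one column (opinion)."""
    row = table[group]
    return 100 * row[category] / sum(row.values())

# Percentage of each age group that supports the increase.
support = {group: percent_in_category(table, group, "For") for group in table}
for group, pct in support.items():
    print(f"{group}: {pct:.1f}% for")
```

Comparing the three row percentages side by side shows how support varies with age, which the raw counts alone hide because the groups differ in size.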

The Scattergrams

Scattergrams are used to study relationships between two variables in bivariate numerical data.

• Bivariate data consists of paired observations taken from two numerical variables

• Place one variable on the x-axis

• Place a second variable on the y-axis

• Place a dot at each pair of coordinates

Types of Relationships

• Linear: A straight line relationship between the two variables

• Positive: When one variable goes up, the other variable goes up

• Negative: When one variable goes up, the other variable goes down


scores on the first exam also obtain low scores on the second exam? In other words, are scores on the first exam predictive of how students will perform on the second exam? Are the scores co-related to one another—hence, the term correlation.

Features of scattergrams

Depending on the shape of the cloud of points, we can obtain the following information:

• Whether there is a direct or inverse relationship between the variables.

• Whether the relationship is strong or weak.

• Whether the relationship fits a linear model or a different mathematical model (e.g. a curvilinear model).

Figure 2.3: Different scatter plots and their respective regression models

Example

A study was conducted to find whether there is any relationship between the mortality rate and the percentage of immunization in some countries of the world. The following data set was found on the page "http://www.unicef.org/statistics/". Let us determine the coefficient of correlation for this data set. The first column gives the country, and the second and third columns give the % of immunization and the mortality rate of each country.

Table 2.3: Mortality rate and % immunization in some countries of the world

Country % Immunization Rate_mortality

Bolivia 77 118

Brasil 69 65

Cambodia 32 184

Canadá 85 8

China 94 43

Czech_Republic 99 12

Egypt 89 55

Ethiopia 13 208

Finland 95 7

France 95 9

Greece 54 9

India 89 124

Italy 95 10

Japan 87 6

México 91 33

Poland 98 16

Russian_federation 73 32

Senegal 47 145

Turkey 76 87


Steps in SPSS version 22 to draw a scatter diagram

Graphs > Chart Builder > OK > from the variable box, drag the variable immunization to the “x-axis” and Rate_mortality to the “y-axis”, then click Groups/Point ID > drag the variable country to Point ID > OK

Figure 2.4: Scatter diagram of the mortality rate by % immunization with regression line inserted in some countries in the world

It is always a good idea to start with a visual inspection of the data, since it gives a hint about patterns and relationships and also about potential problems such as outliers. A visual inspection of the scatter plot shows a clear trend: the higher the immunization, the lower the mortality rate, i.e. a negative correlation between the two variables.


Correlation.

Correlation is a way of describing how two variables are related to one another simultaneously. Therefore, we will need to discuss the nature of bivariate distributions where we look at two variables simultaneously. We will discuss the use of z-scores as a starting point in understanding correlation. We will learn about the Pearson Product-Moment Correlation coefficient, how to compute it, and how to interpret it. Finally, we will look at correlation as a step in making predictions.

There are many times when we are interested in how two variables relate to one another. For example, are scores on the first examination in a course related to the scores on the second or subsequent examinations? Do students perform consistently on the course?

Types of Correlation

There are two important types of correlation: (1) positive and negative correlation, and (2) linear and nonlinear correlation.

Positive and Negative Correlation

If the values of the two variables deviate in the same direction i.e. if an increase (or decrease) in the values of one variable results, on an average, in a corresponding increase (or decrease) in the values of the other variable the correlation is said to be positive.

Some examples of positive correlation are:

• School achievement and attendance: to find out whether a relationship exists, you collect the grade point average (GPA) and the days present during the school year.

• Heights and weights.

• Amount of rainfall and yield of crops.

• Ice cream sales and temperature (as temperature goes up, ice cream sales go up).

Correlation between two variables is said to be negative or inverse if the variables deviate in opposite directions, that is, if an increase (or decrease) in the values of one variable results, on average, in a corresponding decrease (or increase) in the values of the other variable.

Some examples of negative correlation are:

• TV viewing and class grades: students who spend more time watching TV tend to have lower grades (or, phrased the other way, students with higher grades tend to spend less time watching TV).

• School performance and video games: to find out whether a relationship exists between high school students' performance and video games, you collect the grade point average (GPA) and the weekly hours spent playing video games.

Linear correlation coefficient (Pearson r)

To quantify the degree of linear relationship between two variables we use the correlation coefficient of Pearson (r). The 'r' is given by the following formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where

r = coefficient of correlation

n = number of pairs of data

∑x = the sum of the x scores

∑y = the sum of the y scores

∑xy = the sum of the cross-products of the scores

∑x² = the sum of the squared scores on x

∑y² = the sum of the squared scores on y
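As a rough sketch, the sums defined above can be combined in Python using the standard computational form of Pearson's r. The data are the 19 country rows that appear in Table 2.3; because the SPSS output in this chapter reports N = 20, the value printed here should not be expected to match the SPSS result exactly. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson r from the sums n, sum(x), sum(y), sum(xy), sum(x^2), sum(y^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# The 19 rows listed in Table 2.3 (% immunization, mortality rate).
immunization = [77, 69, 32, 85, 94, 99, 89, 13, 95, 95,
                54, 89, 95, 87, 91, 98, 73, 47, 76]
mortality = [118, 65, 184, 8, 43, 12, 55, 208, 7, 9,
             9, 124, 10, 6, 33, 16, 32, 145, 87]

r = pearson_r(immunization, mortality)
print(round(r, 3))  # negative: higher immunization goes with lower mortality
```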


values between +/- 0.20 and +/- 0.60 would be moderate, and values greater than +/- 0.60 would be strong. Nevertheless, these labels are arbitrary guidelines and will not be appropriate or useful in every research situation.

A Caveat

It must, however, be considered that there may be a third variable related to both of the variables being investigated, which is responsible for the apparent correlation. Correlation does not imply causation. Also, a nonlinear relationship may exist between two variables that would be inadequately described, or possibly even undetected, by the correlation coefficient.

Continuing the example from Table 2.3: we are interested in showing the relationship between the mortality rate and the percentage of immunization in the countries under study.

Steps in SPSS for correlation

To obtain a quantitative measure of the degree of association between the two variables, select Correlate > Bivariate from the Analyze pull-down menu. In the dialogue box, move the two variables, % immunization and mortality rate, into the Variables frame. Make sure that Pearson is selected in the Correlation Coefficients frame and that Two-tailed is selected in the Test of Significance frame. Click OK. Take a look at the output table: What is the correlation between the two variables? Is it significant?


Output of SPSS

Figure 2.6: Correlation output from SPSS

                                        % Immunization   Mortality rate
% Immunization   Pearson Correlation    1                -.791**
                 Sig. (2-tailed)                         .000
                 N                      20               20
Mortality rate   Pearson Correlation    -.791**          1
                 Sig. (2-tailed)        .000
                 N                      20               20

**. Correlation is significant at the 0.01 level (2-tailed).

Interpretation. The correlation output from SPSS is shown in the table above, where the Pearson correlation coefficient for the two variables is r = -.791**. This value indicates a strong, negative linear relationship between the variables. Furthermore, it is highly significant, with p < 0.001.

Hypothesis test of correlation

H0: ρ = 0 (there is no association between mortality rate and the % of immunization by country)

Ha: ρ ≠ 0 (there is an association between them)

Decision: We reject the null hypothesis, given the significance value (Sig. = .000).

Conclusion: r = -.791**. There is a strong inverse correlation between mortality rate and the % of immunization, i.e., countries with a higher % of immunization tend to have a significantly lower mortality rate (Sig. = .000).
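The significance reported by SPSS can be approximated by hand with the standard t statistic for a correlation coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²), with n − 2 degrees of freedom. This formula is not stated in the chapter; it is added here as a hedged illustration, and the function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = -0.791   # Pearson correlation from the SPSS output
n = 20       # number of countries reported by SPSS
t = correlation_t_statistic(r, n)
degrees_of_freedom = n - 2
print(round(t, 2), degrees_of_freedom)
```

The absolute value of t is well above 2.878, the two-tailed critical value at the 0.01 level for 18 degrees of freedom, which agrees with the decision to reject the null hypothesis. It also matches, up to rounding, the t value that SPSS reports for the slope in the regression output later in the chapter.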

Coefficient of Determination (Goodness of Fit)

How well does this line fit the data? Goodness of fit is measured by the coefficient of determination, r² = (r)² × 100, interpreted as a percentage.

From the example above, the correlation coefficient r was -0.791, so r² = (-0.791)² × 100 = 62.6%. This high value indicates that predictions of “y” from a value of “x” will be fairly good.

The 'goodness of fit' indicates the proportion of the variation in mortality rate that is accounted for by the variation in the % of immunization; in other words, about 63% of the variance in mortality rate is explained by the % of immunization.

Regression Analysis

If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is possible to predict values of one variable from the other. This observation leads to a very important concept known as ‘Regression Analysis’.

Regression Analysis, in a general sense, means the estimation or prediction of the unknown value of one variable from the known value of the other variable. It is one of the most important statistical tools and is used extensively in almost all sciences: natural, social and physical. It is especially used in education and psychology to study the relationship between two or more related variables.


Linear function

A linear function of one variable is a function of the form:

$$ y = b_0 + b_1 x $$

where b0 and b1 are constants.

Models of regression

• Linear

• Quadratic

• Cubic

• Logarithmic

• Inverse

• Power

• Compound

• Logistic

• Exponential

• Multiple linear

Simple linear Regression Model

Equation of Simple linear Regression Model

Suppose we have a sample of size ‘n’ with two sets of measures, denoted by ‘x’ and ‘y’. We can predict the values of ‘y’ given the values of ‘x’ by using the following equation, called the REGRESSION EQUATION:

$$ \hat{Y} = b_0 + b_1 X $$

where

Y = dependent variable

X = independent variable (predictor variable or explanatory variable)

b0 = intercept (value of “Y” when “X” = 0)

b1 = slope (change in “Y” per unit change in “X”)


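The coefficients b0 (intercept) and b1 (slope) are given by:

$$ b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} $$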
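A minimal Python sketch of the least-squares slope and intercept, using the standard computational formulas. The small data set (hours studied and exam score) is hypothetical and serves only to illustrate the calculation; `fit_line` is an illustrative name.

```python
def fit_line(x, y):
    """Return (b0, b1): least-squares intercept and slope for y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)   # b0 = mean(y) - b1 * mean(x)
    return b0, b1

# Hypothetical illustration data (not from this chapter).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

b0, b1 = fit_line(hours, score)
print(f"predicted score = {b0:.1f} + {b1:.1f} * hours")
```

The slope tells us how much the predicted score changes for each additional hour, and the intercept is the predicted score at zero hours.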

Assumptions of the Regression Model (errors)

1. Independence: The errors are independent.

2. Normality: The error term has a normal probability distribution.

3. Homoscedasticity: The errors have a constant variance. The variation around the regression line is constant for all values of x; whether the value taken is high or low, the variation is assumed to be the same.

4. Values are not correlated (no multicollinearity among the independent variables).

Continuing example 1:

We want to show the relationship between mortality and the percentage of immunization in the countries studied. The data are given in Table 2.3 of this chapter. Is there a significant linear relationship between the two variables? Calculate the regression line of mortality on the percentage of immunization. By how much does the mortality rate increase or decrease for each 1% of immunization? What mortality rate could be predicted for the group of countries with 80% immunization?

Steps in SPSS for Regression

Analyze > Regression > Linear

Figure 2.7. Steps for Regression Analysis


Figure 2.8: Model summary output from SPSS

Model Summary (b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .791a   .626       .605                40.139                       2.679

a. Predictors: (Constant), % Immunization

b. Dependent Variable: Mortality rate

Interpretation: The model summary table reports the strength of the relationship between the model and the dependent variable. “R=.791”, correlation coefficient, is the linear correlation between the observed and model-predicted values of the dependent variable. Its large value indicates a strong relationship.

R Square = .626, the coefficient of determination, is the squared value of the correlation coefficient. It shows that about 62.6% of the variation in mortality is explained by the model.

2. Checking the ANOVA of Regression

Figure 2.9: Regression output from SPSS. The ANOVA table tests the acceptability of the model from a statistical perspective.

ANOVA (b)

Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   48497.050         1   48497.050     30.101   .000a
  Residual     29000.950        18    1611.164
  Total        77498.000        19

a. Predictors: (Constant), % Immunization

b. Dependent Variable: Mortality rate

The Regression row displays information about the variation accounted for by your model.

The Residual row displays information about the variation that is not accounted for by your model.

The significance value of the F statistic is less than 0.05, which means that the variation explained by the model is unlikely to be due to chance. In other words, the p-value (Sig.) associated with this F value is very small (.000). These values are used to answer the question “Do the independent variables reliably predict the dependent variable?”

While the ANOVA table is a useful test of the model's ability to explain any variation in the dependent variable, it does not directly address the strength of that relationship (r and r² give the strength of the relationship).
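The entries of the ANOVA table are linked by simple arithmetic, which the following Python sketch reproduces from the sums of squares and degrees of freedom reported by SPSS. The variable names are illustrative.

```python
# Values copied from the SPSS ANOVA table (Figure 2.9).
ss_regression = 48497.050
ss_residual = 29000.950
df_regression = 1
df_residual = 18

ss_total = ss_regression + ss_residual        # total variation in mortality
ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual
f_statistic = ms_regression / ms_residual      # F = MS regression / MS residual
r_squared = ss_regression / ss_total           # share of variation explained
std_error_estimate = ms_residual ** 0.5        # std. error of the estimate

print(round(f_statistic, 3), round(r_squared, 3), round(std_error_estimate, 3))
```

This shows how the F statistic, R Square and the standard error of the estimate in the model summary all follow from the same decomposition of the total sum of squares.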

3. Checking the coefficients of the regression line (parameter estimates)

This table shows the coefficients of the regression line:

• These are the values for the regression equation for predicting the dependent variable from the independent variable.

• The regression equation can be presented in many different ways, for example:

Predicted mortality = 224.316 - 2.136 × (% of immunization)

Figure 2.10: Coefficients output of the regression from SPSS

Coefficients (a)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B          Std. Error         Beta                        t        Sig.
1 (Constant)         224.316    31.440                                         7.135    .000
  % Immunization     -2.136     .389               -.791                       -5.486   .000

a. Dependent Variable: Mortality rate

b0 = 224.316: the predicted mortality rate when the % of immunization is zero (the constant).

b1 = -2.136: the change in mortality rate for each additional percentage point of immunization; the negative sign agrees with the negative correlation (slope of the line).

Prediction of Mortality Rate

What rate of mortality could be predicted for the group of countries with 80% immunization?

The best estimate of the mortality rate is obtained by substituting the value of 80% for the independent variable, x, and calculating the corresponding value of mortality.

Estimated mortality = 224.316 - 2.136 × 80 = 53.4

The expected mortality rate would be about 53.
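The prediction can be reproduced in Python from the coefficients reported in the SPSS output. The function name `predict_mortality` is illustrative, and the sketch assumes the fitted line is only used within the range of immunization values observed in the data.

```python
# Coefficients taken from the SPSS coefficients table (Figure 2.10).
INTERCEPT = 224.316   # constant
SLOPE = -2.136        # change in mortality per percentage point of immunization

def predict_mortality(immunization_pct):
    """Predicted mortality rate for a given % of immunization."""
    return INTERCEPT + SLOPE * immunization_pct

prediction = predict_mortality(80)
print(round(prediction, 1))
```

Calling the function with other percentages shows the slope at work: each additional point of immunization lowers the predicted mortality rate by 2.136.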

In terms of hypothesis testing:

H0: β1 = 0 (the slope of the regression line in the population is zero)

Ha: β1 ≠ 0


Making a decision and interpreting the result: Sig. = .000 for % of immunization is less than .05, therefore we reject the null hypothesis. We conclude, at the 95% confidence level, that the values of the dependent variable depend on the values of the independent variable, i.e. that the mortality rate depends on the % of immunization.

With these results we conclude:

1. The variables are linearly related in the population from which the sample comes (the probability that the relationship found is due to chance is very small, less than one in a thousand).

2. The relationship is strong (r = -.791); the independent variable (% of immunization) explains 62.6% (r² = .626) of the variability of the dependent variable (mortality).

3. The relationship is inverse or negative: the mortality rate decreases on average by 2.136 for each one-point increase in the % of immunization in the countries under study.

Basic assumptions of correlation and regression analysis

Normal distribution of residuals

A residual is the difference between the observed and model-predicted values of the dependent variable. The residual for a given country is the observed value of the error term for that country. A histogram or P-P plot of the residuals will help you to check the assumption of normality of the error term.

Checking the Normality of the Error Term

Figure 2.11: Output for assumption of correlation and regression analysis from SPSS

The shape of the histogram should approximately follow the shape of the normal curve. This histogram is acceptably close to the normal curve.


Checking Independence of the Error Term

We use the plot of standardized residuals against standardized predicted values. If the variance of the residuals is constant, the cloud of points will be concentrated in a band centered at zero and parallel to the x-axis. We note that there is no clearly defined pattern in the data, and the residuals fluctuate randomly around the line at zero, which is their mean. In other words, the scatter is good.

Homoscedasticity:

As for equality of variances, the chart above also serves to test this assumption. If the variability of the residuals along the predicted values is more or less constant, as is the case here, we can conclude that the assumption of equal variances is satisfied; otherwise it is not.

On the other hand, we do not see any unusual patterns in this plot except the large negative outlier when the mortality rate is approximately 1; we also observe that most countries are concentrated between -1 and 0.

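Checking Values are not correlated (Multicollinearity)

Multicollinearity exists when independent variables in a regression equation are highly correlated among themselves. With a single independent variable, as in this example, it cannot arise. The Durbin-Watson statistic shown below checks a different condition: whether successive residuals are correlated with one another (autocorrelation). It ranges from 0 to 4, and values near 2 indicate little autocorrelation.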


Figure 2.12: Durbin-Watson output from SPSS

Model Summary (b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .791a   0.626      0.605               40.139                       2.679

a. Predictors: (Constant), % Immunization; b. Dependent Variable: Mortality rate
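For reference, the Durbin-Watson statistic reported in the model summary is computed from the residuals as the sum of squared differences between successive residuals divided by the sum of squared residuals. The Python sketch below illustrates that definition; the residual values are hypothetical, since the individual residuals are not listed in this chapter.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (negative autocorrelation).
example_residuals = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(example_residuals)
print(dw)
```

Values close to 2 suggest that successive residuals are uncorrelated, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.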

Spearman Correlation Coefficient Technique

• Measures correlation between ranks

• Corresponds to the Pearson Product Moment Correlation Coefficient

• Values range from -1 to +1

The technique is appropriate when the degree of association between two sets of ranks (pertaining to two variables) is to be examined.

• Illustrative research question(s) this technique can answer

Is there a significant relationship between Quality of Life and Percentage of New Residents?

Procedure

1. Assign ranks, R_i, to the observations of each variable separately

2. Calculate the differences, d_i, between each pair of ranks

3. Square the differences, d_i², between ranks

4. Sum the squared differences over all observations

5. Use the shortcut approximation formula:

$$ r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)} $$

where d_i = the difference between the ith sample unit's ranks on the two variables, and n = the total sample size

Example: Quality of life

Ten cities have been rated on an index that measures the quality of life. Also, the percentage of the population that has moved into each city over the past year has been determined. Have cities with higher quality of life scores attracted more new residents?

Table 2.4. Association between Quality of Life and Percentage of New Residents

City   Quality of life   Percentage of New Residents
A      25                14
B      10                3
E      20                15
F      12                4
G      17                12
H      28                16
I      35                20
J      42                25

Figure 2.13. Steps in SPSS for Spearman Correlation

Analyze > Correlate > Bivariate > move the two variables into the Variables box > check Spearman > OK

Figure 2.14.: Output of Spearman correlation from SPSS

Correlations

                                                                Quality of life   % of new residents
Spearman's rho   Quality of life      Correlation Coefficient   1.000             .952**
                                      Sig. (2-tailed)           .                 .000
                                      N                         10                10
                 % of new residents   Correlation Coefficient   .952**            1.000
                                      Sig. (2-tailed)           .000              .
                                      N                         10                10

**. Correlation is significant at the 0.01 level (2-tailed).

Interpretation from outcome of SPSS


Note: Spearman’s rho is an index of the strength of association between the variables: it ranges from 0 (no association) to +/- 1.00 (perfect association). A perfect positive association (rs = +1.00) would exist if there were no disagreements in ranks between the two variables. A perfect negative relationship (rs = -1.00) would exist if the ranks were in perfect disagreement.

For testing Spearman’s Rho for significance, when the number of cases in the sample is 10 or more, the sampling distribution of Spearman’s rho approximates the t distribution, and we will use this distribution to conduct the test.

Example: Industrial Marketing Firm

An industrial marketing firm has been hiring all its salespeople from among the graduates of 10 business schools in the vicinity of its headquarters.

The firm developed a subjective ranking of the perceived prestige levels of the 10 schools and of the performance levels of the groups of graduates recruited from these schools.

Question

What is the degree of association between the prestige levels of the schools and the sales performance levels of their graduates hired by this company?

Table 2.5. Association between Business School and Ranking of school's Prestige

Business School   Ranking of School's Prestige   Ranking of Performance of School's Graduates   Difference Between Ranks   Squared Difference
1                 10                             8                                              2                          4
2                 7                              3                                              4                          16
3                 9                              7                                              2                          4
4                 1                              2                                              -1                         1
5                 6                              9                                              -3                         9
6                 2                              4                                              -2                         4
7                 3                              5                                              -2                         4
8                 8                              10                                             -2                         4
9                 5                              6                                              -1                         1
10                4                              1                                              3                          9
                                                                                                                           Sum = 56
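The rank-difference procedure can be sketched in Python with the ranks from Table 2.5. The formula rs = 1 − 6∑d²/(n(n² − 1)) is the usual shortcut for ranks without ties; the function name is illustrative, and the t statistic at the end uses the standard approximation t = rs·sqrt((n − 2)/(1 − rs²)), which is an addition for illustration rather than something spelled out in the chapter.

```python
import math

# Ranks from Table 2.5 (one entry per business school).
prestige_rank = [10, 7, 9, 1, 6, 2, 3, 8, 5, 4]
performance_rank = [8, 3, 7, 2, 9, 4, 5, 10, 6, 1]

def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rho via the rank-difference formula (assumes no tied ranks)."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
    return rho, sum_d_squared

rs, sum_d2 = spearman_from_ranks(prestige_rank, performance_rank)
n = len(prestige_rank)

# Approximate t statistic with n - 2 degrees of freedom.
t = rs * math.sqrt((n - 2) / (1 - rs ** 2))

print(sum_d2, round(rs, 3), round(t, 2))
```

The sum of squared differences and the resulting coefficient agree with the table (56) and with the SPSS value of .661.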

Hypotheses

Spearman's rho=.661* Sig. (2-tailed) = .038

• Decision Rule:

– Since Sig. < 0.05 (Sig. = .038), we reject H0 and conclude that there is a true association between the prestige of business schools and the job performance of their graduates. In other words, the sample correlation of rs = 0.661 is unlikely to have occurred by chance.

Figure 2.15. Output of Spearman correlation from SPSS

Correlations

                                                                          Ranking of school's Prestige   Ranking of performance of School's Graduates
Ranking of school's Prestige                    Correlation Coefficient   1.000                          .661*
                                                Sig. (2-tailed)           .                              .038
                                                N                         10                             10
Ranking of performance of School's Graduates    Correlation Coefficient   .661*                          1.000
                                                Sig. (2-tailed)           .038                           .
                                                N                         10                             10

*. Correlation is significant at the 0.05 level (2-tailed).

Review problems of chapter

Short answers

1. When studying the simultaneous response to two categorical variables, you should construct a:

a. Histogram

b. Pie chart

c. Scatter plot

d. Cross-classification table

2. In a cross-classification table, the number of rows and columns:

a. Must always be the same

b. Must always be 2

c. Must add to 100%

d. None of the above

Answer True or False:

3. A professor wants to study the relationship between the number of hours a student studied for an exam and the exam score achieved. The professor can use a scatter diagram. _________

4. A professor wants to study the relationship between the number of hours a student studied for an exam and the exam score achieved. The professor can use a bar chart. _________

Fill in the blank:

5. To evaluate two categorical variables at the same time ______________________________________ should be developed.

6. A _______________________________ chart should be used when you are studying a pattern between two numerical variables.

Problems

7. The children in a school are to have extra swimming lessons if they cannot swim. The table below gives information about the children in Years 7, 8 and 9.

         Can swim   Cannot swim
Year 7   120        60
Year 8   168        11
Year 9   172        3

Construct a clustered bar chart and interpret it.

8. The children in a class conducted a survey to find out how many children had videos at home and how many had computers at home. Their results are given in the table below.

Video No video

Computer 8 2

No computer 20 3

Construct a Segmented or Stacked bar and interpret.

9. The following table shows the number of students enrolled in three business majors for two different years at one small private university.

Student enrollment in three Business Majors, 2000 and 2005

Major 2000 2005

Finance 160 250

Marketing 140 200

Accounting 100 150

Construct a stacked bar chart and a clustered bar chart and compare the two graphs.

Short answers

10. The slope (B1) represents:

a. Predicted value of Y when X = 0

b. Change in Y per unit change in X

c. Predicted value of Y

d. Variation around the regression line

11. The Y intercept (B0) represents the:

a. Predicted value of Y when X = 0

b. Change in Y per unit change in X

c. Predicted value of Y

d. Variation around the regression line

12. The coefficient of determination (r²) tells you:

a. That the coefficient of correlation (r) is larger than 1

b. Whether the slope has any significance

c. Whether the regression sum of squares is greater than the total sum of squares

d. The proportion of total variation that is explained

13. In performing a regression analysis involving two numerical variables, you assume:

a. The variances of X and Y are equal

b. The variation around the line of regression is the same for each X value

c. That X and Y are independent

d. All of the above

8 (continued). How many children did not have a video at home? And what is the percentage?

How many children had a computer at home? And what is the percentage?

How many children had neither a home computer nor a video? And what is the percentage?

14. The residuals represent:

c. The square root of the slope

d. The predicted value of Y when X = 0

15. If the coefficient of correlation (r) = -1.00, then:

a. All the data points must fall exactly on a straight line with a slope that equals 1.00

b. All the data points must fall exactly on a straight line with a negative slope

c. All the data points must fall exactly on a straight line with a positive slope

d. All the data points must fall exactly on a horizontal straight line with a zero slope

16. Assuming a straight line (linear) relationship between X and Y, if the coefficient of correlation (r) = -0.30:

a. There is no correlation

b. The slope is negative

c. Variable X is larger than variable Y

d. The variance of X is negative

17. In a simple linear regression model, the coefficient of correlation and the slope:

a. May have opposite signs

b. Must have the same sign

c. Must have opposite signs

d. Are equal

18. Why does voter turnout vary from election to election? For municipal elections in six different cities, information has been gathered on the percent of registered voters who actually voted, the unemployment rate, the average years of education for the city, and the percentage of all political ads that used “negative campaigning” (personal attacks, negative portrayals of the opponent’s record, etc.).

For each relationship:

a. Draw and interpret a scattergram and a freehand regression line

b. Compute and interpret the coefficient of correlation

c. Compute and interpret the coefficient of determination

d. Compute the line of regression

e. Predict the voter turnout for a city in which the unemployment rate was 12, a city in which the average years of schooling was 11, and an election in which 90% of the advertising was negative.

f. Assume these cities are a random sample and conduct a test of significance for each relationship.

g. Describe the strength and the direction of the relationships in a sentence or two. Which (if any) relationships were significant? Which factor had the strongest effect on turnout?

TURNOUT AND UNEMPLOYMENT

Answer: r = .950, r² = 90.3%

City   Turnout   Unemployment rate
A      55        5
B      60        8
C      65        9
D      72        12
E      68        9
F      70        10

TURNOUT AND LEVEL OF EDUCATION

Answer: r = .941, r² = 88.6%

City   Turnout   Average years of school
A      55        11.9
B      60        12.1
C      65        12.7
E      68        12.9
F      70        13.0

TURNOUT AND NEGATIVE CAMPAIGNING

Answer: r = -.694, r² = 48.2%

City   Turnout   % of Negative Advertisements
A      55        60
B      60        63
C      65        55
D      72        53
E      68        60
F      70        48

19. For eleven cities, data have been gathered on the total crime rate (major felonies per 100,000 population) and the percentage of people who are new immigrants (arrived in Rwanda within the past five years). Are the variables related?

City   Total Crime Rate   Percent Immigrants

A 1500 10

B 1200 8

C 2000 6

D 1700 11

E 1600 15

F 1000 16

G 1700 9