
(1)

CORRELATIONAL ANALYSIS: PEARSON’S r

Purpose of correlational analysis

The purpose of performing a correlational analysis:

• To discover whether there is a relationship between variables,

• To find out the direction of the relationship – whether it is positive, negative or zero,

• To find the strength of the relationship between the two variables.

The test statistic, called the correlation coefficient r, measures the strength of the relationship between the two variables.

(2)

Direction of the Relationship

Positive

High scores on one variable tend to be associated with high scores on the other variable.

Example: Study hours X Exam marks

Negative

High scores on one variable are associated with low scores on the other variable.

Example: Age of drivers X Car accidents

Young male drivers are more likely to have accidents.

(3)

Perfect positive: Brother’s age X Your age

Imperfect positive: IQ X Exam marks

Perfect negative: Number of chocolate bars in a vending machine X Amount of money put in the machine

Imperfect negative: Attendance at football matches X Amount of rainfall

(4)

The Strength or Magnitude of the Relationship (Minus or Plus)

1.0          Perfect
0.9 – 0.7    Strong
0.6 – 0.4    Moderate
0.3 – 0.1    Weak
0.0          Zero (none)
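The bands above can be turned into a small lookup. The following is only a sketch: the function name `describe_correlation` is hypothetical, and because the table lists values to one decimal place, the code assumes that in-between values (for example 0.65) fall into the band below them and that anything under 0.1 counts as zero.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the strength bands in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")

    size = abs(r)  # strength ignores the sign (minus or plus)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        # Assumption: values below 0.1 are treated as no relationship.
        return "zero"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# A few coefficients quoted elsewhere in these slides.
for r in (0.86, 0.59, -0.27, -0.01):
    print(r, describe_correlation(r))
```

The sign gives the direction and the absolute value gives the strength, so the two are handled separately.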

(5)

A Sample of Correlation Coefficients

Scholastic Aptitude Test scores and height of student               + 0.05
Scholastic Aptitude Test scores and grade point average             + 0.38
Adult vocabulary and math ability                                   + 0.59
IQ scores of identical twins reared together                        + 0.86
Grade point average and how close to instructor student sits        + 0.35
Satisfaction with job and amount of reported stress on the job      - 0.27
Number of cigarettes smoked per day and amount of job stress        - 0.01

(6)

Relationship Among Variables

In every science the ideal is to find out some kind of cause and effect relationship. This is a relationship in which change in one variable causes change in another.

Example: Studying for an exam (cause) results in a high grade (effect).

The variable that causes the change (in this case, studying) is called the independent variable. The variable that changes (the exam grade) is called the dependent variable.

Why is linking variables in terms of cause and effect important? Because this kind of relationship allows us to predict how one kind of behavior will produce another.

(7)

It is wrong to think that a cause and effect relationship is present whenever variables change together.

Example 1: The marriage rate in England falls to its lowest point in January, exactly the same month when the death rate reaches its highest point. This hardly means that people die because they fail to marry (or that they don’t marry because they die). In fact, it is the bad weather during January that causes both a low marriage rate and a high death rate.

(8)

Correlation is a measure of relationship between two (or more) variables that change together.

Sometimes the relationship between two (or more) variables seems to be connected to some other variable. Such a connection is called a spurious correlation. This is a false relationship and needs to be unmasked. Unmasking a correlation as spurious is assisted by a technique called control of relevant variables.

Variables other than the independent variable that can exert an effect on the dependent variable are called relevant variables.

(9)

Relationship Between Net Profits and Cash Flow ($ mil.)

Corporation    Net Profits    Cash Flow
 1              83            126
 2              89            191
 3             176            267
 4              82            137
 5             413            807
 6              18             35
 7             337            426
 8             146            380
 9             173            327
10             247            356
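As an illustration, Pearson's r for the ten corporations listed above can be computed directly from the deviations of each score from its mean. This is a minimal sketch using the standard textbook formula; the function name `pearson_r` is chosen here for illustration and does not come from the slides.

```python
from math import sqrt

# Figures from the table above ($ mil.).
net_profits = [83, 89, 176, 82, 413, 18, 337, 146, 173, 247]
cash_flow = [126, 191, 267, 137, 807, 35, 426, 380, 327, 356]


def pearson_r(x, y):
    """Pearson's r: sum of cross-products over the root of the product of sums of squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Deviations of each score from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]

    sum_xy = sum(a * b for a, b in zip(dx, dy))  # how the variables vary together
    sum_xx = sum(a * a for a in dx)              # variation in x alone
    sum_yy = sum(b * b for b in dy)              # variation in y alone
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable does not vary")

    return sum_xy / sqrt(sum_xx * sum_yy)


r = pearson_r(net_profits, cash_flow)
print(f"r = {r:.2f}")
```

A value close to +1 here would indicate that corporations with higher net profits also tend to report higher cash flow.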

(10)

Correlation Matrix

               Assets     Cash Flow   N.Empl.       Market Val.   Net Profits   Sales
               ($ mil.)   ($ mil.)    (thousands)   ($ mil.)      ($ mil.)      ($ mil.)

Assets         1.00
Cash Flow       .34       1.00
Employed        .39        .82        1.00
Market Val.     .36        .94         .81          1.00
Net Profits     .27        .95         .75           .91          1.00
Sales           .59        .80         .88           .75           .73          1.00

(11)

X              Y
Temperature    Ice-Cream Cones Sold

26             100
22              95
19              87
20              89
19              88
21              90
17              56
16              55
12              40

(12)

Variance Explanation of the Correlation Coefficient

The correlation coefficient (r) is a ratio between the covariance (variance shared by the two variables) and a measure of the separate variances.

Let’s take an example of father’s IQ and child’s IQ. These two variables are positively associated (correlated): the higher the father’s IQ, the higher the child’s IQ.

When the two variables are correlated, we say that they ‘share’ variance. Father’s and child’s IQ share a lot of variance. How much variance do they share? A correlation coefficient will give us the answer: by squaring the correlation coefficient.

(13)

If you have a correlation of r = 0.80, you have accounted for (explained) 64 percent of the variance. This is called coefficient of determination.
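The arithmetic behind the 64 percent figure is simply r squared. A small sketch (the function name `shared_and_unshared_variance` is hypothetical) makes the step explicit:

```python
def shared_and_unshared_variance(r: float):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    r_squared = r ** 2           # coefficient of determination
    return r_squared, 1 - r_squared


shared, unshared = shared_and_unshared_variance(0.80)
print(f"shared variance: {shared:.0%}, not shared: {unshared:.0%}")
```

Because r is squared, a correlation of -0.80 accounts for the same proportion of variance as +0.80.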

If we use a Venn diagram, the overlap between the two variables is the proportion of their common or shared variance. If 64 % is shared variance, then 36 % is not shared: it is what is known as unique variance: dividing 36 by 2, 18 % is unique to father and 18 % is unique to child.

The shaded part (overlap) on the Venn diagram (64 %) is the variance shared by the two variables (father’s and child’s IQ scores). In other words, 64 % of the variation in child’s IQ scores can be explained by the variation in father’s IQ scores. The remaining 36 % is unexplained variance.

(14)

REGRESSION ANALYSIS

The purpose of linear regression

Psychologists are interested in using linear regression in order to discover the effect of one variable (which we denote x) on another (which we denote y).

Correlational analysis allows us to conclude how strongly two variables relate to each other (both magnitude and direction);

Linear regression analysis answers the question ‘How much will y change, if x changes?’

In other words: If x changes by a certain amount, we will be able to estimate how much y will change.

(15)

A simple correlational analysis will show us that the father’s IQ and child’s IQ scores are positively correlated: in this case, we are able to say that as the father’s IQ increases, so does the child’s IQ. But we cannot tell the amount of increase in child’s IQ, for any given amount of increase in father’s IQ.

Psychologists use linear regression in order to be able to assess the effect that x has on y. Linear regression analysis results in a formula (a regression equation) that we can use to predict how y will change as a result of a change in x.

Since linear regression gives us a measure of the effect that x has on y, the technique allows us to predict y from x.

(16)

The Regression Line

Correlational analysis gives us a measure that represents how closely the datapoints (on a scatter diagram) are clustered around an (imaginary) line.

In linear regression analysis we fit a real straight line to the datapoints and by using the functional equation of this line we predict a y value (a child’s IQ score) by looking at an x value (father’s IQ score).

This line is drawn in the best place possible; that is, no other line would fit as well. This is why it is called the line of best fit.
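To make the idea of a regression equation concrete, here is a minimal sketch that fits a straight line y = a + bx. It assumes that "best fit" means ordinary least squares (the usual criterion, though the slides do not name it). Since no IQ data are given, it uses the temperature and ice-cream figures tabulated earlier in these slides as a stand-in; the function name `fit_line` is hypothetical.

```python
# Temperature (x) and ice-cream cones sold (y), from the earlier table.
temperature = [26, 22, 19, 20, 19, 21, 17, 16, 12]
cones_sold = [100, 95, 87, 89, 88, 90, 56, 55, 40]


def fit_line(x, y):
    """Least-squares slope b and intercept a for the line y = a + b * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    if sum_xx == 0:
        raise ValueError("x does not vary, so no slope can be estimated")

    slope = sum_xy / sum_xx                # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x    # the line passes through the two means
    return slope, intercept


slope, intercept = fit_line(temperature, cones_sold)
print(f"cones sold = {intercept:.2f} + {slope:.2f} * temperature")

# Predict y for an x value that is not in the table.
print(f"predicted cones at temperature 18: {intercept + slope * 18:.1f}")
```

The slope answers the question "How much will y change, if x changes?", and the full equation gives a predicted y for any chosen x within the range of the data.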

(17)

SPEARMAN’S RHO (rs)

Pearson’s r is a parametric correlation coefficient.

In many research situations we cannot use parametric tests because our data do not meet the assumptions underlying their use.

Recall from the discussion of parametric vs nonparametric tests that these assumptions were:

• independence

• normality

• equal variances

• at least an interval scale

• a reasonable sample size.

Spearman’s rho (rs) without tied ranks
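The formula itself did not survive in these slides. The version commonly given for data without tied ranks is rs = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each case and n is the number of cases. The sketch below assumes that formula; the function name `spearman_rho_no_ties` and the ratings are made up for illustration.

```python
def spearman_rho_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid only without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this shortcut formula assumes there are no tied scores")

    def ranks(values):
        # Rank 1 goes to the smallest score; no ties, so index lookup is safe.
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    n = len(x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ratings from five participants (not data from the slides).
attractiveness = [7, 3, 9, 5, 1]
preference = [6, 4, 10, 8, 2]
print(f"rs = {spearman_rho_no_ties(attractiveness, preference):.2f}")
```

Only the rank order of the scores enters the calculation, which is why the measure suits ratings and rankings.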

(18)

Nonparametric tests do not make these assumptions about the data, so you can safely use them to analyse data when you think you might not be able to meet the assumptions for parametric tests.

Spearman’s rho is a nonparametric correlation coefficient.

Spearman’s rho is used when your data do not conform to the assumptions of a parametric test. Say, for instance, one or more variables are ratings given by participants (e.g. the attractiveness of a person), or participants are asked to put pictures in rank order of preference. In these cases, data might not be normally distributed.


(19)

POINT BISERIAL CORRELATION

Point biserial correlation provides a measure of relation between a continuous variable, such as scores on a test, and a two-categoried, or dichotomous, variable, such as