CORRELATIONAL ANALYSIS: PEARSON’S r
Purpose of correlational analysis
The purpose of performing a correlational analysis:
To discover whether there is a relationship between variables,
To find out the direction of the relationship – whether it is positive, negative or zero,
To find the strength of the relationship between the two variables.
The test statistic, called the correlation coefficient r, measures the strength of the relationship between the two variables.
Direction of the Relationship
Positive
High scores on one variable tend to be associated with high scores on the other variable:
Example
Study hours X Exam marks
Negative
High scores on one variable are associated with low scores on the other variable:
Example
Age of drivers X Car accidents
Young male drivers are more likely to have accidents.
Perfect positive
Brother’s age X Your age
Imperfect positive
IQ X Exam marks
Perfect negative
Number of chocolate bars in a vending machine X Amount of money put in the machine
Imperfect negative
Attendance at football matches X Amount of rainfall
The Strength or Magnitude of the Relationship (minus or plus)
1.0          Perfect
0.9 – 0.7    Strong
0.6 – 0.4    Moderate
0.3 – 0.1    Weak
0.0          Zero (none)
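As a quick illustration of how these bands can be applied, here is a minimal Python sketch. The function name describe_correlation and the decision to round the size of r to one decimal place before looking up the band are assumptions made for this example; the table itself does not say how in-between values such as 0.65 should be treated.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the strength bands in the table.

    Assumption: |r| is rounded to the nearest tenth before the band is chosen,
    and only |r| == 1.0 counts as 'perfect'.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        return f"perfect {direction}"

    tenths = round(size * 10)  # size of r expressed in tenths
    if tenths >= 7:
        strength = "strong"
    elif tenths >= 4:
        strength = "moderate"
    elif tenths >= 1:
        strength = "weak"
    else:
        return "zero"
    return f"{strength} {direction}"


for value in (0.86, 0.38, -0.27, -0.01):
    print(value, "->", describe_correlation(value))
```

The sign of r gives the direction and its absolute size gives the strength, so the two are handled separately.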
A Sample of Correlation Coefficients
Scholastic Aptitude Test Scores and Height of Student              + 0.05
Scholastic Aptitude Test Scores and Grade Point Average            + 0.38
Adult Vocabulary and Math Ability                                  + 0.59
IQ scores of identical twins reared together                       + 0.86
Grade Point Average and How Close to Instructor Student Sits       + 0.35
Satisfaction with job and amount of reported stress on the job     - 0.27
Number of cigarettes smoked per day and amount of job stress       - 0.01
Relationship Among Variables
In every science the ideal is to find out some kind of cause and effect relationship. This is a relationship in which change in one variable causes change in another.
Example: Studying for an exam (cause) results in a high grade (effect).
The variable that causes the change (in this case, studying) is called the independent variable. The variable that changes (the exam grade) is called the dependent variable.
Why is linking variables in terms of cause and effect
important? Because this kind of relationship allows us to
predict how one kind of behavior will produce another.
It is wrong to think that a cause and effect relationship is present whenever variables change together.
Example 1: The marriage rate in England falls to its lowest point in January, exactly the same month when the death rate reaches its highest point. This hardly means that people die because they fail to marry (or that they don’t marry because they die). In fact, it is the bad weather during January that causes both a low marriage rate and a high death rate.
Correlation is a measure of relationship between two (or more) variables that change together.
Sometimes the relationship between two (or more) variables is actually produced by some other variable to which both are connected. Such a relationship is called a spurious correlation. It is a false relationship and needs to be unmasked. Unmasking a correlation as spurious is assisted by a technique called control of relevant variables.
Variables other than the independent variable that can exert an effect on the dependent variable are called relevant variables.
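The marriage and death rate example can be imitated with simulated data. The sketch below is only an illustration under stated assumptions: the numbers are randomly generated, not real English statistics, and partial correlation is used here as one common statistical way of controlling a relevant variable, which may not be the exact technique these notes have in mind. The names pearson_r and partial_r are hypothetical helpers.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after the linear effect of z is removed."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated data: bad weather lowers the marriage rate and raises the death rate.
bad_weather = [rng.gauss(0, 1) for _ in range(2000)]
marriage_rate = [-w + rng.gauss(0, 0.5) for w in bad_weather]
death_rate = [w + rng.gauss(0, 0.5) for w in bad_weather]

raw = pearson_r(marriage_rate, death_rate)
controlled = partial_r(marriage_rate, death_rate, bad_weather)
print(f"marriage vs death rate:           r = {raw:.2f}")
print(f"same, with weather held constant: r = {controlled:.2f}")
```

In this simulation the two rates are strongly negatively correlated, yet the correlation falls to about zero once the weather is held constant, which is what unmasking a spurious correlation looks like.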
Relationship Between Net Profits and Cash Flow ($ mil.)
Corporation Net Profits Cash Flow
1 83 126
2 89 191
3 176 267
4 82 137
5 413 807
6 18 35
7 337 426
8 146 380
9 173 327
10 247 356
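The correlation between net profits and cash flow for the ten corporations in the table above can be computed directly. This is a minimal sketch using the usual deviation-score form of Pearson’s r; the function name pearson_r and the variable names are assumptions made for the example.

```python
import math

# Data copied from the table above ($ mil.), corporations 1 to 10.
net_profits = [83, 89, 176, 82, 413, 18, 337, 146, 173, 247]
cash_flow = [126, 191, 267, 137, 807, 35, 426, 380, 327, 356]


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by their separate variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(net_profits, cash_flow)
print(f"r = {r:.2f}")  # roughly 0.93 for these ten rows
```

A value this close to +1 indicates a strong positive relationship: corporations with higher net profits tend to have higher cash flow.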
Correlation Matrix
              Assets     Cash Flow   N.Empl.       Market Val.   Net Profits   Sales
              ($ mil.)   ($ mil.)    (thousands)   ($ mil.)      ($ mil.)      ($ mil.)
Assets        1.00
Cash Flow      .34       1.00
Employed       .39        .82        1.00
Market Val.    .36        .94         .81          1.00
Net Profits    .27        .95         .75           .91          1.00
Sales          .59        .80         .88           .75           .73          1.00
X             Y
Temperature   Ice-Cream Cones Sold
26 100
22 95
19 87
20 89
19 88
21 90
17 56
16 55
12 40
Variance Explanation of the Correlation Coefficient
The correlation coefficient (r) is the ratio of the covariance (the variance shared by the two variables) to a measure of their separate variances.
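Written out, this ratio takes the following standard form. The symbols are assumptions introduced here for illustration: x_i and y_i are the paired scores, \bar{x} and \bar{y} are their means, s_x and s_y are the standard deviations, and n is the number of pairs.

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is large when the two variables vary together, and the denominator rescales it so that r always lies between -1 and +1.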
Let’s take the example of father’s IQ and child’s IQ. These two variables are positively associated (correlated): the higher the father’s IQ, the higher the child’s IQ.
When the two variables are correlated, we say that they ‘share’ variance. Father’s and child’s IQ share a lot of variance. How much variance do they share? A correlation coefficient will give us the answer: by squaring the correlation coefficient, we obtain the proportion of variance that the two variables share.
If you have a correlation of r = 0.80, you have accounted for (explained) 64 percent of the variance (0.80 × 0.80 = 0.64). This squared value is called the coefficient of determination.
If we use a Venn diagram, the overlap between the two variables is the proportion of their common or shared variance. If 64 % is shared variance, then 36 % is not shared: it is what is known as unique variance: dividing 36 by 2, 18 % is unique to father and 18 % is unique to child.
The shaded part (overlap) on the Venn diagram (64 %) is the variance shared by the two variables (father’s and child’s IQ scores). In other words, 64 % of the variation in child’s IQ score can be explained by the variation in father’s IQ scores. The remaining 36 % is not explained by father’s IQ scores.
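The arithmetic behind the 64 % and 36 % figures is short enough to check in code. This is a small sketch; the function name variance_split is an assumption made for the example.

```python
def variance_split(r: float) -> tuple[float, float]:
    """Return (shared, unshared) proportions of variance for a correlation r."""
    shared = r ** 2        # coefficient of determination
    unshared = 1 - shared  # variance not shared by the two variables
    return shared, unshared


shared, unshared = variance_split(0.80)
print(f"shared variance:   {shared:.0%}")    # 64%
print(f"unshared variance: {unshared:.0%}")  # 36%
```

Because r is squared, a negative correlation of the same size (r = -0.80) gives the same proportion of shared variance.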
REGRESSION ANALYSIS
The purpose of linear regression
Psychologists are interested in using linear regression in
order to discover the effect of one variable (which we denote x) on another (which we denote y).
Correlational analysis allows us to conclude how strongly two variables relate to each other (both magnitude and
direction);
Linear regression analysis answers the question ‘How much will y change, if x changes?’
In other words: If x changes by a certain amount, we will be
able to estimate how much y will change.
A simple correlational analysis will show us that the father’s IQ and child’s IQ scores are positively correlated: in this case, we are able to say that as the father’s IQ increases, so does the child’s IQ. But we cannot tell the amount of increase in child’s IQ, for any given amount of increase in father’s IQ.
Psychologists use linear regression in order to be able to assess the effect that x has on y. Linear regression analysis results in a formula (a regression equation) that we can use to predict how much y will change as a result of a change in x.
Since linear regression gives us a measure of the effect that x has on y, the technique allows us to predict y from x.
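For reference, the regression equation for one predictor is usually written as follows. The symbols are assumptions introduced here, since these notes do not name them: \hat{y} is the predicted value of y, a is the intercept, and b is the slope, both given here by the standard least-squares formulas.

```latex
\hat{y} = a + b\,x,
\qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

The slope b answers the question ‘How much will y change, if x changes?’: for every one-unit increase in x, the predicted y changes by b units.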
The Regression Line
Correlational analysis gives us a measure that represents how closely the datapoints (on a scatter diagram) are clustered around an (imaginary) line.
In linear regression analysis we fit a real straight line to the datapoints and by using the functional equation of this line we predict a y value (a child’s IQ score) by looking at an x value (father’s IQ score).
This line is drawn in the best place possible; that is, no other line would fit the datapoints as well. This is why it is called the line of best fit.
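A line of best fit can be computed for the ice-cream data given earlier in these notes. The sketch below uses ordinary least squares, which is the usual meaning of ‘best fit’ but is an assumption here, as is the reading of the table with X as temperature and Y as the number of cones sold. The function name fit_line is hypothetical.

```python
# Ice-cream data from the earlier table (assumed: X = temperature, Y = cones sold).
temperature = [26, 22, 19, 20, 19, 21, 17, 16, 12]
cones_sold = [100, 95, 87, 89, 88, 90, 56, 55, 40]


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(temperature, cones_sold)
predicted = intercept + slope * 20  # predicted sales at a temperature of 20

# With an intercept in the model, the residuals around the line sum to zero.
residuals = [y - (intercept + slope * x) for x, y in zip(temperature, cones_sold)]

print(f"cones sold = {intercept:.2f} + {slope:.2f} * temperature")
print(f"predicted cones sold at 20 degrees: {predicted:.1f}")
```

The slope is the estimated change in cones sold for each one-degree rise in temperature, and the prediction for any temperature is read off the fitted line.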
SPEARMAN’S RHO (ρ)
Pearson’s r is a parametric correlation coefficient.
In many research situations we cannot use parametric tests because our data do not meet the assumptions underlying their use.
Remember the assumptions from the discussion of parametric vs nonparametric tests. Parametric tests require:
• independence
• normality
• equal variances
• at least an interval scale
• a reasonable sample size.
Spearman’s rho (rs) without tied ranks
Nonparametric tests make far fewer assumptions about the data, so you can safely use them to analyse data when you think you might not be able to meet the assumptions for parametric tests.
Spearman’s rho is a nonparametric correlation coefficient.
Spearman’s rho is used when your data do not conform to the assumptions of a parametric test. Say, for instance, that one or more variables are ratings given by participants (e.g. the attractiveness of a person), or that participants are asked to put pictures in rank order of preference. In these cases, the data might not be normally distributed.
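To make the idea concrete, here is a minimal sketch of Spearman’s rho for data without tied ranks. It assumes the standard textbook formula rs = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each participant; the formula does not survive in these notes, so treat it as an assumption. The ratings are invented for illustration, and the names ranks and spearman_rho are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; this sketch requires no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the no-ties formula does not apply")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no tied ranks."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ratings from five participants (invented for illustration).
attractiveness = [7, 3, 9, 5, 1]
preference = [6, 4, 10, 8, 2]

rho = spearman_rho(attractiveness, preference)
print(f"rs = {rho:.2f}")  # 0.90
```

Only the rank order of the scores enters the calculation, which is why the measure does not depend on the scores being normally distributed.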