CORRELATION ANALYSIS
Introduction
Correlation is a LINEAR association between two random variables
Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables
When variables are dependent on time, correlation is applied
A zero correlation indicates that there is no linear relationship between the variables
A correlation of –1 indicates a perfect negative
correlation
A correlation of +1 indicates a perfect positive
correlation
Types of Correlation
There are three ways of classifying correlation:
Type 1
Positive: if two related variables are such that when one increases (decreases), the other also increases (decreases)
Negative: if two variables are such that when one increases (decreases), the other decreases (increases)
No correlation: r = 0
Perfect correlation: r = +1 or –1
Type 2
Simple: two variables, one independent and one dependent
Multiple: one dependent and more than one independent variable
Partial: one dependent variable and more than one independent variable, but only one independent variable is considered and the other independent variables are held constant
Type 3
Linear: when plotted on a graph, the points tend to lie on a straight line
Non-linear: when plotted on a graph, the points do not lie on a straight line
Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson Coefficient of Correlation Method
[Figure: scatter plots of Symptom Index against dose (mg) for Drug A (very good fit) and Drug B (moderate fit)]
Correlation: Linear Relationships
Strong relationship = good linear fit
Points clustered closely around a line show a strong correlation, and the line is a good predictor (a good fit) for the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a)
Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations
Represented by “r”
r lies between –1 and +1:
$$-1 \le r \le +1$$
The + and – signs are used for positive linear correlations and negative linear correlations, respectively
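$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$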
Top (numerator): shared variability of the X and Y variables
Bottom (denominator): individual variability of the X and Y variables
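The raw-sums formula for r can be checked with a short Python sketch. The function name `pearson_r` is illustrative and not from the slides; the sample data are the X and Y values used in the regression examples later in this section.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums:
    r = (nΣXY - ΣXΣY) / sqrt([nΣX² - (ΣX)²][nΣY² - (ΣY)²])
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: shared variability of X and Y
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: individual variability of X and of Y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y is constant")
    return numerator / denominator


X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
print(round(pearson_r(X, Y), 3))  # 0.893
```

For these data n = 5, ΣX = 24, ΣY = 29, ΣXY = 168, ΣX² = 142 and ΣY² = 207, so r = 144 / √(134 × 194) ≈ 0.893.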
Interpreting Correlation Coefficient r
strong correlation: r > .70 or r < –.70
moderate correlation: r is between .30 and .70, or between –.30 and –.70
weak correlation: r is between 0 and .30, or between 0 and –.30
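The interpretation rules above can be written as a small Python helper. This is only a sketch: the name `interpret_r` is hypothetical, and the slide does not say which side of the .30 and .70 boundaries an exact value falls on, so treating |r| = .70 and |r| = .30 as moderate is an assumption.

```python
def interpret_r(r: float) -> str:
    """Describe r using the .30 / .70 cut-offs from the slide.

    Assumption: |r| exactly .70 or exactly .30 is labelled moderate,
    because the slide does not specify the boundary cases.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(interpret_r(0.893))  # strong positive correlation
print(interpret_r(-0.45))  # moderate negative correlation
print(interpret_r(0.12))   # weak positive correlation
```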
Spearman’s Rank Correlation Coefficient
When the data are not available in numerical form, the method of rank correlation is used as an alternative. When the values of the two variables are converted to their ranks and the correlation is obtained from those ranks, the result is called rank correlation.
Computation of Rank Correlation
Spearman’s rank correlation coefficient ρ can be calculated when:
actual ranks are given
ranks are not given, and the grades (values) are not repeated
ranks are not given, and the grades (values) are repeated
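The three cases above can be covered by one Python sketch: convert values to ranks (tied values share the average of their positions), then compute ρ. The slides do not state the formula, so this uses the standard ρ = 1 − 6Σd² / (n(n² − 1)) when no values repeat. For repeated values, many textbooks add a correction factor to Σd²; this sketch instead computes Pearson's r on the average ranks, which is a different (though standard) way of handling ties and can give slightly different values. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest).
    Tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    if len(set(rx)) == n and len(set(ry)) == n:
        # No repeated values: rho = 1 - 6Σd² / (n(n² - 1))
        d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
        return 1 - 6 * d2 / (n * (n * n - 1))
    # Repeated values: Pearson's r computed on the average ranks
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when all values in a series are equal")
    return sxy / (sxx * syy) ** 0.5


X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
print(average_ranks(X))               # [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(spearman_rho(X, Y), 3))   # 0.9
```

With these data the rank differences are d = –1, 0, 0, 1, 0, so Σd² = 2 and ρ = 1 − 12/120 = 0.9.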
Algebraic Method
1. Least Squares Method:
The regression equation of X on Y is: X = a + bY
where X = dependent variable and Y = independent variable
The regression equation of Y on X is: Y = a + bX
where Y = dependent variable and X = independent variable
Simple Linear Regression
[Figure: dependent variable (y) plotted against independent variable (x)]
The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.
$$y = a + bx + \varepsilon$$
a = y-intercept
b = slope = Δy/Δx
ε = error (residual) term
Example 1: From the following data, obtain the regression equations using the method of least squares.
Solution:
X    Y    XY    X²    Y²
3    6    18     9    36
2    1     2     4     1
7    8    56    49    64
4    5    20    16    25
8    9    72    64    81
ΣX = 24   ΣY = 29   ΣXY = 168   ΣX² = 142   ΣY² = 207
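The two normal equations for the regression of Y on X are:
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Substituting the values from the table, we get
29 = 5a + 24b………(i)
168 = 24a + 142b, or, dividing by 2,
84 = 12a + 71b………(ii)
Multiplying equation (i) by 12 and (ii) by 5:
348 = 60a + 288b………(iii)
420 = 60a + 355b………(iv)
Subtracting (iii) from (iv): 72 = 67b, so b = 72/67 ≈ 1.07
Substituting b ≈ 1.07 in (i): 29 = 5a + 25.68, so a ≈ 0.66 (with the unrounded b = 72/67, a = 43/67 ≈ 0.64)
Putting the values of a and b in the regression equation of Y on X, we get
Y = 0.66 + 1.07X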
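Now, to find the regression equation of X on Y, the two normal equations are:
$$\sum X = na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
Substituting the values in the equations, we get
24 = 5a + 29b………(i)
168 = 29a + 207b………(ii)
Multiplying equation (i) by 29 and (ii) by 5, we get
696 = 145a + 841b………(iii)
840 = 145a + 1035b………(iv)
Subtracting (iii) from (iv): 144 = 194b, so b = 144/194 ≈ 0.74
Substituting b in (i): 24 = 5a + 29(144/194), so a ≈ 0.49
Substituting the values of a and b in the regression equation of X on Y:
X = 0.49 + 0.74Y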
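The least squares steps above (form the two normal equations, eliminate a, back-substitute) can be sketched in Python. The helper name `solve_normal_equations` is hypothetical; it uses `fractions.Fraction` so that no rounding happens until the result is printed, and fits v = a + bu for whichever variable is passed as u.

```python
from fractions import Fraction


def solve_normal_equations(us, vs):
    """Least-squares line v = a + b*u from the two normal equations
        Σv  = n*a + b*Σu
        Σuv = a*Σu + b*Σu²
    solved exactly with Fractions (no intermediate rounding)."""
    if len(us) != len(vs) or len(us) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    us = [Fraction(u) for u in us]
    vs = [Fraction(v) for v in vs]
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    det = n * suu - su * su
    if det == 0:
        raise ValueError("all u values are equal; the slope is undefined")
    b = (n * suv - su * sv) / det   # eliminate a between the two equations
    a = (sv - b * su) / n           # back-substitute into the first equation
    return a, b


X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
a_yx, b_yx = solve_normal_equations(X, Y)  # Y on X
a_xy, b_xy = solve_normal_equations(Y, X)  # X on Y
print(f"Y = {float(a_yx):.2f} + {float(b_yx):.2f}X")  # Y = 0.64 + 1.07X
print(f"X = {float(a_xy):.2f} + {float(b_xy):.2f}Y")  # X = 0.49 + 0.74Y
```

The exact results are b = 72/67 and a = 43/67 for Y on X, and b = 72/97 and a = 48/97 for X on Y. The intercept 0.64 differs from the slide's 0.66 only because the slide rounds b to 1.07 before solving for a.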
2. Deviations from the Arithmetic Mean Method:
The calculations by the least squares method are quite cumbersome when the values of X and Y are large, so the work can be simplified by using this method.
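The formulas for calculating the regression equations by this method are:
Regression Equation of X on Y:
$$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$$
Regression Equation of Y on X:
$$(Y - \bar{Y}) = b_{yx}(X - \bar{X})$$
where $b_{xy}$ and $b_{yx}$ are the regression coefficients:
$$b_{xy} = \frac{\sum xy}{\sum y^2}, \qquad b_{yx} = \frac{\sum xy}{\sum x^2}, \qquad x = X - \bar{X},\; y = Y - \bar{Y}$$
Example 2: From the previous data, obtain the regression equations by taking deviations from the actual means of the X and Y series.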
X    Y    x = X - X̄    y = Y - Ȳ    x²       y²       xy
3    6      -1.8          0.2        3.24     0.04    -0.36
2    1      -2.8         -4.8        7.84    23.04    13.44
7    8       2.2          2.2        4.84     4.84     4.84
4    5      -0.8         -0.8        0.64     0.64     0.64
8    9       3.2          3.2       10.24    10.24    10.24
ΣX = 24   ΣY = 29   Σx = 0   Σy = 0   Σx² = 26.8   Σy² = 38.8   Σxy = 28.8
Solution: here X̄ = 24/5 = 4.8 and Ȳ = 29/5 = 5.8.
Regression Equation of X on Y:
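$$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$$
$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{28.8}{38.8} \approx 0.7423$$
$$X - 4.8 = 0.7423(Y - 5.8)$$
$$X \approx 0.49 + 0.74Y$$
Regression Equation of Y on X:
$$(Y - \bar{Y}) = b_{yx}(X - \bar{X})$$
$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{28.8}{26.8} \approx 1.07$$
$$Y - 5.8 = 1.07(X - 4.8)$$
$$Y \approx 0.66 + 1.07X$$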
Note that these regression equations are the same as those obtained by the direct (least squares) method.
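The deviations-from-the-mean method can be sketched in Python to confirm that it reproduces the least squares lines. The function name `regression_by_mean_deviations` is hypothetical; exact fractions are used so the means 4.8 and 5.8 introduce no rounding error.

```python
from fractions import Fraction


def regression_by_mean_deviations(xs, ys):
    """Regression coefficients from deviations about the actual means:
        b_yx = Σxy / Σx²,  b_xy = Σxy / Σy²,  where x = X - X̄, y = Y - Ȳ.
    Returns (b_yx, b_xy, mean_x, mean_y) as exact Fractions."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    X = [Fraction(v) for v in xs]
    Y = [Fraction(v) for v in ys]
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [v - mean_x for v in X]  # deviations of X from X̄ (they sum to 0)
    y = [v - mean_y for v in Y]  # deviations of Y from Ȳ (they sum to 0)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("a constant series has no regression coefficient")
    return sum_xy / sum_x2, sum_xy / sum_y2, mean_x, mean_y


b_yx, b_xy, mx, my = regression_by_mean_deviations([3, 2, 7, 4, 8], [6, 1, 8, 5, 9])
print(float(mx), float(my))  # 4.8 5.8
print(b_yx, b_xy)            # 72/67 72/97
# Intercepts from (Y - Ȳ) = b_yx(X - X̄) and (X - X̄) = b_xy(Y - Ȳ)
print(f"Y = {float(my - b_yx * mx):.2f} + {float(b_yx):.2f}X")  # Y = 0.64 + 1.07X
print(f"X = {float(mx - b_xy * my):.2f} + {float(b_xy):.2f}Y")  # X = 0.49 + 0.74Y
```

The coefficients 72/67 and 72/97 are exactly the slopes given by the normal equations, which is why both methods lead to the same regression lines.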
3. Deviations from the Assumed Mean Method:
When the actual means of the X and Y variables are fractions, the calculations can be simplified by taking deviations from assumed means.
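The regression equations have the same form as in the previous method:
Regression Equation of X on Y:
$$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$$
Regression Equation of Y on X:
$$(Y - \bar{Y}) = b_{yx}(X - \bar{X})$$
But here the values of $b_{xy}$ and $b_{yx}$ are calculated with the following formulas, where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means:
Regression coefficient of X on Y:
$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$
Regression coefficient of Y on X:
$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$
Example: From the data given in the previous example, calculate the regression equations by assuming 7 as the mean of the X series and 6 as the mean of the Y series.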
X    Y    dx = X - 7    dx²    dy = Y - 6    dy²    dxdy
3    6       -4          16        0          0       0
2    1       -5          25       -5         25     +25
7    8        0           0        2          4       0
4    5       -3           9       -1          1      +3
8    9        1           1        3          9      +3
ΣX = 24   ΣY = 29   Σdx = -11   Σdx² = 51   Σdy = -1   Σdy² = 39   Σdxdy = 31
Solution:
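The Regression Coefficient of X on Y:
$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2} = \frac{5(31) - (-11)(-1)}{5(39) - (-1)^2} = \frac{155 - 11}{195 - 1} = \frac{144}{194} \approx 0.74$$
$$\bar{X} = \frac{\sum X}{N} = \frac{24}{5} = 4.8, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{29}{5} = 5.8$$
The Regression Equation of X on Y:
$$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$$
$$X - 4.8 = 0.74(Y - 5.8)$$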
The Regression Coefficient of Y on X:
$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2} = \frac{5(31) - (-11)(-1)}{5(51) - (-11)^2} = \frac{155 - 11}{255 - 121} = \frac{144}{134} \approx 1.07$$
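The assumed-mean formulas can be sketched in Python as well. The function name `regression_by_assumed_means` is hypothetical; it takes the assumed means as parameters, which also makes it easy to check that the choice of assumed mean does not change the coefficients.

```python
from fractions import Fraction


def regression_by_assumed_means(xs, ys, assumed_x, assumed_y):
    """Regression coefficients from deviations about assumed means:
        b_xy = (NΣdxdy - ΣdxΣdy) / (NΣdy² - (Σdy)²)
        b_yx = (NΣdxdy - ΣdxΣdy) / (NΣdx² - (Σdx)²)
    Returns (b_xy, b_yx) as exact Fractions."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    dx = [Fraction(v) - Fraction(assumed_x) for v in xs]  # dx = X - assumed mean
    dy = [Fraction(v) - Fraction(assumed_y) for v in ys]  # dy = Y - assumed mean
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    numerator = n * s_dxdy - s_dx * s_dy
    den_xy = n * s_dy2 - s_dy ** 2
    den_yx = n * s_dx2 - s_dx ** 2
    if den_xy == 0 or den_yx == 0:
        raise ValueError("a constant series has no regression coefficient")
    return numerator / den_xy, numerator / den_yx


X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
print(regression_by_assumed_means(X, Y, 7, 6))  # (Fraction(72, 97), Fraction(72, 67))
print(regression_by_assumed_means(X, Y, 0, 0))  # (Fraction(72, 97), Fraction(72, 67))
```

Both calls give 144/194 and 144/134 in lowest terms: shifting X and Y by constants changes the individual sums but cancels out of both numerator and denominator, so any assumed mean gives the same regression coefficients.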
The Regression Equation of Y on X: