SIMPLE LINEAR CORRELATION
1. Introduction
What is a measure of correlation?
Correlation refers to the linear relationship among variables. For example, the blood pressure of a patient may be related to age, food habits, family history and so on. When we study the degree of relationship among such variables, the measure obtained is known as correlation.
In this module, we discuss the various types of correlation and the different methods of calculating it. The choice of method depends upon the type of data. If the data are actual numerical values, we can calculate Karl Pearson's correlation coefficient. If the data are qualitative in nature and can only be ranked, we calculate the rank correlation coefficient.
2. Objectives
1. To study the various types of correlation
2. To study the methods of estimating correlation

3. Types of correlation
Simple or Partial or Multiple:
If the relationship between two variables is analysed, it is called simple correlation. When more than two variables are considered, the correlation between two of them when all other variables are held constant, i.e., when the linear effects of all other variables on them are removed, is called partial correlation. When more than two variables are considered, the correlation between one of them and its estimate based on the group consisting of the other variables is called multiple correlation.
Linear or Non-linear or No Correlation:
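When the points of the scatter diagram lie around a straight line, the correlation is linear. When they lie around a curve, the correlation is non-linear. When the points are scattered neither around a line nor around a curve, there is no correlation between the two variables. The following diagrams show these three kinds.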
Methods:
The following four methods are available under simple linear correlation. Among them, the product moment method is considered the best.
i) Scatter Diagram
ii) Karl Pearson’s correlation coefficient or product moment correlation coefficient (r)
iii) Spearman's rank correlation coefficient (ρ)
iv) Correlation coefficient by concurrent deviation method (rc)
SCATTER DIAGRAM
When the N pairs of values of X and Y are plotted on a graph sheet, the resulting diagram of N points is called a scatter diagram.
Possible types of scatter diagram under simple linear correlation are as given below. From a diagram, it can be found out whether the correlation is positive or negative and whether it is perfect or high or low.
The merits of this method are as follows. It is easy to draw, non-mathematical and simple to understand, and it does not involve computations. Its greatest demerit is that it is not quantitative. As no numerical value is computed, comparison is sometimes not possible. Decisions based on it are not as accurate as those based on correlation coefficients.
KARL PEARSON’S COEFFICIENT OF CORRELATION (r)
This is also called the product moment correlation coefficient and is denoted by r. It is the covariance between the two variables divided by the product of their standard deviations. It can be calculated using any one of several formulae; the choice of formula depends on the nature of the data. The different formulae are shown in the following examples.
Example 1: Calculate correlation between the following variables X and Y
X 57 58 59 59 60 61 62 64
Y 67 68 65 68 72 72 69 71
Solution:
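X − X̄ and Y − Ȳ are small integers, and hence the following formula is used. Writing x = X − X̄ and y = Y − Ȳ, the properties Σx = Σ(X − X̄) = 0 and Σy = Σ(Y − Ȳ) = 0 hold. Karl Pearson's correlation coefficient is
$$ r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}}, \quad \text{where } \sum x = 0 \text{ and } \sum y = 0 $$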
Steps:
1. Find X̄ and Ȳ, and then the deviations x = X − X̄ and y = Y − Ȳ.
2. Calculate Σxy, Σx² and Σy².
Here x = X − X̄ with X̄ = 60, and y = Y − Ȳ with Ȳ = 69.
   X     Y     x     y    xy    x²    y²
  57    67    -3    -2     6     9     4
  58    68    -2    -1     2     4     1
  59    65    -1    -4     4     1    16
  59    68    -1    -1     1     1     1
  60    72     0     3     0     0     9
  61    72     1     3     3     1     9
  62    69     2     0     0     4     0
  64    71     4     2     8    16     4
ΣX = 480  ΣY = 552  Σx = 0  Σy = 0  Σxy = 24  Σx² = 36  Σy² = 44
X̄ = ΣX/N = 480/8 = 60; Ȳ = ΣY/N = 552/8 = 69
$$ r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{24}{\sqrt{36}\,\sqrt{44}} = 0.6030 $$
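The deviation formula used in Example 1 can be sketched in Python as follows. This is only an illustrative sketch; the function name `pearson_from_deviations` is hypothetical and is not part of the original material.

```python
import math

# Data from Example 1
X = [57, 58, 59, 59, 60, 61, 62, 64]
Y = [67, 68, 65, 68, 72, 72, 69, 71]

def pearson_from_deviations(xs, ys):
    """Pearson's r as sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the respective means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [value - mean_x for value in xs]   # x = X - mean of X
    dy = [value - mean_y for value in ys]   # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

r = pearson_from_deviations(X, Y)
print(f"r = {r:.4f}")  # r = 0.6030
```

The intermediate sums are Σxy = 24, Σx² = 36 and Σy² = 44, so the result agrees with the hand calculation.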
Example 2: Compute the coefficient of correlation between X – Advertisement Expenditure and Y – Sales.
X 10 12 18 8 13 20 22 15 5 17
Y 88 90 94 86 87 92 96 94 88 85
Solution:
Method I: The values of X and Y are small enough to be used directly, so the following formula is used instead of the one in the previous example.
$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} $$
Steps:
1. Calculate XY and ΣXY.
2. Calculate ΣX² and ΣY².
3. Calculate ΣX and ΣY.
4. Substitute the above values in the formula.
   X    Y     XY    X²     Y²
  10   88    880   100   7744
  12   90   1080   144   8100
  18   94   1692   324   8836
   8   86    688    64   7396
  13   87   1131   169   7569
  20   92   1840   400   8464
  22   96   2112   484   9216
  15   94   1410   225   8836
   5   88    440    25   7744
  17   85   1445   289   7225
ΣX = 140  ΣY = 900  ΣXY = 12718  ΣX² = 2224  ΣY² = 81130
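As a worked illustration of step 4, the column totals of the table above (ΣX = 140, ΣY = 900, ΣXY = 12718, ΣX² = 2224, ΣY² = 81130, with N = 10) can be substituted in the Method I formula. The totals are computed here from the tabulated values.

```latex
\[
r = \frac{10(12718) - (140)(900)}
         {\sqrt{10(2224) - 140^2}\,\sqrt{10(81130) - 900^2}}
  = \frac{1180}{\sqrt{2640}\,\sqrt{1300}}
  \approx 0.637
\]
```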
(or)
Method II:
If the values of X and Y are large, the following formula can be used. This method is known as the shortcut method, step deviation method or coded method.
$$ r = \frac{N\sum uv - (\sum u)(\sum v)}{\sqrt{N\sum u^2 - (\sum u)^2}\,\sqrt{N\sum v^2 - (\sum v)^2}}, \quad u = \frac{X-a}{c}, \quad v = \frac{Y-b}{d} $$
Any values can be assumed for a, b, c and d. The resulting values of u and v will be smaller if they are chosen as follows. a is X̄ or some other convenient value between the minimum and the maximum of the X values. c is the common difference between the X values; when there is no common difference, the largest possible value of c is chosen such that the u values are not fractions. Similarly, b and d are decided from the values of Y, and u = (X − a)/c and v = (Y − b)/d are calculated.
Here a = 15, c = 1, b = 90 and d = 1.
   X    Y     u     v    uv    u²    v²
  10   88    -5    -2    10    25     4
  12   90    -3     0     0     9     0
  18   94     3     4    12     9    16
   8   86    -7    -4    28    49    16
  13   87    -2    -3     6     4     9
  20   92     5     2    10    25     4
  22   96     7     6    42    49    36
  15   94     0     4     0     0    16
   5   88   -10    -2    20   100     4
  17   85     2    -5   -10     4    25
Σu = -10  Σv = 0  Σuv = 118  Σu² = 274  Σv² = 130
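The following Python sketch applies the raw-sums formula to the original values of Example 2 and to the coded values u and v, to show that both give the same coefficient. The function name `pearson_raw_sums` is hypothetical, and the choices a = 15, c = 1, b = 90, d = 1 are the ones used in the table above.

```python
import math

def pearson_raw_sums(xs, ys):
    """Pearson's r from raw sums:
    [N*sum(XY) - sum(X)*sum(Y)] / [sqrt(N*sum(X^2) - sum(X)^2) * sqrt(N*sum(Y^2) - sum(Y)^2)]."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data from Example 2
X = [10, 12, 18, 8, 13, 20, 22, 15, 5, 17]
Y = [88, 90, 94, 86, 87, 92, 96, 94, 88, 85]

# Coded values with the constants used in the table
a, c = 15, 1
b, d = 90, 1
u = [(x - a) / c for x in X]
v = [(y - b) / d for y in Y]

r_raw = pearson_raw_sums(X, Y)      # Method I on the original values
r_coded = pearson_raw_sums(u, v)    # Method II on the coded values
print(f"{r_raw:.3f} {r_coded:.3f}")  # 0.637 0.637
```

Both calls reduce to 1180 / (√2640 × √1300), which is why subtracting a and b or dividing by positive c and d leaves r unchanged.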
4. Properties:
1. -1 ≤ r ≤ +1, i.e., the correlation coefficient cannot be greater than 1 numerically.
2. Correlation coefficient is independent of change of origin. That is why we do not add a or b when we use u and v although we have subtracted them from X and Y while finding u and v.
3. Correlation coefficient is independent of change of scale. That is why we do not multiply by c or d when we use u and v although we have divided X and Y by them while finding u and v.
4. Correlation coefficient is a pure number. It is not in any unit of measurement.
Interpretation of r: r = 0 indicates absence of linear correlation. r = +1 and r = -1 indicate perfect positive and perfect negative correlation respectively. According to certain statisticians, 0 < r < 0.5 indicates low positive correlation, 0.5 ≤ r < 1 indicates high positive correlation, -1 < r ≤ -0.5 indicates high negative correlation and -0.5 < r < 0 indicates low negative correlation.
Coefficient of Determination: The square of the coefficient of correlation (r) is the coefficient of determination (r²). It indicates the proportion of variation in the dependent variable which is due to the independent variable. The remaining variation in the dependent variable is because of other factors.
If r = 0.5, then r² = 0.25, and so 25% (0.25 × 100) of the variation in the dependent variable is attributable to the independent variable.
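As a further worked illustration, the sums from Example 1 (Σxy = 24, Σx² = 36, Σy² = 44) can be used directly. Treating Y as the dependent variable is an assumption made here only for the sake of the illustration.

```latex
\[
r^2 = \frac{\left(\sum xy\right)^2}{\sum x^2 \, \sum y^2}
    = \frac{24^2}{36 \times 44}
    = \frac{576}{1584}
    \approx 0.3636
\]
```

On this reading, roughly 36% of the variation in Y is associated with X, and the remaining 64% with other factors.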
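5. SPEARMAN'S RANK CORRELATION COEFFICIENT (ρ)
$$ \rho = 1 - \frac{6\sum d^2}{N(N^2-1)} $$
when there is no tie, where d is the difference between the X and Y ranks, or
$$ \rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2-1)}{12} + \frac{m(m^2-1)}{12} + \dots\right]}{N(N^2-1)} $$
when there are ties, with one term m(m² − 1)/12 for each value that occurs m times.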
It is calculated when ranks are given or when rank correlation coefficient is required. Rank correlation coefficient also lies between -1 and +1.
Example:
X  21 36 42 37 25
Y  47 40 37 42 43
Solution:
   X    Y   Rank of X   Rank of Y     d    d²
  21   47       5           1         4    16
  36   40       3           4        -1     1
  42   37       1           5        -4    16
  37   42       2           3        -1     1
  25   43       4           2         2     4
Total: Σd = 0, Σd² = 38
$$ \rho = 1 - \frac{6 \times 38}{5(5^2-1)} = 1 - 1.9 = -0.9 $$
Note:
1. For the maximum value of X, 42, the rank is 1; for the next lower value, 37, the rank is 2, and so on. Similarly, for Y, the rank of 47 is 1 and the rank of 43 is 2.
2. Rank 1 may instead be assigned to the least value of X, rank 2 to the next higher value, and so on. If so, the least value of Y is also to be assigned rank 1, the next higher value rank 2, and so on.
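The no-tie formula and the ranking convention of Note 1 can be sketched in Python as below. The helper names `ranks_descending` and `spearman_no_ties` are hypothetical, and the sketch assumes that no value is repeated.

```python
def ranks_descending(values):
    """Rank 1 for the largest value, 2 for the next lower one, and so on.
    Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(value) + 1 for value in values]

def spearman_no_ties(xs, ys):
    """Spearman's rho = 1 - 6*sum(d^2) / (N*(N^2 - 1)) for untied data."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Data from the example above
X = [21, 36, 42, 37, 25]
Y = [47, 40, 37, 42, 43]

rho = spearman_no_ties(X, Y)
print(f"rho = {rho:.4f}")  # rho = -0.9000
```

Ranking both variables in ascending order instead, as in Note 2, only changes the sign of every d, so Σd² and ρ stay the same.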
Tied Ranks:
When one or more values are repeated, two aspects are to be considered: the ranks of the repeated values and the change in the formula.
Each repeated value is to be considered separately. If a value has occurred m times, each occurrence is assigned the average of the ranks that would have been assigned to them if they had differed slightly. This does not affect the ranks of the other values.
For each such repeated value, m(m² − 1)/12 is to be added to Σd² once in the formula.
Example: Find the rank correlation coefficient for the percentage of marks secured by a group of 8 students in Economics and Statistics.
Marks in Economics 50 60 65 70 75 40 70 80
Marks in Statistics 80 71 60 75 90 82 70 50
Solution: Let X – Marks in Economics and Y – Marks in Statistics.
   X    Y   Rank of X   Rank of Y      d      d²
  50   80      7           3          4     16
  60   71      6           5          1      1
  65   60      5           7         -2      4
  70   75      3.5         4         -0.5    0.25
  75   90      2           1          1      1
  40   82      8           2          6     36
  70   70      3.5         6         -2.5    6.25
  80   50      1           8         -7     49
Total: Σd = 0, Σd² = 113.5
$$ \rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2-1)}{12}\right]}{N(N^2-1)} $$
when one value occurs m times. Here the value 70 occurs twice in X, so m = 2 and m(m² − 1)/12 = 0.5.
$$ \rho = 1 - \frac{6 \times 114}{8 \times 63} = 1 - 1.3571 = -0.3571 $$
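The treatment of tied ranks described above (average ranks plus one correction term per repeated value) can be sketched in Python as follows. The function names are hypothetical; the sketch follows the formula of this section and is not a library implementation.

```python
from collections import Counter

def average_ranks_descending(values):
    """Rank 1 for the largest value; a value occurring m times gets the
    average of the m consecutive ranks it would otherwise occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for value in values:
        first = order.index(value) + 1     # first rank position held by this value
        m = order.count(value)             # number of occurrences
        ranks.append(first + (m - 1) / 2)  # average of first, ..., first + m - 1
    return ranks

def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rank_x = average_ranks_descending(xs)
    rank_y = average_ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

# Data from the example above
economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics_marks = [80, 71, 60, 75, 90, 82, 70, 50]

rho = spearman_with_ties(economics, statistics_marks)
print(f"rho = {rho:.4f}")  # rho = -0.3571
```

For this data Σd² = 113.5 and the only correction is 0.5 for the repeated 70, giving the adjusted total of 114 used in the hand calculation.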
Example: Marks obtained by 8 students in Accountancy (X) and Statistics (Y) are given below. Compute the rank correlation.
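6. COEFFICIENT OF CORRELATION BY CONCURRENT DEVIATION METHOD (rc)
$$ r_c = \begin{cases} +\sqrt{\dfrac{2C-N}{N}} & \text{if } 2C - N > 0 \\ 0 & \text{if } 2C - N = 0 \\ -\sqrt{-\dfrac{2C-N}{N}} & \text{if } 2C - N < 0 \end{cases} $$
or, compactly,
$$ r_c = \pm\sqrt{\pm\left(\frac{2C-N}{N}\right)} $$
where the sign of 2C − N is taken both inside and outside the square root.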
N denotes the number of entries in the DXY column (one fewer than the number of pairs of observations) and C denotes the number of + signs (concurrent deviations) in that column.
rc also lies between -1 and +1.
If a value is greater than the preceding value, a + sign is put. If it is less than the preceding one, a - sign is marked. If it is equal to the preceding one, the deviation is 0. DX denotes such deviations among the values of the variable X and DY denotes those of Y. DXY denotes the product of the entries under DX and DY.
Example: Calculate the coefficient of correlation from the data given below by the
method of concurrent deviations.
Solution:
Index of Imports (X)   Index of Prices (Y)   DX   DY   DXY
        85                    110
        82                    115            -    +    -
        89                    112            +    -    -
        95                    118            +    +    +
       104                    120            +    +    +
       108                    109            +    -    -
       112                     98            +    -    -
       100                    102            -    +    -
        99                    103            -    +    -
        93                    105            -    +    -
        90                    107            -    +    -
Here N = 10 and C = 2, so 2C − N = −6 and
$$ r_c = -\sqrt{\frac{6}{10}} = -0.7746 $$
Example: Calculate the coefficient of correlation by the concurrent deviation method.
Exports (Rs. in crores)  17 12 25 41 32 51
Imports (Rs. in crores)  12 15 23 32 28 26
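The concurrent deviation method can be sketched in Python as below and applied to both data sets of this section. The function names are hypothetical. The value for the Exports and Imports data is computed by this sketch and is not a figure taken from the original material.

```python
import math

def deviation_signs(values):
    """+1 if a value exceeds the preceding one, -1 if it is smaller, 0 if equal."""
    return [(cur > prev) - (cur < prev) for prev, cur in zip(values, values[1:])]

def concurrent_deviation(xs, ys):
    """r_c = +/- sqrt(+/- (2C - N) / N), taking the sign of (2C - N)."""
    dx = deviation_signs(xs)
    dy = deviation_signs(ys)
    dxy = [a * b for a, b in zip(dx, dy)]      # product of the DX and DY entries
    n = len(dxy)                               # N: number of entries in the DXY column
    c = sum(1 for product in dxy if product > 0)  # C: number of + signs in DXY
    t = (2 * c - n) / n
    if t == 0:
        return 0.0
    return math.copysign(math.sqrt(abs(t)), t)

# Index of Imports (X) and Index of Prices (Y)
imports_index = [85, 82, 89, 95, 104, 108, 112, 100, 99, 93, 90]
prices_index = [110, 115, 112, 118, 120, 109, 98, 102, 103, 105, 107]

# Exports and Imports (Rs. in crores)
exports = [17, 12, 25, 41, 32, 51]
imports = [12, 15, 23, 32, 28, 26]

r1 = concurrent_deviation(imports_index, prices_index)
r2 = concurrent_deviation(exports, imports)
print(f"{r1:.4f} {r2:.4f}")  # -0.7746 0.4472
```

For the first data set C = 2 and N = 10; for the second, C = 3 and N = 5, so 2C − N = 1 and the coefficient is positive.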
7. CONCLUSION