
SIMPLE LINEAR CORRELATION


Academic year: 2021


1. Introduction

What is a measure of correlation?

Correlation refers to the linear relationship among variables. For example, the blood pressure of a patient may be correlated with age, food habits, family history and so on. The study of the degree of relationship among such variables is known as correlation.

In this module, we discuss the various types of correlation and the different methods of calculating it. The method of estimation depends upon the type of data. If the data are actual values, we can calculate Karl Pearson's correlation coefficient. If the data are somewhat qualitative in nature, we have to calculate the rank correlation coefficient.

2. Objectives

1. To study the various types of correlation
2. To study the methods of estimating correlation

3. Types of correlation

Simple or Partial or Multiple:

If the relationship between two variables is analysed, it is called simple correlation. When more than two variables are considered, the correlation between two of them when all other variables are held constant, i.e., when the linear effects of all other variables on them are removed, is called partial correlation. When more than two variables are considered, the correlation between one of them and its estimate based on the group consisting of the other variables is called multiple correlation.

Linear or Non-linear or No Correlation:


When the points are scattered neither around a line nor around a curve, there is no correlation between the two variables. The following diagrams show these three kinds.

Methods:

The following four methods are available under simple linear correlation; among them, the product moment method is the best one.

i) Scatter Diagram

ii) Karl Pearson’s correlation coefficient or product moment correlation coefficient (r)

iii) Spearman’s rank correlation coefficient (ρ)

iv) Correlation coefficient by concurrent deviation method (rc)

SCATTER DIAGRAM

When we plot the N pairs of values of X and Y on a graph sheet, the resulting diagram with N points is called a scatter diagram.

Possible types of scatter diagram under simple linear correlation are as given below. From a diagram, it can be found out whether the correlation is positive or negative and whether it is perfect or high or low.

The merits of this method are as follows: it is easy to draw, non-mathematical and simple to understand, and it does not involve computations. Its greatest demerit is that it is not quantitative. As no numerical value is computed, comparison is sometimes not possible. Decisions based on it are not as accurate as those based on correlation coefficients.

KARL PEARSON’S COEFFICIENT OF CORRELATION (r)

This is also called product moment correlation coefficient. This is denoted by r. This is covariance between the two variables divided by the product of their standard deviations. This can be calculated by using any one of the formulae. Choice of a formula depends on the nature of the data. Different formulae are seen under the following examples.
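As a brief sketch of how this definition leads to the working formula in terms of deviations, write $x = X - \bar{X}$ and $y = Y - \bar{Y}$, and assume the covariance and both standard deviations are computed with the same divisor $N$ (an assumption made here for illustration; whichever divisor is used, it cancels):

```latex
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\frac{1}{N}\sum xy}
         {\sqrt{\frac{1}{N}\sum x^2}\;\sqrt{\frac{1}{N}\sum y^2}}
  = \frac{\sum xy}{\sqrt{\sum x^2}\;\sqrt{\sum y^2}}
```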

Example 1: Calculate correlation between the following variables X and Y

X 57 58 59 59 60 61 62 64

Y 67 68 65 68 72 72 69 71

Solution:

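$X - \bar{X}$ and $Y - \bar{Y}$ are small integers, and hence the following formula is used. Writing $x = X - \bar{X}$ and $y = Y - \bar{Y}$, the properties $\sum x = \sum (X - \bar{X}) = 0$ and $\sum y = \sum (Y - \bar{Y}) = 0$ hold. Karl Pearson’s correlation coefficient,

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} \quad \text{where } \sum x = 0 \text{ and } \sum y = 0$$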

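Steps:

1. Find $\bar{X}$ and $\bar{Y}$, and the deviations $x = X - \bar{X}$ and $y = Y - \bar{Y}$
2. Calculate $\sum x^2$ and $\sum y^2$
3. Calculate $\sum xy$ and substitute the values in the formula

| X | Y | x = X − X̄ (X̄ = 60) | y = Y − Ȳ (Ȳ = 69) | xy | x² | y² |
|---|---|---|---|---|---|---|
| 57 | 67 | -3 | -2 | 6 | 9 | 4 |
| 58 | 68 | -2 | -1 | 2 | 4 | 1 |
| 59 | 65 | -1 | -4 | 4 | 1 | 16 |
| 59 | 68 | -1 | -1 | 1 | 1 | 1 |
| 60 | 72 | 0 | 3 | 0 | 0 | 9 |
| 61 | 72 | 1 | 3 | 3 | 1 | 9 |
| 62 | 69 | 2 | 0 | 0 | 4 | 0 |
| 64 | 71 | 4 | 2 | 8 | 16 | 4 |
| ΣX = 480 | ΣY = 552 | Σx = 0 | Σy = 0 | Σxy = 24 | Σx² = 36 | Σy² = 44 |

$$\bar{X} = \frac{\sum X}{N} = \frac{480}{8} = 60; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{552}{8} = 69$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{24}{\sqrt{36}\,\sqrt{44}} = 0.6030$$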

Example 2: Compute the coefficient of correlation between X (Advertisement Expenditure) and Y (Sales).

X 10 12 18 8 13 20 22 15 5 17

Y 88 90 94 86 87 92 96 94 88 85

Solution:

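Method 1: Values of X and Y are assumed to be small and the following formula is attempted instead of the one used in the previous example.

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

Steps

1. Calculate XY and $\sum XY$
2. Calculate $\sum X^2$ and $\sum Y^2$
3. Calculate $\sum X$ and $\sum Y$
4. Substitute the above values in the formula

| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 10 | 88 | 880 | 100 | 7744 |
| 12 | 90 | 1080 | 144 | 8100 |
| 18 | 94 | 1692 | 324 | 8836 |
| 8 | 86 | 688 | 64 | 7396 |
| 13 | 87 | 1131 | 169 | 7569 |
| 20 | 92 | 1840 | 400 | 8464 |
| 22 | 96 | 2112 | 484 | 9216 |
| 15 | 94 | 1410 | 225 | 8836 |
| 5 | 88 | 440 | 25 | 7744 |
| 17 | 85 | 1445 | 289 | 7225 |
| ΣX = 140 | ΣY = 900 | ΣXY = 12718 | ΣX² = 2224 | ΣY² = 81130 |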
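The substitution step for Method 1 can be sketched as follows. The function name `pearson_from_raw_sums` is a hypothetical helper; the script simply evaluates the raw-sums form of the product moment formula on the data of Example 2, which gives r of about 0.637.

```python
import math

def pearson_from_raw_sums(xs, ys):
    """Pearson's r from raw sums:
    (N*SXY - SX*SY) / (sqrt(N*SX2 - SX^2) * sqrt(N*SY2 - SY^2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data of Example 2: advertisement expenditure (X) and sales (Y)
X = [10, 12, 18, 8, 13, 20, 22, 15, 5, 17]
Y = [88, 90, 94, 86, 87, 92, 96, 94, 88, 85]

r = pearson_from_raw_sums(X, Y)
print(round(r, 4))  # 0.637
```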


(or)

Method II:

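If the values of X and Y are large, the following formula can be used. This method is known as the shortcut method, step deviation method or coded method.

$$u = \frac{X - a}{c}, \qquad v = \frac{Y - b}{d}, \qquad r = \frac{N\sum uv - (\sum u)(\sum v)}{\sqrt{N\sum u^2 - (\sum u)^2}\,\sqrt{N\sum v^2 - (\sum v)^2}}$$

Any value can be assumed for a, b, c and d. If they are chosen as follows, the resulting values will be smaller: a = $\bar{X}$ or some other convenient value between the minimum and the maximum of the X values, and c is the common difference between the X values. When there is no common difference, the maximum possible value of c is to be identified such that the u values are not fractions. Similarly, b and d are to be decided from the values of Y, and $u = \frac{X - a}{c}$ and $v = \frac{Y - b}{d}$ are to be calculated.

| X | Y | u = (X − a)/c (a = 15; c = 1) | v = (Y − b)/d (b = 90; d = 1) | uv | u² | v² |
|---|---|---|---|---|---|---|
| 10 | 88 | -5 | -2 | 10 | 25 | 4 |
| 12 | 90 | -3 | 0 | 0 | 9 | 0 |
| 18 | 94 | 3 | 4 | 12 | 9 | 16 |
| 8 | 86 | -7 | -4 | 28 | 49 | 16 |
| 13 | 87 | -2 | -3 | 6 | 4 | 9 |
| 20 | 92 | 5 | 2 | 10 | 25 | 4 |
| 22 | 96 | 7 | 6 | 42 | 49 | 36 |
| 15 | 94 | 0 | 4 | 0 | 0 | 16 |
| 5 | 88 | -10 | -2 | 20 | 100 | 4 |
| 17 | 85 | 2 | -5 | -10 | 4 | 25 |
| Total | | Σu = -10 | Σv = 0 | Σuv = 118 | Σu² = 274 | Σv² = 130 |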
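The step deviation method can be sketched in code to confirm that coding the data leaves r unchanged. The helper names `pearson` and `coded` are hypothetical, and the second choice of a, b, c, d below is arbitrary and only illustrative; the sketch assumes c and d are positive.

```python
import math

def pearson(xs, ys):
    """Pearson's r from raw sums (the same formula applies to u and v)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2))

def coded(values, origin, scale):
    """Step deviations: (value - origin) / scale, with scale > 0 assumed."""
    return [(v - origin) / scale for v in values]

X = [10, 12, 18, 8, 13, 20, 22, 15, 5, 17]
Y = [88, 90, 94, 86, 87, 92, 96, 94, 88, 85]

# Choice used in the table: a = 15, c = 1, b = 90, d = 1
u = coded(X, 15, 1)
v = coded(Y, 90, 1)

r_raw = pearson(X, Y)
r_coded = pearson(u, v)
# An arbitrary different origin and scale, to illustrate the invariance
r_other = pearson(coded(X, 5, 2.5), coded(Y, 80, 4))

print(round(r_raw, 4), round(r_coded, 4), round(r_other, 4))
```

All three values agree, which is the practical meaning of r being independent of change of origin and scale.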

4. Properties:

1. −1 ≤ r ≤ +1, i.e., the correlation coefficient cannot be greater than 1 numerically.

2. The correlation coefficient is independent of change of origin. That is why we do not add a or b when we use u and v, although we have subtracted them from X and Y while finding u and v.

3. The correlation coefficient is independent of change of scale. That is why we do not multiply by c or d when we use u and v, although we have divided X and Y by them while finding u and v.

4. The correlation coefficient is a pure number. It is not expressed in any unit of measurement.

Interpretation of r: r = 0 indicates absence of linear correlation. r = +1 and r = −1 indicate perfect positive and perfect negative correlation respectively. According to certain statisticians, 0 < r < 0.5 indicates low positive correlation, 0.5 ≤ r < 1 indicates high positive correlation, −1 < r ≤ −0.5 indicates high negative correlation and −0.5 < r < 0 indicates low negative correlation.

Coefficient of Determination: The square of the coefficient of correlation (r) is the coefficient of determination (r²). It indicates the proportion of the variation in the dependent variable that is attributable to the independent variable. The remaining variation in the dependent variable is due to other factors.

If r = 0.5, then r² = 0.25, and so 25% (0.25 × 100) of the variation in the dependent variable is attributable to the independent variable.
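The interpretation bands and the coefficient of determination can be expressed as two small helpers. This is only a sketch: the function names and the label strings are hypothetical, and the 0.5 cut-off is the convention quoted above rather than a universal rule.

```python
def interpret_r(r):
    """Label a correlation coefficient using the 0.5 cut-off quoted in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 1.0:
        return "perfect positive"
    if r == -1.0:
        return "perfect negative"
    if r == 0.0:
        return "no linear correlation"
    if r >= 0.5:
        return "high positive"
    if r > 0.0:
        return "low positive"
    if r <= -0.5:
        return "high negative"
    return "low negative"

def coefficient_of_determination(r):
    """r squared: share of variation attributed to the independent variable."""
    return r * r

print(interpret_r(0.5), coefficient_of_determination(0.5))  # high positive 0.25
```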

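5. SPEARMAN’S RANK CORRELATION COEFFICIENT (ρ)

$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$

when there is no tie, where d is the difference between the X and Y ranks, or

$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \cdots\right]}{N(N^2 - 1)}$$

when values are repeated, with one term $\frac{m(m^2 - 1)}{12}$ for each value that occurs m times.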

It is calculated when ranks are given or when rank correlation coefficient is required. Rank correlation coefficient also lies between -1 and +1.

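Example

X 21 36 42 37 25

Y 47 40 37 42 43

Solution:

| X | Y | Rank of X | Rank of Y | d | d² |
|---|---|---|---|---|---|
| 21 | 47 | 5 | 1 | 4 | 16 |
| 36 | 40 | 3 | 4 | -1 | 1 |
| 42 | 37 | 1 | 5 | -4 | 16 |
| 37 | 42 | 2 | 3 | -1 | 1 |
| 25 | 43 | 4 | 2 | 2 | 4 |
| Total | | -- | -- | Σd = 0 | Σd² = 38 |

Note: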

1. For the maximum value of X, 42, the rank is 1; for the next lower value, 37, the rank is 2. Similarly, for 47 of Y the rank is 1, and for 43 the rank is 2.

2. Rank 1 may instead be assigned to the least value of X, rank 2 to the next higher value, and so on. If so, the least value of Y is also to be assigned rank 1, the next higher value rank 2, and so on.
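The ranking rule and the no-tie formula can be sketched as below for the five pairs of this example. The helper names are hypothetical, and the ranking function assumes that no value is repeated.

```python
def ranks_descending(values):
    """Rank 1 for the largest value, 2 for the next, ... (no repeated values assumed)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (N * (N^2 - 1)), d = rank of X - rank of Y."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

X = [21, 36, 42, 37, 25]
Y = [47, 40, 37, 42, 43]

rho = spearman_no_ties(X, Y)
print(round(rho, 4))  # -0.9
```

With Σd² = 38 and N = 5, this evaluates 1 − (6 × 38)/(5 × 24) = 1 − 1.9 = −0.9.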

Tied Ranks:

When one or more values are repeated, two aspects are to be considered: the ranks of the repeated values and the change in the formula.


Each repeated value is to be considered separately. If a value has occurred m times, for each of them the average of the probable ranks which would have been assigned to them if they had differed slightly is assigned now. This does not affect the ranks of other values.

For each such repeated value, $\frac{m(m^2 - 1)}{12}$ is to be added to $\sum d^2$ once in the formula.

Example: Find the rank correlation coefficient for the percentage of marks secured by a group of 8 students in Economics and Statistics.

Marks in Economics 50 60 65 70 75 40 70 80

Marks in Statistics 80 71 60 75 90 82 70 50

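Solution: Let X be the marks in Economics and Y the marks in Statistics.

| X | Y | Rank of X | Rank of Y | d | d² |
|---|---|---|---|---|---|
| 50 | 80 | 7 | 3 | 4 | 16 |
| 60 | 71 | 6 | 5 | 1 | 1 |
| 65 | 60 | 5 | 7 | -2 | 4 |
| 70 | 75 | 3.5 | 4 | -0.5 | 0.25 |
| 75 | 90 | 2 | 1 | 1 | 1 |
| 40 | 82 | 8 | 2 | 6 | 36 |
| 70 | 70 | 3.5 | 6 | -2.5 | 6.25 |
| 80 | 50 | 1 | 8 | -7 | 49 |
| Total | | -- | -- | Σd = 0 | Σd² = 113.5 |

$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12}\right]}{N(N^2 - 1)}$$

when one value occurs m times. Here 70 occurs twice among the X values, so m = 2 and $\frac{m(m^2 - 1)}{12} = 0.5$.

$$\rho = 1 - \frac{6(113.5 + 0.5)}{8(8^2 - 1)} = 1 - \frac{6 \times 114}{8 \times 63} = 1 - 1.3571 = -0.3571$$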
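The tied-rank procedure can be sketched as follows. The helper names are hypothetical; the code assigns each repeated value the average of the ranks it would otherwise occupy and adds one m(m² − 1)/12 term per repeated value, as described above.

```python
from collections import Counter

def average_ranks_descending(values):
    """Rank 1 for the largest value; repeated values share the average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1       # first rank position occupied by v
        m = ordered.count(v)               # number of times v occurs
        rank_of[v] = first + (m - 1) / 2   # mean of first .. first + m - 1
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m * m - 1) / 12
               for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rank_x = average_ranks_descending(xs)
    rank_y = average_ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

X = [50, 60, 65, 70, 75, 40, 70, 80]   # marks in Economics
Y = [80, 71, 60, 75, 90, 82, 70, 50]   # marks in Statistics

rho = spearman_with_ties(X, Y)
print(round(rho, 4))  # -0.3571
```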

Example: Marks obtained by 8 students in Accountancy (X) and Statistics (Y) are given below. Compute the rank correlation.


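6. COEFFICIENT OF CORRELATION BY CONCURRENT DEVIATION METHOD (rc)

$$r_c = \begin{cases} +\sqrt{\dfrac{2C - N}{N}} & \text{if } 2C - N > 0 \\[2ex] 0 & \text{if } 2C - N = 0 \\[1ex] -\sqrt{-\dfrac{2C - N}{N}} & \text{if } 2C - N < 0 \end{cases}$$

or, compactly,

$$r_c = \pm\sqrt{\pm\left(\frac{2C - N}{N}\right)}$$

where the sign of 2C − N is taken both inside and outside the square root.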

N denotes the number of entries in the DXY column (one fewer than the number of pairs of observations) and C denotes the number of + signs (concurrent deviations) in the DXY column.

rc also lies between -1 and +1.

If a value is greater than the preceding value, a + sign is put. If it is less than the preceding one, a - sign is marked. If it is equal to the preceding one, the deviation is 0. DX denotes such deviations among the values of the variable X and DY denotes those of Y. DXY denotes the product of the entries under DX and DY.

Example: Calculate the coefficient of correlation from the data given below by the method of concurrent deviations.

Solution:

| Index of Imports (X) | Index of Prices (Y) | DX | DY | DXY |
|---|---|---|---|---|
| 85 | 110 | | | |
| 82 | 115 | - | + | - |
| 89 | 112 | + | - | - |
| 95 | 118 | + | + | + |
| 104 | 120 | + | + | + |
| 108 | 109 | + | - | - |
| 112 | 98 | + | - | - |
| 100 | 102 | - | + | - |
| 99 | 103 | - | + | - |
| 93 | 105 | - | + | - |
| 90 | 107 | - | + | - |
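The sign rule and the formula for rc can be sketched in code for the import and price indices above. The helper names are hypothetical. One assumption is labelled in the code: a zero deviation is counted among the N entries but not as a concurrent deviation, a case the text does not spell out and which does not arise in this data.

```python
import math

def deviation_signs(values):
    """+1 if a value exceeds the preceding one, -1 if smaller, 0 if equal."""
    return [(cur > prev) - (cur < prev)
            for prev, cur in zip(values, values[1:])]

def concurrent_deviation_r(xs, ys):
    dx = deviation_signs(xs)
    dy = deviation_signs(ys)
    dxy = [a * b for a, b in zip(dx, dy)]
    n = len(dxy)                          # entries in the DXY column
    # Assumption: products equal to 0 are not counted as concurrent.
    c = sum(1 for p in dxy if p > 0)      # number of + signs
    t = (2 * c - n) / n
    # Same sign inside and outside the root: +sqrt(t) or -sqrt(-t)
    return math.copysign(math.sqrt(abs(t)), t)

X = [85, 82, 89, 95, 104, 108, 112, 100, 99, 93, 90]       # index of imports
Y = [110, 115, 112, 118, 120, 109, 98, 102, 103, 105, 107]  # index of prices

rc = concurrent_deviation_r(X, Y)
print(round(rc, 4))  # -0.7746
```

Here N = 10 and C = 2, so 2C − N = −6 and rc = −√(6/10), about −0.7746.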

Example: Calculate the coefficient of correlation by the concurrent deviation method.

Exports (Rs. in crores) 17 12 25 41 32 51

Imports (Rs. in crores) 12 15 23 32 28 26
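A worked sketch for this example, obtained by applying the sign rule and the formula for rc to the given figures (the working is derived here for illustration, taking exports as X and imports as Y):

```latex
\begin{array}{l|ccccc}
D_X    & - & + & + & - & + \\
D_Y    & + & + & + & - & - \\
D_{XY} & - & + & + & + & - \\
\end{array}
\qquad N = 5, \quad C = 3

r_c = +\sqrt{\frac{2C - N}{N}} = \sqrt{\frac{2(3) - 5}{5}} = \sqrt{0.2} \approx 0.4472
```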

7. CONCLUSION
