Tied Ranks

Sometimes it is not possible to distinguish between the ranks of two or more items. For example, two students may get the same mark in an examination and so have the same rank. Or two or more people in a group may be the same height. In such a case, we give each of the tied items the average of the ranks they would otherwise have occupied, and then carry on ranking the remaining items as if the tied ones had been given different ranks. You will see what this means by studying the following examples:

(a) First two equal out of eight:

1½ 1½ 3 4 5 6 7 8

Average of 1 & 2

(b) Three equal out of nine, but not at the ends of the list:

1 2 3 5 5 5 7 8 9

Average of 4, 5 & 6

(c) Last two equal out of eight:

1 2 3 4 5 6 7½ 7½

Average of 7 & 8

(d) Last four equal out of eleven:

1 2 3 4 5 6 7 9½ 9½ 9½ 9½

Average of 8, 9, 10 & 11
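The averaging rule in these examples can be sketched in a few lines of Python. This is a minimal illustration, not a method given in the text: the function name average_ranks and the sample marks are hypothetical, and the sketch assumes that rank 1 goes to the largest value (the top mark), as in an examination.

```python
def average_ranks(values, descending=True):
    """Rank values, giving tied values the average of the ranks they occupy.

    With descending=True the largest value receives rank 1 (e.g. the top mark).
    """
    # Sort the positions by value so that tied values end up next to each other.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of sorted positions that share the same value.
        end = pos + 1
        while end < len(order) and values[order[end]] == values[order[pos]]:
            end += 1
        # The run occupies ranks pos+1 .. end; their average is (pos + 1 + end) / 2.
        shared = (pos + 1 + end) / 2
        for k in range(pos, end):
            ranks[order[k]] = shared
        pos = end
    return ranks


# Hypothetical exam marks reproducing example (b): three equal in 4th to 6th place.
marks = [90, 85, 80, 70, 70, 70, 60, 55, 50]
print(average_ranks(marks))  # → [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 7.0, 8.0, 9.0]
```

The same function gives 1½ for a tie in first and second place, and 9½ for four items tied in the last four places out of eleven.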

Strictly speaking, a rank correlation coefficient should not be used in these cases without making some adjustment for tied ranks. But the formulae for the adjustments are a little complex and are outside the scope of this course. The best way for you to deal with tied ranks in practice is to calculate the ordinary (Pearson's) correlation coefficient between the two sets of ranks. If, in an examination, you are specifically asked to calculate a rank correlation coefficient when there are tied ranks, then of course you must do so; but you might reasonably add a note to your answer to say that, because of the existence of tied ranks, the calculated coefficient is only an approximation, although probably a good one.
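To see how close the approximation usually is, the Python sketch below compares two calculations on a hypothetical pair of rankings containing one tie. It assumes the usual shortcut formula for Spearman's coefficient, 1 − 6Σd²/(n(n² − 1)), which is exact only when there are no ties; applying Pearson's formula directly to the (averaged) ranks takes the ties into account. The judges' rankings and both helper names are invented for illustration.

```python
import math


def spearman_formula(rx, ry):
    """Spearman's coefficient via the shortcut 1 - 6*sum(d^2) / (n(n^2 - 1)).

    The shortcut is exact only when there are no tied ranks.
    """
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def pearson(x, y):
    """Ordinary (Pearson's) product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rankings of five items by two judges; judge A ties the top two.
judge_a = [1.5, 1.5, 3, 4, 5]
judge_b = [1, 2, 3, 4, 5]
print(round(spearman_formula(judge_a, judge_b), 4))  # → 0.975
print(round(pearson(judge_a, judge_b), 4))           # → 0.9747
```

Here the shortcut gives 0.975 against 0.9747 from Pearson's formula on the ranks: an approximation, but a good one.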

Final note: rank correlation coefficients may be used when the actual observations (and not just their rankings) are available. We first work out the rankings for each set of data and then calculate Spearman's or Kendall's coefficient as above. This procedure is appropriate when we require an approximate value for the correlation coefficient. Pearson's method using the actual observations is to be preferred in this case, however, so calculate a rank correlation


Study Unit 8

Linear Regression

Contents

Introduction

A. Regression Lines

Nature of Regression Lines

Graphical Method

Mathematical Method

B. Use of Regression

C. Connection Between Correlation and Regression


INTRODUCTION

We've seen how the correlation coefficient measures the degree of relationship between two variates (variables). With perfect correlation (r = +1.0 or r = –1.0), the points of the scatter diagram all lie exactly on a straight line. Sometimes, however, two variates are perfectly related in such a way that the points lie exactly on a line that is not straight. In such a case r would not be ±1.0. This is a most important point to bear in mind when you have calculated a correlation coefficient: the value may be small, yet the variates may still be closely related in some form other than a straight line.
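A quick numerical check makes the point concrete. In this Python sketch the data are hypothetical and pearson is a helper defined here, not a library function: every point lies exactly on the curve y = x², yet r comes out as zero because the x values are symmetric about 0.

```python
import math


def pearson(x, y):
    """Ordinary (Pearson's) product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # every point lies exactly on the curve y = x²
print(pearson(xs, ys))     # → 0.0
```

A perfect but curved relationship can therefore produce a correlation coefficient of zero, which is why a scatter diagram should always be inspected alongside r.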

The correlation coefficient tells us the extent to which the two variates are linearly related, but it does not tell us how to find the particular straight line which represents the relationship. The problem of determining which straight line best fits the points of a particular scatter diagram comes under the heading of linear regression analysis. Estimating the equation of the best-fitting line is of great practical importance. It enables us to test whether the independent variable (x) really does have an influence on the dependent variable (y), and we may be able to predict values of y from known or assumed values of x. In business and management research, it is important to gain an understanding of the factors that influence a firm's costs, revenues, profits and other key performance indicators, and to be able to predict changes in these variables from knowledge of possible changes in their determinants.
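As a preview of how a best-fitting line can be estimated and then used for prediction, here is a minimal Python sketch. It assumes the least-squares criterion, the usual meaning of "best-fitting": the line y = a + bx is chosen so that the sum of squared vertical distances from the points to the line is as small as possible. The function name least_squares_line and the advertising and sales figures are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line y = a + bx minimising the sum of squared vertical deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # gradient: change in y per unit change in x
    a = my - b * mx  # intercept: the fitted line passes through (mean of x, mean of y)
    return a, b


# Hypothetical data: advertising spend x and sales y (both in thousands) for five months.
spend = [1, 2, 3, 4, 5]
sales = [3, 4, 7, 8, 10]
a, b = least_squares_line(spend, sales)
print(f"y = {a:.2f} + {b:.2f}x")                          # → y = 1.00 + 1.80x
print(f"predicted sales when spend = 6: {a + b * 6:.2f}")  # → 11.80
```

The prediction at a spend of 6 lies outside the observed range of x, so it is an extrapolation and should be treated with caution.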

Regression analysis can therefore be of great value in business planning and forecasting.

Remember that a straight-line graph can always be used to represent an equation of the form y = a + bx. In such an equation, y and x are the variables while a and b are the constants. Figure 8.1 shows a few examples of straight-line graphs for different values of a and b. Note the following important features of these linear graphs:

• The value of a is always the value of y corresponding to x = 0, i.e. the point where the line crosses the y-axis (the intercept).

• The value of b represents the gradient or slope of the line. It tells us the number of units change in y per unit change in x. Larger values of b (ignoring sign) mean steeper slopes.

• Negative values of the gradient b mean that the line slopes downwards to the right; positive values of the gradient b mean that the line slopes upwards to the right.

So long as the equation linking the variables y and x is of the form y = a + bx, it is always possible to represent it graphically by a straight line. Likewise, if the graph of the relationship between y and x is a straight line, then it is always possible to express that relationship as an equation of the form y = a + bx.
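The meaning of a and b can be checked numerically. This Python sketch, in which line_through is a hypothetical helper, recovers a and b from any two points on a straight line using b = (y2 − y1)/(x2 − x1) and a = y1 − b·x1.

```python
def line_through(p1, p2):
    """Return (a, b) for the straight line y = a + bx passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # gradient: change in y per unit change in x
    a = y1 - b * x1            # intercept: the value of y when x = 0
    return a, b


a, b = line_through((1, 5), (4, 14))
print(a, b)        # → 2.0 3.0, i.e. y = 2 + 3x, sloping upwards to the right
print(a + b * 0)   # y at x = 0 equals a → 2.0

a2, b2 = line_through((0, 10), (5, 0))
print(a2, b2)      # → 10.0 -2.0, a negative gradient: slopes downwards to the right
```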

If the graph relating y and x is not a straight line, then a more complicated equation would be needed. Conversely, if the equation is not of the form y = a + bx (if, for example, it contains terms like x² or log x), then its graph would be a curve, not a straight line.
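One simple way to see this difference, offered as an illustration rather than a method from the text: for equally spaced x values, y = a + bx rises by exactly b at every step, so its successive differences are constant, whereas an equation containing x² gives differences that keep changing. The helper first_differences is hypothetical.

```python
def first_differences(ys):
    """Successive differences y[i+1] - y[i]."""
    return [b - a for a, b in zip(ys, ys[1:])]


xs = [0, 1, 2, 3, 4, 5]           # equally spaced x values
linear = [2 + 3 * x for x in xs]  # y = a + bx with a = 2, b = 3
quadratic = [x ** 2 for x in xs]  # contains an x² term, so the graph is a curve

print(first_differences(linear))     # → [3, 3, 3, 3, 3]
print(first_differences(quadratic))  # → [1, 3, 5, 7, 9]
```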


Figure 8.1: Straight line graphs

A. REGRESSION LINES

In document Quantitative Methods (Page 123-127)
