• No results found

Pearson s Correlation Coefficient

N/A
N/A
Protected

Academic year: 2021

Share "Pearson s Correlation Coefficient"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

Pearson’s Correlation Coefficient

• In this lesson, we will find a quantitative measure to describe the strength of a linear relationship (instead of using the terms strong or weak). A quantitative measure is important when comparing sets of data.

• The strength of a linear relationship is an indication of how closely the points in a scatter diagram fit a straight line. A measure of this strength is given by a correlation coefficient. • There are several ways that this correlation coefficient can be

found. We will be using the Pearson’s product moment correlation coefficient, which is shortened to Pearson’s correlation coefficient. It is represented by r.

• Important properties of r:

1. r does not depend on the units or which variable is chosen as x or y.

2. r always lies in the range [–1, 1].

3. A positive r indicates a positive association between the variables. A negative indicates a negative association between the variables.

4. A perfect linear correlation occurs if r = –1 or r = 1, that is, when the scatter plot points lie on a straight line.

5. r only measures the strength of a linear association

between two variables. Thus, if there is not evidence that a linear relationship does exist, then it does not make sense to calculate r, because r would not be meaningful.

(2)

• The properties of r and corresponding scatter plots can be summarized as follows:

r = +1 r = +0.9 r = +0.5 r = 0

r = –1 r = –0.9 r = –0.5 r = 0

• The following table provides a good indication of the qualitative description of the strength of the linear relationship and the qualitative value of r.

Value of r Qualitative Description of the Strength

–1 perfect negative

(–1, –0.75) strong negative

(–0.75, –0.5) moderate negative

(–0.5, –0.25) weak negative

(–0.25, 0.25) no linear association

(0.25, 0.5) weak positive

(0.5, 0.75) moderate positive

(0.75, 1) strong positive

1 perfect positive

• Just because two variables are highly correlated does not mean that one necessarily causes the other. Remember, one variable has to be consistently dependent on the other. Just ask

(3)

• For example, your fatigue during the summer months may be influenced by the warm temperature of the day. However, if you happen to experience a lot of fatigue on some other day, this will not indicate the temperature level of the day.

• There are several ways to determine the value of r from a data set.

• Remember that you don't calculate the Pearson's correlation coefficient unless you have first determined if it is appropriate.

You should first use a scatter plot to establish if the data indicates a linear relationship. If the data indicates that there

is a possible linear relationship, then the calculation of the r value is appropriate.

• To compute the value of r, we can use the following formulas:

(

)(

)

(

) (

2

)

x x y y r

x x y y

− −

=

− −

2 or

xy x y

xy x y

s r

s s s r s s

= = •

Even though the first formula is a useful form for manually finding the value of r, it requires very tedious work. The GDC can produce a value of r with the push of a few buttons. • Example 1:

Find the Pearson correlation coefficient for the following set of data.

x 2 3 4 6 8 9 10

y 21 19 18 17 15 13 12

First we input the data into the GDC and draw a scatter plot to determine if it is appropriate to use Pearson's correlation coefficient.

• Clear the home screen by pressing <2nd><mode> • Press <2nd><0> to access the Catalog menu • Locate DiagnosticOn

• Press <Enter><Enter>

• Press <Stat>, choose Calc, and then select 4: LinReg(a+bx) and enter L1,L2

(4)

The screen should now display the value of r.

There is also an r2 on the calculator screen. We need to look at what this r2 represents.

• r2 is called the

coefficient of determination.

2 explained var iation

r

total var iation =

• r2 is a proportion whereas r is the square root of a proportion. • r2 is more informative when interpreting the magnitude of the

relation between two variables, regardless of directionality. • For two linearly related variables, r2 provides the proportion of

variation in one variable that can be explained by the variation in the other variable.

• For example, r2 = 0.947 or 94.7% means that approximately 95% of the variation in the variable y can be explained by the variation in the variable x. The higher this value is the better. • For example, if r = 0.6, then r2 =0.36 means that only 36% of

the variation in one variable is explained by the variation in the other variable, so r = 0.6 is not really very good.

• Example 2:

The following table gives the final exam scores and the final averages for 10 students in biology class.

Final Exam Scores Final Averages

66 73 81 86 75 68 62 71 93 95 82 89 51 48 54 63 50 51 90 94

(5)

Enter the data into L1 and L2 in the GDC. When we plot the scatter diagram on the GDC, we find that it does appear reasonable to assume that a linear relationship exists. Therefore, calculating the value of r is appropriate.

Remember, if there does not appear to be a linear relationship, then calculating the value of r does not make sense.

Next, we need to find r and r2. Remember that to find r and r2: • Clear the home screen • Go to the catalog menu • Turn on Diagnostics

• Press <Stat>, choose Calc, and select LinReg(a+bx) • Press <Enter>

We see that r = 0.9507. This indicates a very strong positive relationship between the final exam grade and the final average grade.

Also, the r2 = 0.9037953898 tells us that about 90% of the

variation in the final average grade can be accounted for by the variation in the final exam grade. That is, only

approximately 10% of the variation is attributed to other factors. • In the next example, the covariance, sxy is given. We are going

to use the formula below to calculate the correlation coefficient r. Then, we are going to compare this to the coefficient r found using the graphing calculator.

xy x y

xy x y

s r

s s s r s s

= = •

(6)

• Example 3:

The following table represents the final averages of ten students in math and science. Given that the covariance, sxy = 465.66, calculate the product moment correlation coefficient, r, correct to 2 decimal places.

First we need to make sure that there appears to be a linear relationship between math and science grades. We enter the grades in the GDC and use a scatter diagram to decide this. Then, find the standard deviations for x and y. Use <Stat>, Calc, and then select 2–Var Stats L1,L2.

WARNING!

We want sx and sy, but in the calculator we will use the

σ

x and y

σ

.

x

σ

y

σ

xy x y

s r

s s

= = _____________________ ≈

Now, use the GDC to check this value of r. Use Stat, Calc, and LinReg.

(7)

Comments about the findings:

• There is strong evidence from a scatter diagram that a

positive linear relationship exists between the final averages for math and science.

• The value of the correlation coefficient is _______________. • This value of r indicates that there is a strong correlation

between math and science final averages. This means that those students who do well in mathematics are likely to also do well in science.

• The value of r2 is _______________, which indicates that only _______________% of the variation in the science averages can be attributed to the variation in the math averages.

References

Related documents

A drawback of the Mapuche hierarchy is that at least one transitional ending (first person singular subject acting upon second person singular object) contains an inverse marker

In sports where there are more than two possible outcomes, such as soccer (football), usually the prospective gambler will find that if he/she wants to bet on one team

They further argue that debtors financial problems are primarily the result of unexpected life events that put them in a precarious financial position that has little to do with

The sets consisted of the same metal types as the original 4 types used in the main treatment of this study with the exception that no type 316 stainless steel could be found, so

When a call is in progress on the cellular phone connected to this unit via Bluetooth wireless technology, the sound of the Bluetooth audio player connected to this unit is muted..

circuit, then , are the only two trail sets, such that .Thus is a two element chain. Remark 3.5: is a two element chain for all graphs containing spanning circuit. Infact

16:00 Driving Data Management – the Cost, Future and Unknown – Breakout Sessions 16:00-17:00 Main Room (Session details below) 16:00-17:00 Breakout Room 1 Personal Development

ةيلمع رييغتلا ،ريوطتلاو كلذكو ىزعي ىلإ ةيمهأ ريوطتلا رييغتلاو مئادلا رمتسملاو يف تاسسؤملا ةيوبرتلا يف ملاع مئاد لا ،ريغت امنيب ءاج يف ةبترملا ةريخلأا تارقفلا. رداق ىلع