
Correlation and Regression Analysis


Academic year: 2021


(1)

Correlation and Regression Analysis

Nayyar Raza Kazmi

(2)

Objectives of the Lecture

• Understand the concepts of Correlation and Regression Analysis.

• Understand the areas in which Correlation and Regression models can be applied.

• Understand how to interpret Correlation and Regression parameters.

(3)

• Most studies done by postgraduate trainees are cross-sectional in nature.

• Analysis of such studies is mostly confined to the application of descriptive univariate statistics.

• The quality of such studies can be enhanced by further data mining with Correlation and Regression Analysis.

(4)

Correlation

– Strength of association between two variables.

– Tells us how much the two variables are associated with one another.

– However, it does not imply CAUSATION.

– Simply tells us whether the two variables

(5)

Regression

• If there is a strong correlation between two variables, Regression is used to determine the value of the dependent variable (Y) from the value of the independent variable (X).

• Types

– Simple Linear Regression

– Multiple Linear Regression

– Logistic Regression

(6)

Correlation Analysis

Correlation Analysis is a group of statistical techniques to measure the association between two variables.

A Scatter Diagram is a chart that portrays the relationship between two variables.

The Independent Variable provides the basis for estimation. It is the predictor variable.

The Dependent Variable is the variable being predicted or estimated.

[Figure: scatter diagram "Advertising Minutes and $ Sales"; x-axis: Advertising Minutes; y-axis: Sales ($ thousands)]

(7)

The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.

Also called Pearson’s r and Pearson’s product moment correlation coefficient.

It requires interval or ratio-scaled data.

It can range from -1.00 to 1.00.

Values of -1.00 or 1.00 indicate perfect and strong correlation.

Values close to 0.0 indicate weak correlation.

Negative values indicate an inverse relationship and positive values indicate a direct relationship.
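
As a minimal sketch of how r can be computed by hand, the Python below applies the usual product-moment definition (covariation of X and Y divided by the square root of the product of their individual variations). The function name `pearson_r` and the sample data are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) data: one direct and one inverse relationship
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # points on a rising line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # points on a falling line
print(r_positive, r_negative)
```

Points lying exactly on a rising line give r = 1 and points on a falling line give r = -1, matching the range described above.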

(8)

[Figure: scatter plot of Y against X showing Perfect Negative Correlation]

(9)

[Figure: scatter plot of Y against X showing Perfect Positive Correlation]

(10)

[Figure: scatter plot of Y against X showing Zero Correlation]

(11)
(12)

Phi Coefficient

• Used for two categorical (dichotomous) variables arranged in a 2×2 table with cell counts a, b, c and d

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(a+c)(c+d)(b+d)}}$$
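
The phi formula can be sketched in Python as follows. This assumes the standard form, in which the denominator is the square root of the product of the four marginal totals, with a and b in the first row and c and d in the second row of a 2×2 table. The function name and the example counts are hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table laid out as [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (a + c) * (c + d) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator

# Made-up counts: complete association, then no association
phi_perfect = phi_coefficient(10, 0, 0, 10)
phi_none = phi_coefficient(5, 5, 5, 5)
print(phi_perfect, phi_none)
```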

(13)

Regression Equation and Regression Line

$$Y_c = a + bX$$

• where $Y_c$ = computed value of the dependent variable

• a = Y-intercept, the value of $Y_c$ where X equals zero

• b = slope of the regression line, which is the increase or decrease in Y for each change of one unit of X

• X = a given value of the independent variable

(14)

Simple Linear Regression

• Determines the value of a Dependent Variable based on a single independent Variable.
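
A minimal sketch of how the intercept a and slope b of a simple linear regression line can be estimated, assuming the ordinary least-squares method (the lecture does not name the estimation method). The function name `fit_line` and the data are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x     # intercept: value of Y when X is zero
    return a, b

# Made-up data lying exactly on Y = 1 + 2X
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
predicted = a + b * 6           # computed Y for a new X value of 6
print(a, b, predicted)
```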

(15)
(16)
(17)
(18)

Multiple Linear Regression

• Used when the Dependent Variable is a continuous variable and the independent variables are continuous or categorical.

(19)

Putting MLR in Practice

• A descriptive study on normal healthy adults aged 14-25 years gathers data about their weight, systolic Blood

(20)
(21)

?????

• Is serum cholesterol level associated with weight and systolic blood pressure?

• Can we predict Serum Cholesterol levels if we know a person's weight and systolic

(22)
(23)

Y = 18.52 + 3.20(BP) + [-4.06(Weight)]

So what could be the Serum Cholesterol level for a person who weighs 75 kg and has a systolic Blood Pressure of 145 mm Hg?

Y = 18.52 + 3.20(145) + [-4.06(75)]

Y = 18.52 + 464 + [-304.5]

Y = 18.52 + 464 - 304.5

Y = 178.02
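
The same substitution can be checked with a short Python sketch. The coefficients are the ones given in the equation above. The function and parameter names are illustrative assumptions.

```python
def predict_cholesterol(bp, weight, intercept=18.52, b_bp=3.20, b_weight=-4.06):
    """Predicted serum cholesterol from the fitted equation on the slide.

    bp is systolic blood pressure in mm Hg, weight is in kg.
    """
    return intercept + b_bp * bp + b_weight * weight

# Person weighing 75 kg with a systolic blood pressure of 145 mm Hg
predicted = predict_cholesterol(bp=145, weight=75)
print(round(predicted, 2))
```

Rounding to two decimals is used only for display, since floating-point arithmetic may leave a tiny residual in the last digits.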

(24)

Logistic Regression

• Logistic Regression is used when the outcome variable is categorical

• The independent variables could be either categorical or continuous

• Logistic Regression determines the Odds Ratio for various independent variables for the dichotomous dependent variable

(25)

• The Dichotomous Dependent variable could be presence/absence of a complication, disease etc.

• Data for dichotomous variables must be binary coded, like 1 for presence of complication or disease and 0 for absence of complication or disease.
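
As a hedged illustration of how logistic regression output relates to Odds Ratios: in the standard logistic model, the odds ratio for an independent variable is the exponential of its fitted coefficient, and the predicted probability of the outcome coded 1 comes from the logistic function. The coefficient values and function names below are hypothetical and are not results from the lecture.

```python
import math

def odds_ratio_from_coefficient(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return math.exp(beta)

def predicted_probability(intercept, coefficients, values):
    """Probability that the outcome is 1 (e.g. complication present)."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical fitted coefficient for a binary-coded exposure (1 = present, 0 = absent)
beta_exposure = math.log(2.0)
or_exposure = odds_ratio_from_coefficient(beta_exposure)   # odds double when exposed

# With a hypothetical intercept of 0, an unexposed person has even odds
p_unexposed = predicted_probability(0.0, [beta_exposure], [0])
p_exposed = predicted_probability(0.0, [beta_exposure], [1])
print(or_exposure, p_unexposed, p_exposed)
```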

(26)

Putting Logistic Regression in Practice

• Risk Factors for Complications of Diabetes Mellitus in patients admitted to a Tertiary Care Hospital

(27)

Risk Factors for Retinopathy       No. of patients (n=32)   %age

BMI > 30                           13                       40.26
Smoking                            28                       87.5
Level of prior awareness           14                       43.75
HbA1C > 7                          10                       31.25
Duration of Diabetes > 10 Years    20                       62.5

(28)
(29)

Where Correlation and Regression Models can be applied

• Cross-sectional studies.

• K.A.P Studies

• Studies aiming to determine relationships between certain factors of interest and

(30)

Software to use

• MS Excel with Data Analysis add-in installed

• SPSS

• Epi Info 2002

• MedCalc (Recommended because of ease of use and power to perform all types of statistical

(31)

• Thank you for your patience. (There is a strong negative correlation between the length of a Biostats lecture and your moods, evident from the 11 o'clock sign on your foreheads.)

• Questions, Queries and Suggestions are welcome.
