• No results found

Simple Linear Regression

N/A
N/A
Protected

Academic year: 2021

Share "Simple Linear Regression"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

1

Simple Linear Regression

Regression equation—an equation that describes the

average relationship between a response (dependent) and an explanatory (independent) variable.

2 2 4 6 8 10 x 2 4 6 8 10 1 2 14 y 6 3 3 y     4 2 2 x     3 slope 1.5 2 y x      y mx b 

Slope-intercept equation for a line

(4,6) (2,3)

A model that defines an exact relationship between variables.

Example:

There is no allowance for error in the prediction of y for a given x.

Deterministic Model

1.5 yx -10 0 10 20 30 40 50 Celsius 0 2 4 48 72 96 12 0 F a h re n h e it 9 32 5 FC

(2)

5

Probabilistic Model

A model that accounts for random error.

Includes both a deterministic component and a random error component.

This model hypothesizes a probabilistic relationship between y and x. 1 5 random error y. x 6 0 2 4 6 8 10 x 0 5 10 15 y 1.5 yx

Probabilistic Model—General Form

y = Deterministic component + Random component

where y is the “variable of interest”.

Assume that the mean value of the random error is zero  the mean value of y, E(y), equals the deterministic component of the model

= population y-intercept of the line—the point at which the line intersects or cuts through the y-axis = population slope of the line—the amount of increase (or

decrease) in the deterministic component of y for every 1-unit increase (or decrease) in x.

= random error component

First-Order (Straight Line) Probabilistic Model

0 1

y

x

0  1 

where y = Dependent variable x = Independent variable

(3)

9 are population parameters. They will only be known if the population of all (x, y) measurements are available.

, along with a specific value of the independent variable x determine the mean value of the dependent variable y.

First-Order (Straight Line) Probabilistic Model

0 and 1

 

0 and 1

 

10 will generally be unknown.

The process of developing a model, estimating model parameters, and using the model can be summarized in these 5-steps:

Model Development

0 and 1

 

1. Hypothesize the deterministic component of the model that relates the mean, E(y) to the independent variable x

2. Use sample data to estimate unknown model parameters

 

0 1

E y  x

0 0 1 1

find estimates: or ˆ b ,ˆ or b

Model Development (continued)

3. Specify the probability distribution of the random error term and estimate the SD of this distribution

4. Statistically evaluate the usefulness of the model 5. Use model for prediction, estimation or other

purposes

~N 0, will revisit this later

 

Example: Reaction time versus drug percentage

4 5 5 2 4 4 2 3 3 1 2 2 1 1 1 Reaction Time (seconds) y Amount of Drug (%) x Subject

(4)

13 0 1 2 3 4 5 Drug (%) -1 .0 -0 .2 0. 7 1. 5 2. 3 3. 2 4. 0 R ea ct io n T im e (s ec .)

Example: Reaction time versus drug percentage

14 0 1 2 3 4 5 Drug (%) -1 .0 -0 .2 0. 7 1. 5 2. 3 3. 2 4. 0 R ea ct io n T im e (s ec .)

Example: Reaction time versus drug percentage

Example: Reaction time versus drug percentage

Errors of prediction---vertical differences between the observed and the predicted values of y

Sum of squared errors (SSE) = 2 Sum of errors = 0 0 (4-4) = 0 4 4 5 1 (2-3) = -1 3 2 4 0 (2-2) = 0 2 2 3 0 (1-1) = 0 1 1 2 1 (1-0) = 1 0 1 1 y x y  1 xy y   2 y y 

Least Squares Line

Also called regression line, or the least squares prediction equation

Method to find this line is called the method of least squares

For our example, we have a sample of n = 5 pairs of (x, y) values. The fitted line that we will calculate is written as

is an estimator of the mean value of y, ; and are estimators of and

0 1

ˆy b b x

ˆy E y

 

0

(5)

17 Define the sum of squares of the deviations of the y values about their predicted values for all n data points as:

We want to find and to make the SSE a minimum---termed least squares estimates

is called the least squares line

Least Squares Line (continued)

2

2 0 1 1 1 SSE n i n i i i i ˆ y y y b b x     

 

  0 1 ˆy b b x 0 b b1 18 Slope: y-intercept: n = sample size

Formulas for the Least Squares Estimates

1 SS SS xy xx b  1 SD SD y x br 0 1 1 i i y x b y b x b n n   

  



= SS xy xy i i i i i i s x x y y x y x y n     

 

2 2 2 = SS xx xx i i i s x x x x n    

or

LS Calculations for Drug/Reaction Example

20 25 4 5 8 16 2 4 6 9 2 3 2 4 1 2 1 1 1 1 15 i x

yi10 2 55 i x

x yi i37   15 10 SS 37 37 30 7 5 xy      2 15 SS 55 55 45 10 5 xx     1 7 0 7 10 b  .      0 10 7 15 5 5 2 7 3 2 2 1 1 b . . . .         i i x y 2 i x i y i x

LS Line for Drug/Reaction Example

0 1 2 3 4 5 -1 .0 -0 .2 0. 7 1. 5 2. 3 3. 2 4. 0 R ea ct io n T im e (s ec .)

1 7

ˆy

  

.

. x

(6)

21

LS Calculations for Drug/Reaction Example

Sum of squared errors (SSE) = 1.10 Sum of errors = 0 .36 (4-3.4) = .6 3.4 4 5 .49 (2-2.7) = -.7 2.7 2 4 .00 (2-2.0) = 0 2.0 2 3 .09 (1-1.3) = -.3 1.3 1 2 .16 (1-.6) = .4 .6 1 1 y x ˆy  .1 7. xy yˆ y yˆ2

The LS line has a sum of errors = 0, but SSE = 1.1 < 2.0 for visual model

22 Least Squares Line—Interpretation of

Estimated intercept is negative

that the estimated mean reaction time is equal to -0.1 seconds when the amount of drug is 0%. What does this mean since negative reaction times are not possible?

Model parameters should be interpreted only within the sampled range of the independent variable.

1 7 ˆy  . . x

Least Squares Line—Interpretation of

The slope of 0.7 implies that for every unit increase of x, the mean value of y is estimated to increase by 0.7 units.

In the context of the problem:

For every 1% increase in the amount of drug in the bloodstream, the mean reaction time is estimated to increase by 0.7 seconds over the sampled range of drug amounts from 1% to 5%. 1 7 ˆy  . . x 0 1 2 3 4 5 -1 .0 -0 .2 0 .7 1. 5 2 .3 3 .2 4. 0 R e ac ti o n T im e ( s ec .)

Coefficient of Determination

A measure of the contribution of x in predicting y

Assuming that x provides no information for the prediction of y, the best prediction for the value of y is y

y y

ˆ 

2

(7)

25 0 1 2 3 4 5 Drug (%) -1 .0 -0 .2 0. 7 1. 5 2. 3 3. 2 4. 0 R ea ct io n T im e (s ec .)

Coefficient of Determination (continued)

2 SSE i i i ˆ y y

 26

Coefficient of Determination (continued)

2

SSE

yiˆyi

2

SSyy

yiy --total sample variation around mean

--unexplained sample variability after fitting

SSyy SSE --explained sample variability attributable to linear relationship explained total  yy SS SSE SS

yy proportion of total sample

variability explained by the linear relationship 

Coefficient of Determination (continued)

In simple linear regression is computed as the square of the correlation coefficient, r

Interpretation— means that the sum of squared deviations of the y values about their predicted values has been reduced by 75% by the use

of the least squares equation. 2 SS SSE 1 SSE SS SS yy yy yy r     2 r 2 0r 1 2 75 r. instead of to predict ˆy, y , y

}

Unexplained variability

References

Related documents

In Afghanistan, this is a qualification through either the Institute of Health Sciences or the Community Midwifery Education programme.. Skilled

The most important contributions of this paper are (i) the proposal of a concern mining approach for KDM, what may encourage other groups to start researching on

This paper looks at industry leaders in supply chain risk management and explores how humanitarian supply chains can learn from industry best

In this investigation, the cutting tool wear estimation in coated carbide tools using regression analysis, fuzzy logic and Artificial Neural Network (A–NN) is

Release time and faculty ‘passports’ that allow faculty to travel freely between the borders of colleges, departments, and majors should be arranged by the Office of the

The objective of this study was to develop Fourier transform infrared (FTIR) spectroscopy in combination with multivariate calibration of partial least square (PLS) and

Field experiments were conducted at Ebonyi State University Research Farm during 2009 and 2010 farming seasons to evaluate the effect of intercropping maize with