regression.pptx

(1)

Regression

Regression is the process of identifying the relationship between variables and using that relationship to predict future values of an outcome. It helps identify how one variable behaves when one or more other variables change. Regression analysis is used for prediction and forecasting applications. For example, a regression model could be used to predict children's height, given their age, weight, and other factors.

A regression task begins with a data set in which the target values are known. For example, a regression model that predicts children's height could be developed based on observed data for many children over a period of time. The data might track age, weight, developmental milestones, family history, and so on. Height would be the target, the other attributes would be the predictors, and the data for each child would constitute a case.
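As a minimal sketch of this setup in R, assuming a small made-up data set (the values below are invented purely for illustration), each row is a case, height is the target, and age and weight are the predictors:

```r
# Hypothetical data: each row is one case (one child).
children <- data.frame(
  age    = c(2, 4, 6, 8, 10, 12),          # predictor, in years
  weight = c(12, 16, 21, 26, 32, 40),      # predictor, in kg
  height = c(86, 102, 115, 128, 138, 149)  # target, in cm
)

# Fit a regression model with height as the target.
heightModel <- lm(height ~ age + weight, data = children)

# Predict the height of a new, unseen case.
newChild <- data.frame(age = 7, weight = 23)
predictedHeight <- predict(heightModel, newdata = newChild)
print(predictedHeight)
```

The variable names children, heightModel, and newChild are illustrative choices and do not come from the slides.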

Common Applications of Regression

Regression modeling has many applications in trend analysis, business planning,

(2)

Simple Linear Regression

Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous (quantitative) variables:

One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.

The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

(3)

Types of relationships

(4)

Types of relationships

(5)

Types of relationships

Linear, nonlinear, or no relationship

(6)

(7)

What is the "Best Fitting Line"?

(8)

(9)

(10)

What is the "Best Fitting Line"?

(11)

What is the "Best Fitting Line"?

Below are formulas for the intercept b0 and the slope b1 for the
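For reference, the standard least-squares estimates are b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) for the slope and b0 = mean(y) - b1 * mean(x) for the intercept. The sketch below computes them by hand and cross-checks them against lm(); the x and y values are made up for illustration and are not taken from the slides.

```r
# Made-up illustrative data (not from the slides).
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: makes the line pass through the point (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

# Cross-check against R's built-in least-squares fit.
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both print statements should report the same intercept and slope, since lm() minimizes the same sum of squared prediction errors.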

(12)

(13)

Common Error Variance

σ² quantifies how much the responses (y) vary around the
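In practice σ² is unknown and is commonly estimated by the mean square error, MSE = SSE / (n - 2) for simple linear regression, where SSE is the sum of squared residuals. A minimal sketch, assuming made-up data:

```r
# Made-up illustrative data (not from the slides).
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 6.1)
fit <- lm(y ~ x)

n   <- length(y)
SSE <- sum(residuals(fit)^2)  # sum of squared residuals
MSE <- SSE / (n - 2)          # n - 2 because two parameters (b0, b1) are estimated

# summary(fit)$sigma is the residual standard error, so its square is the MSE.
print(MSE)
print(summary(fit)$sigma^2)
```

The smaller the MSE, the more tightly the observed responses cluster around the fitted line.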

(14)

(15)

Common Error Variance

Example: Suppose you have two brands (A and B) of thermometers, and each brand offers a Celsius thermometer and a Fahrenheit thermometer. You measure the temperature in Celsius and Fahrenheit using each

(16)

(17)

(18)

(19)

(20)

1. Coefficient of Determination, r-square

How well does your regression equation truly represent your set of data?

(21)

(22)

Coefficient of Determination, r-square

For the plot in the figure on the previous slide, note that SSTO = SSR + SSE. The sums of squares appear to tell the story pretty well. They tell us that most of the variation in the response y (SSTO = 1827.6) is just due to random variation (SSE = 1708.5), not due to the regression of y on x (SSR = 119.1). You might notice that SSR divided by SSTO is 119.1/1827.6, or about 0.065.
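The decomposition can be checked numerically. The sketch below uses made-up x and y values (the data behind the slide's figure are not available here), computes the three sums of squares from a fitted line, and confirms that SSR / SSTO matches the r-squared reported by summary(). The last line repeats the arithmetic with the figures quoted on the slide.

```r
# Made-up illustrative data (not the data behind the slide's figure).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(3, 1, 4, 1, 5, 9, 2, 6)
fit <- lm(y ~ x)

SSTO <- sum((y - mean(y))^2)            # total variation in y
SSR  <- sum((fitted(fit) - mean(y))^2)  # variation explained by the regression
SSE  <- sum((y - fitted(fit))^2)        # unexplained (error) variation
rSquared <- SSR / SSTO

print(all.equal(SSTO, SSR + SSE))
print(all.equal(rSquared, summary(fit)$r.squared))

# The ratio using the sums of squares quoted on the slide.
print(round(119.1 / 1827.6, 3))
```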

(23)

(24)

Correlation Coefficient r

(25)

Few examples

r² = 100% and r = 1.000

These measures tell us that there is a perfect linear relationship between temperature in degrees Celsius and temperature in degrees Fahrenheit.
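This case is easy to reproduce, because Fahrenheit is an exact linear function of Celsius (F = 32 + 1.8 C). A minimal sketch, with an arbitrary set of Celsius readings chosen for illustration:

```r
# Arbitrary Celsius readings, chosen for illustration.
celsius    <- c(-10, 0, 10, 20, 30, 40)
fahrenheit <- 32 + 1.8 * celsius  # exact conversion, no measurement error

# Pearson correlation coefficient and its square.
r <- cor(celsius, fahrenheit)
rSquared <- r^2

print(r)
print(rSquared)
```

In simple linear regression r is the square root of r², taking the sign of the slope, which is why a positive slope with r² = 100% corresponds to r = 1.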

(26)

Few examples

r² = 90.4% and r = 0.951

(27)

Linear regression in R

Reading in the data and splitting

library(xlsx)

# Read the first sheet of the workbook into a data frame

powerData <- read.xlsx('Folds5x2_pp.xlsx', 1)

Data splitting

set.seed(123)

split <- sample(seq_len(nrow(powerData)), size = floor(0.75 * nrow(powerData)))

trainData <- powerData[split, ]

testData <- powerData[-split, ]

The dataset is obtained from the UCI Machine Learning Repository. The dataset

(28)

Linear regression in R

Building the prediction model

predictionModel <- lm(PE ~ AT + V + AP + RH, data = trainData)
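Once a model like this is fitted, its coefficients and fit quality can be inspected with coef() and summary(). The sketch below is self-contained: it assumes a simulated stand-in data frame (simData) with the same column names as the formula above, generated from an invented linear relationship, so the numbers it prints say nothing about the real data set.

```r
set.seed(1)
n <- 200

# Simulated stand-in data; ranges and coefficients are invented for illustration.
simData <- data.frame(
  AT = runif(n, 5, 35),
  V  = runif(n, 30, 80),
  AP = runif(n, 995, 1030),
  RH = runif(n, 30, 100)
)
simData$PE <- 500 - 2 * simData$AT - 0.3 * simData$V + 0.05 * simData$AP -
  0.1 * simData$RH + rnorm(n, sd = 4)

# Same model formula as above, fitted to the simulated data.
simModel <- lm(PE ~ AT + V + AP + RH, data = simData)

print(coef(simModel))               # intercept plus one slope per predictor
print(summary(simModel)$r.squared)  # share of variation in PE explained
```

With predictionModel from the slide, the same two calls would be coef(predictionModel) and summary(predictionModel)$r.squared.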

(29)

(30)

Testing the prediction model

We will now apply the prediction model to the test data.

prediction <- predict(predictionModel, newdata = testData)

head(prediction)

       2        4       12       13       14       17
444.0433 450.5260 456.5837 438.7872 443.1039 463.7809

head(testData$PE)

[1] 444.37 446.48 453.99 440.29 451.28 467.54

(31)

Testing the prediction model

We can calculate the value of R-squared for the prediction model on the test data set as follows:

SSE <- sum((testData$PE - prediction) ^ 2)

SST <- sum((testData$PE - mean(testData$PE)) ^ 2)

1 - SSE/SST
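The same calculation can be wrapped in a small helper so it is reusable for any pair of observed and predicted vectors. The function name rSquaredOnTest is a hypothetical choice, not part of the slides. As a quick demonstration it is applied to the six observed and predicted values printed by head() earlier; this covers only those six cases, so the result is not the R-squared of the full test set.

```r
# Hypothetical helper: R-squared of predictions against observed values.
rSquaredOnTest <- function(actual, predicted) {
  SSE <- sum((actual - predicted)^2)     # squared prediction errors
  SST <- sum((actual - mean(actual))^2)  # total variation in the observed values
  1 - SSE / SST
}

# First six observed and predicted values, copied from the head() output.
actual    <- c(444.37, 446.48, 453.99, 440.29, 451.28, 467.54)
predicted <- c(444.0433, 450.5260, 456.5837, 438.7872, 443.1039, 463.7809)

print(rSquaredOnTest(actual, predicted))
```

On the full data the call would be rSquaredOnTest(testData$PE, prediction).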