Computer Illustration R Simple Linear Regression
2. Regression and Correlation: Simple Linear Regression
Software: R
Create a txt file from a SAS data set
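data _null_;
file 'C:\Documents and Settings\sphlab\Desktop\slr1.txt';
set temp;
/* write the variable names once, for read.table(..., header=TRUE) in R */
if _n_ = 1 then put 'day calls fhigh flow high low rain snow weekday year sunday subzero';
/* day is written as its numeric SAS date value; list only variables that exist  */
/* in temp, since an uninitialized name is written as a missing-value dot        */
put day calls fhigh flow high low rain snow weekday
year sunday subzero;
run;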
1. Read in data from text file
data <- read.table("C:/Documents and Settings/liyuan/Desktop/640TA/slr2.txt", header=TRUE)
attach(data)
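The day column appears to hold SAS date values, i.e. days counted from 1 January 1960. Assuming that, a minimal sketch converts them to R Date objects; the helper name sas_to_date is hypothetical. As a consistency check, day 12070 comes out as a Sunday, which matches sunday = 1 in row 2 of the listing below.

```r
# Hypothetical helper: convert SAS date values (days since 1960-01-01) to R Dates.
# Assumption: the day column was exported as the raw numeric SAS date.
sas_to_date <- function(sas_days) {
  as.Date(sas_days, origin = "1960-01-01")
}

# First three day values from the partial listing in step 2
d <- sas_to_date(c(12069, 12070, 12071))
print(d)

# ISO weekday via %u (1 = Monday, ..., 7 = Sunday); does not depend on locale
print(format(d, "%u"))
```

With the data attached, something like plot(sas_to_date(day), calls) would then label the time axis with calendar dates instead of day counts.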
2. Partial listing of the data
head(data, 13)
     day calls fhigh flow high low rain snow weekday year sunday subzero
1  12069  2298    38   31   39  31    0    0       0    0      0       0
2  12070  1709    41   27   41  30    0    0       0    0      1       0
3  12071  2395    33   26   38  24    0    0       0    0      0       0
4  12072  2486    29   19   36  21    0    0       1    0      0       0
5  12073  1849    40   19   43  27    0    0       1    0      0       0
6  12074  1842    44   30   43  29    0    0       1    0      0       0
7  12075  2100    46   40   53  41    1    0       1    0      0       0
8  12076  1752    47   35   46  40    0    0       0    0      0       0
9  12077  1776    53   34   55  38    1    0       0    0      1       0
10 12078  1812    38   32   43  31    0    0       1    0      0       0
11 12079  1842    35   21   35  25    0    0       1    0      0       0
12 12080  1674    39   27   44  31    1    1       1    0      0       0
13 12081  1692    34   28   40  27    0    0       1    0      0       0
3. Plot of calls over time
par(mfrow=c(2,2))
plot(day, calls, xlim=c(12000,12500), ylim=c(1000,9000), xlab="Day", ylab="Calls", main="Calls to NY Auto Club 1993-1994", col="black")
4. Tests of the Normality Assumption for Y=calls
> mean(calls)
[1] 4318.75
> length(calls)
[1] 28
> sum(calls)
[1] 120925
> var(calls)
[1] 7249901
> sum(calls^2)                 ## uncorrected SS ##
[1] 717992159
> sum((calls - mean(calls))^2) ## corrected SS ##
[1] 195747315
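The corrected sum of squares can be recovered from the uncorrected one with the computational identity sum(y^2) - n * ybar^2, and dividing it by n - 1 gives the value returned by var(). The sketch below uses only the totals printed above, so it is a hand check of the output rather than a new analysis.

```r
# Summary quantities reported in the R output above (n = 28 days)
n     <- 28
sum_y <- 120925       # sum(calls)
uss   <- 717992159    # uncorrected SS, sum(calls^2)

ybar <- sum_y / n              # sample mean
css  <- uss - n * ybar^2       # corrected SS: sum(y^2) - n * ybar^2
s2   <- css / (n - 1)          # sample variance, divisor n - 1 as in var()

print(c(mean = ybar, corrected_ss = css, variance = s2))
```

The small discrepancies against the printed corrected SS and variance come only from R rounding those values for display.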
## Install and load the package "fBasics" first: install.packages("fBasics"); library(fBasics) ##
> skewness(calls)
[1] 0.4307614
attr(,"method")
[1] "moment"
> kurtosis(calls)
[1] -1.497417
attr(,"method")
[1] "excess"
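The two shape statistics can also be written out directly in base R. This is a sketch under an assumption: deviations are standardized by sd() (divisor n - 1), which we believe matches the fBasics "moment" and "excess" methods, but other packages use slightly different definitions, so check the fBasics documentation before relying on exact agreement. The function names are hypothetical, and the illustration uses only the 13 calls values from the partial listing, so its results will not match the 28-day output above.

```r
# Moment-based skewness and excess kurtosis in base R.
# Assumption: standardize by sd() (divisor n - 1), believed to match fBasics.
moment_skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  sum(z^3) / length(x)
}

excess_kurtosis <- function(x) {
  z <- (x - mean(x)) / sd(x)
  sum(z^4) / length(x) - 3   # subtract 3 so a normal sample scores near 0
}

# The 13 calls values shown in the partial listing (step 2)
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
print(moment_skewness(calls13))
print(excess_kurtosis(calls13))
```

A negative excess kurtosis, as for calls above, indicates tails lighter than those of a normal distribution.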
## Install and load the package "nortest" first for cvm.test() and ad.test(): install.packages("nortest"); library(nortest) ##
## shapiro.test() is in the base package "stats", which R loads by default ##
> shapiro.test(calls)

        Shapiro-Wilk normality test

data:  calls
W = 0.829, p-value = 0.0003628

> cvm.test(calls)

        Cramer-von Mises normality test

data:  calls
W = 0.3112, p-value = 0.0002141

> ad.test(calls)

        Anderson-Darling normality test

data:  calls
A = 1.8673, p-value = 6.68e-05
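Each test has normality as its null hypothesis, so a p-value below the chosen significance level is evidence against a normal distribution for Y=calls. A minimal sketch of that decision rule follows; check_normality is a hypothetical helper built on shapiro.test() from the base "stats" package, and alpha = 0.05 is an assumed conventional level.

```r
# Hypothetical helper: Shapiro-Wilk test plus a decision at level alpha.
check_normality <- function(x, alpha = 0.05) {
  test <- shapiro.test(x)
  list(W = unname(test$statistic),
       p.value = test$p.value,
       reject_normality = test$p.value < alpha)
}

# p-values reported above for Y = calls (n = 28)
reported_p <- c(shapiro_wilk = 0.0003628,
                cramer_von_mises = 0.0002141,
                anderson_darling = 6.68e-05)
print(reported_p < 0.05)   # all TRUE: every test rejects normality at the 5% level
```

Usage on raw data: check_normality(calls) with the data attached.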
5. Graphical Assessments of Normality of Y=calls
Histogram with overlaid normal density
hist(calls, col='lightblue', main='Histogram of calls', breaks=5, include.lowest=TRUE, right=TRUE, freq=FALSE)
curve(dnorm(x, mean=mean(calls), sd=sd(calls)), add=TRUE, col='red', lty=6)
Quantile-Quantile Plot
qqnorm(calls, datax=TRUE, main="Simple Normal QQplot for Y=calls", ylab="Calls", xlab="Normal quantiles")
qqline(calls, datax=TRUE)
[Figure: Simple Normal QQplot for Y=calls; x-axis: Sample Quantiles, y-axis: Theoretical Quantiles]
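What qqnorm() plots can be reproduced by hand: each observation is paired with the normal quantile qnorm(ppoints(n)) of matching rank, and qqline() by default passes through the pair of first quartiles and the pair of third quartiles. The sketch below uses the 13 calls values from the partial listing and is only illustrative. With datax = TRUE, as in the calls above, the axes are swapped, so the drawn line has the roles of the two coordinates reversed.

```r
# The 13 calls values from the partial listing (step 2)
y <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
       1752, 1776, 1812, 1842, 1674, 1692)
n <- length(y)

# Normal score for each observation, matched by rank
theoretical <- qnorm(ppoints(n))[order(order(y))]

# Reference line through the quartile pairs (qqline's default probs)
probs     <- c(0.25, 0.75)
y_q       <- quantile(y, probs, names = FALSE)  # sample quartiles
x_q       <- qnorm(probs)                       # standard normal quartiles
slope     <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

qq <- qqnorm(y, plot.it = FALSE)   # same pairs, without drawing
print(all.equal(qq$x, theoretical))
print(c(intercept = intercept, slope = slope))
```

For roughly normal data the slope approximates the standard deviation and the intercept the mean, which is why curvature away from the line signals non-normality.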
6. Scatterplot of Y=Calls vs X=low
calls0 <- calls[year==0]
calls1 <- calls[year==1]
low0 <- low[year==0]
low1 <- low[year==1]
plot(low0, calls0, main="Calls to NY Auto Club 1993-1994", xlim=c(-10,50), ylim=c(1000,9000), xlab="Low", ylab="Calls", col="green")
points(low1, calls1, col="red")
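The same year-coded scatterplot can be drawn in a single call by building a colour vector from year, which also makes it easy to add a legend. This is a sketch with hypothetical helper names; it keeps the colours and axis limits used above.

```r
# Hypothetical helper: map the 0/1 year indicator to the colours used above
year_colours <- function(year) ifelse(year == 0, "green", "red")

# Hypothetical helper: one plotting call, colours keyed to year, plus a legend
plot_calls_by_year <- function(low, calls, year) {
  plot(low, calls, col = year_colours(year),
       xlim = c(-10, 50), ylim = c(1000, 9000),
       xlab = "Low", ylab = "Calls",
       main = "Calls to NY Auto Club 1993-1994")
  legend("topright", legend = c("year = 0", "year = 1"),
         col = c("green", "red"), pch = 1)   # pch = 1 matches plot()'s default
}
```

Usage with the data attached: plot_calls_by_year(low, calls, year).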
7. Least Squares Estimation and Analysis of Variance Table
lm1 <- lm(calls ~ low)
summary(lm1)
coef(lm1)
anova(lm1)

Call:
lm(formula = calls ~ low)

Residuals:
    Min      1Q  Median      3Q     Max
-3112.1 -1467.6  -214.0  1143.9  3587.9

Parameter Estimates
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7475.85     704.63  10.610 6.10e-11 ***
low          -145.15      27.79  -5.223 1.86e-05 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1917 on 26 degrees of freedom
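For a straight line the least squares estimates have a closed form, b1 = Sxy/Sxx and b0 = ybar - b1*xbar. The sketch below checks that against lm() on the 13 (low, calls) pairs from the partial listing (so its numbers differ from the 28-day fit above), then rebuilds the t statistics and an approximate ANOVA decomposition from the printed full-data output. The anova(lm1) table is not shown above; because the residual standard error is rounded to 1917, the reconstructed F and R^2 are approximations. The check relies on the fact that, in simple linear regression, the ANOVA F statistic equals the square of the slope's t statistic. ls_line is a hypothetical helper name.

```r
# Closed-form least squares: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
ls_line <- function(x, y) {
  sxx <- sum((x - mean(x))^2)
  sxy <- sum((x - mean(x)) * (y - mean(y)))
  b1  <- sxy / sxx
  c(intercept = mean(y) - b1 * mean(x), slope = b1)
}

# The 13 (low, calls) pairs from the partial listing (step 2)
low13   <- c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)
print(ls_line(low13, calls13))
print(coef(lm(calls13 ~ low13)))   # should agree with ls_line()

# t statistics and ANOVA pieces rebuilt from the full-data output (n = 28)
t_int   <- 7475.85 / 704.63                  # about 10.61
t_slope <- -145.15 / 27.79                   # about -5.223
p_slope <- 2 * pt(-abs(t_slope), df = 26)    # two-sided p-value

sst <- 195747315          # corrected SS of calls, from step 4
sse <- 1917^2 * 26        # residual SS = s^2 * (n - 2), approximate
ssr <- sst - sse          # regression SS
f   <- ssr / (sse / 26)   # F = MSR / MSE on 1 and 26 df
r2  <- ssr / sst          # coefficient of determination
print(c(t_slope = t_slope, p_slope = p_slope, F = f,
        t_squared = t_slope^2, R2 = r2))
```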
8. Overlay of straight line fit onto scatterplot of Y=calls vs X=low
abline(lm1)
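abline(lm1) draws the fitted line calls_hat = b0 + b1 * low using the estimates from step 7. The sketch below evaluates that line by hand with the rounded coefficients printed above, for a few illustrative low values chosen here (not taken from the data); the helper name predict_calls is hypothetical. With the model object available, predict(lm1, newdata = data.frame(low = c(0, 20, 40))) should give essentially the same numbers using unrounded coefficients.

```r
# Fitted line from the coefficient table: calls_hat = b0 + b1 * low
b0 <- 7475.85
b1 <- -145.15
predict_calls <- function(low) b0 + b1 * low   # hypothetical helper

# Illustrative low values chosen for this sketch
print(predict_calls(c(0, 20, 40)))
```

Read this way, each one-unit increase in low is associated with about 145 fewer expected calls.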
9. Residuals Analysis: Assessment of Normality of Residuals
qqnorm(residuals(lm1), main="Normality of Residuals Y=CALLS v X=LOW")
qqline(residuals(lm1))
plot(lm1, which=4, main="Cook's Distance Values for Straight Line Y=Calls v X=Low")
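For the residual checks in step 9, two properties of least squares residuals in a model with an intercept are worth confirming: they sum to zero and are uncorrelated with x. A formal test such as shapiro.test() can complement the Q-Q plot. The sketch uses the 13 (low, calls) pairs from the partial listing, so it illustrates the mechanics rather than reproducing the 28-day results.

```r
# The 13 (low, calls) pairs from the partial listing (step 2)
low13   <- c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)

fit <- lm(calls13 ~ low13)
e   <- residuals(fit)

# Both sums are zero up to floating-point error for an LS fit with intercept
print(c(sum_resid = sum(e), sum_x_resid = sum(low13 * e)))

# Formal normality test on the residuals
print(shapiro.test(e))
```

With the full data the corresponding call would be shapiro.test(residuals(lm1)).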
10. Residuals Analysis: Detection of Outliers Using Cook's Distance
lm1.diag <- ls.diag(lm1)
plot(fitted(lm1), lm1.diag$stud.res, ylim=c(-2.0,2.5), xlab="Predicted Value", ylab="Studentized Residual", main="Jackknife Residuals versus Predicted")
abline(h=0, lty=3)
[Figure: Jackknife Residuals versus Predicted; x-axis: Predicted Value, y-axis: Studentized Residual]
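The jackknife (externally studentized) residuals and Cook's distances used in step 10 have closed-form expressions in terms of the residuals e_i, the leverages h_i and the MSE s^2. The sketch below computes them by formula and compares them with base R's rstudent() and cooks.distance(); we expect ls.diag(lm1)$stud.res to hold the same externally studentized residuals, though that is worth confirming on your R version. diagnostics_by_hand is a hypothetical name, the data are the 13 pairs from the partial listing, and the |r| > 2 cut-off is only a common rule of thumb.

```r
# Jackknife residuals and Cook's distance from their textbook formulas.
# p = number of coefficients (2 for a straight line).
diagnostics_by_hand <- function(fit) {
  e  <- residuals(fit)
  h  <- hatvalues(fit)
  n  <- length(e)
  p  <- length(coef(fit))
  s2 <- sum(e^2) / (n - p)                                # MSE
  s2_del <- ((n - p) * s2 - e^2 / (1 - h)) / (n - p - 1)  # MSE with case i deleted
  data.frame(
    jackknife = e / sqrt(s2_del * (1 - h)),
    cooks     = e^2 / (p * s2) * h / (1 - h)^2
  )
}

# The 13 (low, calls) pairs from the partial listing (step 2)
low13   <- c(31, 30, 24, 21, 27, 29, 41, 40, 38, 31, 25, 31, 27)
calls13 <- c(2298, 1709, 2395, 2486, 1849, 1842, 2100,
             1752, 1776, 1812, 1842, 1674, 1692)

fit <- lm(calls13 ~ low13)
d   <- diagnostics_by_hand(fit)
print(d)
print(which(abs(d$jackknife) > 2))   # cases flagged by the |r| > 2 rule of thumb
```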