
R installation and analysis notes 

Murtaza Haider  June 1, 2010 

 

Table of Contents 

Installing R and related Add-ins
R with point-and-click GUI feature
Other packages required
Learning R and R Cmdr
A sample session in R Cmdr
Fair's Extramarital Affairs Data
Usage
Format
Source
Objects
Storing R session
Summary tables
Aggregate
Writing your own functions
Conditional transformation of a variable
Cross Tabs
Recoding characters into Numeric Factor
Missing values
Factor Analysis
Further Factor Analysis
Modelling Categorical Dependent Variables
Logistic Regression and Exponential Coefficients
Conditional Logit
Forecasts
With Zelig
***this gets the forecasts:
Grouped Logit
Beta reg
Model estimates and coefficients
T-Test for NHD
Data Input
Reading data
Keeping certain variables or records
Merge
Correlation Analysis
Panel Data and Robust Standard Errors
### TEACHING EVALUATION OF PROFESSORS, Jan 24, 2010
### Weighted LEAST SQUARES
### PANEL DATA MODELS
### ROBUST STANDARD ERRORS
Weighted Arithmetic Mean
Usage
Arguments
Examples
Weighted Cross tabs using Koppelman Intercity example
Estimate Table
Usage
Arguments
Example
Description Table


Installing R and related Add‐ins 

 

You can learn about R at the R Project website: http://www.r-project.org/

To download R from the University of Toronto’s site, please click on http://probability.ca/cran/. R is  available for Linux, Windows, and MacOS X. 

To download the Windows version, please visit http://cran.stat.sfu.ca/, click on base, and then click on Download R 2.9.0 for Windows (36 megabytes).

  

Save the file and then double‐click it to install R. 

The following dialogue will appear: 

 

Select OK. 


 

Click Next on the following dialogue box: 

 

Select the directory to install the software. For default location, click Next. 

Click Next on the following dialogue box:

Click Next on the following dialogue box, ensuring that you have selected "Yes (customized startup)":

 

Select SDI (separate windows) on the following dialogue box.

Select Next on the following dialogue boxes:

If more dialogues appear, please click Next. R will install, and you'll notice the following icon on your desktop after installation:


Double click to run R. 

R will launch and appear as follows: 

 

 

Researchers around the world have contributed approximately 2000 packages for R. You can search for a package on the R website and download and install it directly from within R using:

Packages>Install packages

R will prompt you to select a mirror site from which to download the packages. I often use the Ontario site. Select the site and click OK.

R with point‐and‐click GUI feature 

Until recently, R was command-driven software. John Fox at McMaster University has added GUI capabilities to R in a package. For details, please see:

http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/

To install R Commander, type the following at the red prompt in R:

  >install.packages("Rcmdr", dependencies=TRUE)

This command will download R Commander.

Once it is downloaded, load R Commander by typing the following at the red prompt:

  >library(Rcmdr)

This will launch Rcmdr, a GUI-like environment shown below.

Also, update packages by selecting

  Packages>Update packages

 

 

To add additional functionality to R Cmdr, various packages in R have provided plugins for R Cmdr to  offer the point and click functionality. Those plugins could be installed from within R as well. 

Use Packages>Install Packages to select Rcmdr plug‐ins. 

Other packages required

Package names are case-sensitive in R:

• AER
• Hmisc
• ISwR
• mlogit
• Zelig
• psych
• R2HTML
• Ecdat
• estout and apsrtable
• betareg

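The list of required packages can also be checked from the console. This is a minimal sketch, assuming the package names listed above with CRAN's case-sensitive spelling; `wanted` and `missing_pkgs` are illustrative names. The install call is left commented out because it needs a network connection, and some of these packages may no longer be available on CRAN.

```r
# Packages named in these notes (spelling as on CRAN; assumption)
wanted <- c("AER", "Hmisc", "ISwR", "mlogit", "Zelig", "psych",
            "R2HTML", "Ecdat", "estout", "apsrtable", "betareg")

# Compare against what is already installed on this machine
missing_pkgs <- setdiff(wanted, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing:
# install.packages(missing_pkgs, dependencies = TRUE)
```

Running the check first avoids re-downloading packages that are already present.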

 

Learning R and R Cmdr 

If you are migrating from SPSS or SAS, you'll find the Quick-R site extremely helpful:

http://www.statmethods.net/

Other sources include:

• An Introduction to R: http://cran.r-project.org/doc/manuals/R-intro.pdf
• The UCLA website is a great resource for stats/econometrics in R: http://www.ats.ucla.edu/stat/R/
    o Multinomial Logit in R: http://www.ats.ucla.edu/stat/R/dae/mlogit.htm
• Using R: http://cran.r-project.org/doc/contrib/usingR.pdf
• For Discrete Choice Models, you need to install the mlogit package: http://cran.r-project.org/web/packages/mlogit/index.html

Please note that data analysis and modeling are far more convenient in R Cmdr. See the sample session that follows for an example.


 

A sample session in R Cmdr 

I have loaded a data set from a peer-reviewed publication that documents the determinants of extramarital affairs.

Fair's Extramarital Affairs Data 

Infidelity data, known as Fair's Affairs. Cross‐section data from a survey conducted by Psychology Today  in 1969.  

Usage 

data("Affairs")

Format 

A data frame containing 601 observations on 9 variables.  

affairs
  numeric. How often engaged in extramarital sexual intercourse during the past year?
gender
  factor indicating gender.
age
  numeric variable coding age in years: 17.5 = under 20, 22 = 20–24, 27 = 25–29, 32 = 30–34, 37 = 35–39, 42 = 40–44, 47 = 45–49, 52 = 50–54, 57 = 55 or over.
yearsmarried
  numeric variable coding number of years married: 0.125 = 3 months or less, 0.417 = 4–6 months, 0.75 = 6 months–1 year, 1.5 = 1–2 years, 4 = 3–5 years, 7 = 6–8 years, 10 = 9–11 years, 15 = 12 or more years.
children
  factor. Are there children in the marriage?
religiousness
  numeric variable coding religiousness: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very.
education
  numeric variable coding level of education: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master's degree, 20 = Ph.D., M.D., or other advanced degree.
occupation
  numeric variable coding occupation according to Hollingshead classification (reverse numbering).
rating
  numeric variable coding self rating of marriage: 1 = very unhappy, 2 = somewhat unhappy, 3 = ...


Source 

Online complements to Greene (2003). Table F22.2.  

http://pages.stern.nyu.edu/~wgreene/Text/tables/tablelist5.htm    

 

In R Cmdr, I clicked on the following: 

Statistics>Summaries>Active Dataset 

The command produces the following syntax: 

summary(Affairs) 

and the following output: 

 

The average number of affairs is 1.46. There are 315 women and 286 men in the data set. Average age is  32.5 years. The average number of years married is 8.2 years. 430 out of the 601 respondents had  children.  On a scale of 1 to 5, the mean religiousness equaled 3.1. The average years of schooling was  16.17 years. 

Using an ordinal logit model, we ask: all else being equal, what is the impact of marital bliss on the propensity to have an affair?

In R Cmdr, I first typed the following in the script window:

fac.affairs <- factor(Affairs$affairs)

I clicked on the following: 


 

 

 

 

The following output is generated: 

polr(formula = fac.affairs ~ age + children + gender + factor(religiousness) + factor(rating), data = Affairs, subset = affairs < 7, Hess = TRUE, method = "logistic")

Coefficients:
                                 Value Std. Error    t value
age                        -0.02587355 0.01669516 -1.5497639
children[T.yes]             0.78015980 0.35433369  2.2017658
gender[T.male]              0.52169948 0.27329031  1.9089571
factor(religiousness)[T.2] -1.06156173 0.49774279 -2.1327516
factor(religiousness)[T.3] -0.46820651 0.48288700 -0.9695985
factor(religiousness)[T.4] -0.99231507 0.47187055 -2.1029392
factor(religiousness)[T.5] -0.96170741 0.58225090 -1.6517062
factor(rating)[T.2]        -0.42747211 0.77127502 -0.5542408
factor(rating)[T.3]        -0.92664592 0.74809036 -1.2386818
factor(rating)[T.4]        -1.33084546 0.72230890 -1.8424880
factor(rating)[T.5]        -1.73421084 0.73402207 -2.3626140

Intercepts:
       Value Std. Error t value
0|1  -0.1773     0.9512 -0.1864
1|2   0.5975     0.9521  0.6276
2|3   1.2950     0.9603  1.3485
3|7  17.0637     0.9603 17.7690
7|12 17.9800     0.9603 18.7232


AIC: 561.8889

 

The model suggests that males are more likely to report an affair, that the presence of children correlates with more reported affairs (the coefficient on children is positive), that respondents with any degree of religiousness reported fewer affairs than those who were anti-religion, and yes, that those who reported being happily married were also less likely to have affairs.
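The sign reading used above can be checked on simulated data. This is a minimal sketch, not the Affairs model itself: the data frame `sim`, its variables, and the cut points are hypothetical, and the rating enters as a numeric score rather than a factor. It assumes the MASS package (which provides polr) is installed, as it is with standard R distributions.

```r
library(MASS)  # provides polr()

set.seed(42)
n <- 400
# Hypothetical marital rating from 1 (unhappy) to 5 (happy)
sim <- data.frame(rating = sample(1:5, n, replace = TRUE))
# Latent propensity falls as the rating rises; logistic noise matches the logit link
latent <- -0.6 * sim$rating + rlogis(n)
sim$freq <- cut(latent, breaks = c(-Inf, -3, -1.5, Inf),
                labels = c("none", "some", "many"), ordered_result = TRUE)

fit <- polr(freq ~ rating, data = sim, Hess = TRUE, method = "logistic")

# A negative coefficient means an odds ratio below 1:
# higher ratings shift respondents toward the lower categories
odds_ratio <- exp(coef(fit))
print(odds_ratio)
```

Exponentiating a polr coefficient gives the multiplicative change in the odds of being in a higher category for a one-unit increase in the predictor.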

 


R Data Analysis 

Objects 

 

Listing objects: 

objects()   # ls() is equivalent

 

Removing objects: 

rm(x,y)

Storing R session 

R stores a session in two files: .RData (the workspace, i.e., the objects) and .Rhistory (the command history).
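Saving and restoring objects can also be done explicitly. A minimal sketch, assuming a temporary file stands in for the .RData file; `session_file` and the objects `x` and `y` are illustrative. The command history is handled separately by savehistory(), which only works in interactive consoles that support it, so it is not shown here.

```r
x <- c(1, 2, 3)
y <- "kept for later"

# Hypothetical location; save.image(file = ...) would store every object instead
session_file <- tempfile(fileext = ".RData")
save(x, y, file = session_file)

rm(x, y)                         # objects are gone from the workspace
restored <- load(session_file)   # load() returns the names it restored
print(restored)
```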

Summary tables

• Summarising a dummy variable

summary(walk.jun3$NSL)

• Without quantiles, by FACTOR

numSummary(Housing_VPT[,"total"], groups=Housing_VPT$nud,
  statistics=c("mean", "sd", "length"), quantiles=c(0,.25,.5,.75,1))

• With quantiles

numSummary(Housing_VPT[,"n.cars"], groups=Housing_VPT$attach.nbhd,
  statistics=c("mean", "sd", "quantiles"), quantiles=c(0,.25,.5,.75,1))

• More than one variable

numSummary(Housing_VPT[,c(".est.work", ".In.cars.1")],
  statistics=c("mean"), quantiles=c(0,.25,.5,.75,1))

• More than one variable by FACTOR

numSummary(Housing_VPT[,c(".est.work", ".In.cars.1")], groups=Housing_VPT$club.sport,
  statistics=c("mean", "sd", "quantiles"), quantiles=c(0,.25,.5,.75,1))

Aggregate 

 

tapply(dep.var, list(cat.var1,cat.var2), mean)

tapply(TeachingRatings$eval, list(gender=TeachingRatings$gender, minority=TeachingRatings$minority), mean, na.rm=TRUE)

aggregate(variables, list(Urban=urban,Auto=auto.cat), mean, na.rm=T)

as.data.frame(aggregate(variables,

t(aggregate(variables, list(Urban=urban,Auto=auto.cat), mean, na.rm=T))

# FUN as a character vector and length.warning are not base R arguments;
# this call appears to rely on aggregate.numeric from the epicalc package
aggregate(walktrip$n.park, by=list(NHD=walktrip$nhd),
  FUN=c("count","sum","mean","median","sd","min","max"), na.rm=TRUE, length.warning=TRUE)

Writing your own functions 

Often one needs more than just the mean or sd in a calculation, but functions such as tapply accept only one summary function at a time, e.g., mean. A custom function avoids this limitation. Consider a function that returns the mean, sd, min, and max together.

meansd <- function(x) { c(mean(x), sd(x), min(x), max(x)) }

meansd(x)

tapply(x, factor, meansd)
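A runnable variant of the same idea, using the built-in mtcars data rather than the author's data. The function name `multi_stats` and the result names are illustrative; naming the elements of the returned vector gives the final table readable column headers.

```r
# Summary function returning several named statistics at once
multi_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# tapply returns one named vector per group (here: number of cylinders)
by_cyl <- tapply(mtcars$mpg, mtcars$cyl, multi_stats)

# Stack the per-group vectors into a groups-by-statistics table
stats_table <- do.call(rbind, by_cyl)
print(round(stats_table, 2))
```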

Conditional transformation of a variable 

Let's say we are interested in creating a variable subject to a certain condition, e.g., set the new variable to 0 if another variable is below a threshold and to 1 otherwise. Here is how the code works in R for a new variable that identifies birth rates as low or high based on a threshold of 2.

Data set: demog

Existing variable: brate

New variable: brate.cat (0 if brate<2, and 1 otherwise)

demog$brate.cat <- 1

demog$brate.cat[demog$brate<2] <- 0
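The same recode can be written in one step with ifelse. A minimal sketch on a hypothetical data frame `demog_demo` (the values are made up for illustration). One difference from the two-step version is worth noting: ifelse leaves a missing birth rate as NA, whereas the two-step version leaves it at the default value of 1.

```r
# Hypothetical birth rates, including one missing value
demog_demo <- data.frame(brate = c(1.2, 1.9, 2.0, 3.4, NA))

# 0 if brate < 2, 1 otherwise; NA stays NA
demog_demo$brate.cat <- ifelse(demog_demo$brate < 2, 0, 1)
print(demog_demo)
```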

Cross Tabs 

mytable <- table(walktrip$atelkey.maitre, walktrip$nhd) # A will be rows, B will be columns

#mytable # print table

margin.table(mytable, 1) # A frequencies (row totals, summed over B)

margin.table(mytable, 2) # B frequencies (column totals, summed over A)
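The margin arguments are easy to mix up, so here is a small runnable illustration on the built-in mtcars data (not the walktrip data used above). Margin 1 refers to rows and margin 2 to columns; prop.table uses the same convention for row and column shares.

```r
# Rows: number of cylinders; columns: transmission (0 = automatic, 1 = manual)
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

row_totals <- margin.table(tab, 1)   # one total per row (summed over columns)
col_totals <- margin.table(tab, 2)   # one total per column (summed over rows)
row_shares <- prop.table(tab, 1)     # each row sums to 1

print(tab)
print(row_totals)
print(col_totals)
```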

Recoding characters into Numeric Factor 

risk.profile is a factor variable with very long string definitions:

table(rmg$risk.profile)

You flip a fair coin. If head occurs, you receive $100 and if tail occurs, you receive nothing.    32
You receive $50 for sure.                                                                         167

rmg$risk.prone <- 1

rmg$risk.prone[rmg$risk.profile == "You receive $50 for sure."] <- 0

rmg$risk.prone <- factor(rmg$risk.prone, labels=c('no','yes'))

table(rmg$risk.prone)

2nd Approach

rmg$risk.3 <- 1

rmg$risk.3[rmg$risk.2 == "You r"] <- 0

Missing values 

# recode 99 to missing for variable v1

# select the elements of v1 that equal 99 and recode them

mydata$v1[mydata$v1==99] <- NA

Although it is probably easier to use replace():

test <- c(1, 1, 2, 1, 1, 8, 1, 2, 1, 10, 1, 8, 2, 1, 9, 1, 2, 9, 10, 1)

test

test <- replace(test, test == 8 | test == 9 | test == 10, NA)

test <- replace(test, test == 1, 0)

test <- replace(test, test == 2, 1)

 

#specify the name and address of the remote file
datafilename="http://personality-project.org/r/datasets/maps.mixx.msq1.epi.bf.txt"

data=read.table(datafilename,header=TRUE) #read the data file
msq=data[,2:73] #select the subset of items in the MSQ
msq[msq=="9"] = NA # change all occurrences of 9 to be missing values
msq <- data.frame(msq) #convert the input matrix into a data frame for easier manipulation
names(msq) #what are the variables?
summary(msq) #basic summary statistics -- check for miscodings
cleaned=na.omit(msq) #remove the cases with missing values

Factor Analysis

 

f2=factanal(cleaned,2,rotation="varimax") #factor analyze the resulting item
#(f2) #show the result
load=loadings(f2)
print(load,sort=TRUE,digits=2,cutoff=0.01) #show the loadings
plot(load) #plot factor 1 by 2
identify(load,labels=names(msq)) #put names of selected points onto the figure

 

Further Factor Analysis 

?prcomp
summary(total)
library(Hmisc)

summary(cbind(dwelling.density, employment.with.5.km, street.density.length.area, housing.mix,
  pedestrian.connectivity, stlanedens, commerce.landuse))

summary(cbind(sidewalks, street.width.ft, pathways.m.1km, greenspace.m2.1km, retail.num.1km,
  net.to.sl, greenspace.per))

built <- na.exclude(cbind(sidewalks, street.width.ft, pathways.m.1km, greenspace.m2.1km, retail.num.1km,
  net.to.sl, greenspace.per))

pca1 <- prcomp(built, scale=TRUE)
pca1
summary(pca1)
plot(pca1, main="")
biplot(pca1)

walk.aug18$green.pca1 <- predict(pca1)[,1]
w4$f3.pca1 <- predict(pca1)[,3]
names(walk.aug18)

rcorr(cbind(f1.pca1, f2.pca1, f3.pca1))

# *** Fails to deal with categorical data
w5 <- na.omit(w2)
pca2 <- prcomp(w5, scale=TRUE)   # *** ERROR: 'x' must be numeric

# ****** ---------------------------

# *** FACTOR ANALYSIS
?factanal

walk.sep14 <- na.exclude(walk.aug18)

fac2 <- factanal(x=built, factors=2, scores="regression", na.action=na.exclude)
fac2
print(fac2, digits=2, cutoff=.3, sort=TRUE)

walk.aug18$fac.green <- fac2$scores[,1]
walk.aug18$fac.green <- napredict(na.act, walk.aug18)

?as.data.frame
?na.omit

loadings(fac1)[,1]

fac.w$f1 <- fac3$scores[,1]

fit <- factanal(fac.w, 2, rotation="varimax")
print(fit, digits=2, cutoff=.3, sort=TRUE)
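The principal components steps above depend on data sets that are not included in these notes. A minimal runnable sketch of the same workflow, using the built-in USArrests data as a stand-in (an assumption; any all-numeric data frame would do): scale the variables, inspect the variance explained, and keep the first component's scores as a new variable.

```r
# All columns of USArrests are numeric, which prcomp requires
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Share of total variance carried by each component
var_explained <- summary(pca_demo)$importance["Proportion of Variance", ]
print(round(var_explained, 3))

# Scores on the first component, one per observation,
# ready to be stored as a column in the original data frame
arrests <- USArrests
arrests$pc1 <- predict(pca_demo)[, 1]
head(arrests)
```

Categorical columns have to be dropped or converted to numeric indicators first, which is the cause of the "'x' must be numeric" error noted above.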

Modelling Categorical Dependent Variables

The following models are covered:

1. Binary Logit
2. Multinomial Logit
3. Conditional Logit
4. Nested Logit
5. Mixed Logit
6. Grouped Logit
7. Probit Models

Logistic Regression and Exponential Coefficients 

# where F is a binary factor and
# x1-x3 are continuous predictors

fit <- glm(F~x1+x2+x3, data=mydata, family=binomial())
summary(fit) # display results

names(fit)
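A runnable illustration of a binary logit and its exponentiated coefficients, using the built-in mtcars data instead of mydata (an assumption made so the example is self-contained). The outcome am is 1 for manual transmission; `logit_fit`, `odds_ratios`, and `wald_ci` are illustrative names. The intervals here are Wald intervals from confint.default, which can differ slightly from the profile-likelihood intervals that confint reports for glm fits.

```r
# Binary logit: probability of a manual transmission given weight and horsepower
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Exponentiated coefficients are odds ratios:
# values below 1 lower the odds of the outcome, values above 1 raise them
odds_ratios <- exp(coef(logit_fit))

# Wald 95% intervals, transformed to the odds-ratio scale
wald_ci <- exp(confint.default(logit_fit))

print(cbind(odds_ratio = odds_ratios, wald_ci))
```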

Conditional Logit 

 

Using package mlogit 

 

library(mlogit)

load("H:/Research/Projects/Workshop/Logit/R/Hensher.rda")

data <- mlogit.data(h09, choice="mc", shape="long", id.var="id", alt.levels=c("air","train","bus","car"))

summary(mod1 <- mlogit(mc~invc+invt|hinc, data, reflevel="car"))

summary(mod2 <- mlogit(mc~invc+invt|hinc, data, reflevel="train"))

 

Forecasts 

fcast <- fitted(mod2) ### fitted probability of the chosen alternative only

fcast.w <- fitted(mod2, outcome=FALSE) ### fitted probabilities for every alternative

colMeans(fcast.w)

colMeans(fitted(mod2, outcome=FALSE))

h09b <- subset(h09, subset=mc=="yes")

fcastw <- data.frame(cbind(h09b$alt, fcast.w))

fcastw$mode <- factor(fcastw$V1, labels=c('air','bus','car','train'))

colMeans(fcastw[2:5])

With Zelig 

 

setwd("H:/Research/Projects/Workshop/Logit/R/CLogit")

load("H:/Research/Projects/Workshop/Logit/R/CLogit/h09.rda")

attach(h09)

h09$t = 2 - mc

h09$choice <- 2 - h09$t

h09$alt <- factor(h09$alt, labels=c('air','bus','car','train'))

h09$mc <- factor(h09$mc, labels=c('no','yes'))

# *** Where mode is 1 if chosen and 0 otherwise.

library(mlogit)

library(Zelig)

names(h09)

table(mc,alt)

z1 <- zelig(Surv(t,choice) ~ invt+twait+gc+aasc+tasc+basc+hinca+strata(id), model="coxph", data=h09, na.action=na.exclude)

z2 <- zelig(Surv(t,choice) ~ invt+twait+gc+hinc*alt+strata(id), model="coxph", data=h09, na.action=na.exclude)

summary(z2)

***this gets the forecasts: 

h09$fexpect <- predict(z2, type="expected")

# ** Not sure about the following

h09$flp <- predict(z2, type="lp")

h09$frisk <- predict(z2, type="risk")

h09$fterms <- predict(z2, type="terms")

numSummary(h09[,c("flp", "fexpect")], groups=h09$alt, statistics=c("mean", "sd"), quantiles=c(0,.25,.5,.75,1))


> numSummary(h09[,"fexpect2"], groups=h09$alt, statistics=c("mean", "sd", "quantiles"), quantiles=c(0,.25,.5,.75,1))

        mean     sd        0%     25%    50%    75%   100%   n
air   0.2762 0.2979 0.0009807 0.05426 0.1344 0.4561 0.9833 210
bus   0.1429 0.1989 0.0007001 0.02740 0.0616 0.1801 0.8860 210
car   0.2810 0.2313 0.0080268 0.07236 0.2315 0.4758 0.8549 210
train 0.3000 0.2982 0.0005396 0.06754 0.1879 0.4400 0.9832 210

. tab alt,sum( fcastnew)

            | Summary of Pr(alt|1 selected)
        alt |        Mean   Std. Dev.       Freq.
------------+------------------------------------
        air |   .27619048   .29786109         210
        bus |   .14285714   .19891898         210
        car |   .28095238   .23125647         210
      train |          .3   .29821735         210
------------+------------------------------------
      Total |         .25   .26710362         840

 

Grouped Logit 

Same as glogit and blogit in Stata 

data("womensrole", package = "HSAUR")

summary(mod1 <- glm(cbind(agree, disagree) ~ sex + education, data = womensrole,family = binomial(),trace=T))

or

fm1 <- cbind(agree, disagree) ~ sex + education

womensrole_glm_1 <- glm(fm1, data = womensrole,family = binomial())
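The womensrole example needs the HSAUR package. The grouped-data form of the logit can be tried without it: a minimal sketch on a hypothetical data frame `grouped` (counts invented for illustration), where each row is one covariate pattern and the response is the pair of counts cbind(successes, failures).

```r
# Hypothetical grouped data: counts of agree/disagree at each education level
grouped <- data.frame(
  education = c(0, 4, 8, 12, 16, 20),
  agree     = c(18, 25, 30, 22, 10, 3),
  disagree  = c(2, 6, 15, 30, 35, 27)
)

# The two-column response tells glm how many trials each row represents
grouped_fit <- glm(cbind(agree, disagree) ~ education,
                   data = grouped, family = binomial())

# Fitted and observed shares of "agree" per row
fitted_share <- fitted(grouped_fit)
observed_share <- grouped$agree / (grouped$agree + grouped$disagree)
print(round(cbind(observed_share, fitted_share), 3))
```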

Beta reg 

Description: Beta regression for modeling beta-distributed dependent variables, e.g., rates and proportions.

Estimation is performed by maximum likelihood (ML) via optim using analytical gradients and (by default) starting values from an auxiliary linear regression of the transformed response.

betareg(formula, data, subset, na.action, weights, offset,
  link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"),
  link.phi = NULL, control = betareg.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

betareg.fit(x, y, z = NULL, weights = NULL, offset = NULL,
  link = "logit", link.phi = "log", control = betareg.control())

 

Model estimates and coefficients 

confint(fit) # 95% CI for the coefficients
exp(coef(fit)) # exponentiated coefficients
exp(confint(fit)) # 95% CI for exponentiated coefficients
predict(fit, type="response") # predicted values
residuals(fit, type="deviance") # residuals

library(AER)

coeftest(MLM.1)

T‐Test for NHD  

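Create subsets so that the grouping variable includes only two categories. Use %in% to match several values; == against a vector recycles the comparison element by element and silently drops rows.

Mon <- Housing_VPT[Housing_VPT$city=="Montreal", ]

Mck <- Housing_VPT[Housing_VPT$nhd %in% c("MT","ML"), ]

t.test(total~nhd, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=Mck)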
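The difference between == and %in% when selecting two categories can be seen on the built-in iris data (a stand-in for Housing_VPT, which is not included in these notes). With 50 rows per species, the recycled == comparison keeps only half of the intended rows.

```r
two_species <- c("setosa", "virginica")

# == recycles the two names down the column, so rows are compared
# alternately with "setosa" and "virginica"
rows_eq <- sum(iris$Species == two_species)

# %in% tests membership, which is what a two-category subset needs
rows_in <- sum(iris$Species %in% two_species)
print(c(equals = rows_eq, in_set = rows_in))

# Two-category subset, then a Welch t-test (unequal variances)
iris_two <- iris[iris$Species %in% two_species, ]
welch <- t.test(Sepal.Length ~ Species, data = iris_two, var.equal = FALSE)
print(welch)
```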

Data Input 

Reading data 

hs1 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hs1.csv", header=T, sep=",")

attach(hs1)

Keeping certain variables or records 

Keeping only the observations where the reading score is 60 or higher.

hs1.read.well <- hs1[read >= 60, ] 

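Keeping only the variables female, id, read, and write from the hs1.read.well data frame: we use the column indices corresponding to these four variables to indicate that we want only these variables in the new data frame called hs1.kept. We use the names function again to verify that hs1.kept consists of only the four variables that we wanted to keep.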

names(hs1.read.well)

[1] "female" "id" "race" "ses" "schtyp" "prgtype" "read"

[8] "write" "math" "science" "socst" "prog"

hs1.kept <- hs1.read.well[ , c(1, 2, 7, 8)]

names(hs1.kept)

[1] "female" "id" "read" "write"

Dropping the variables ses and prog from the hs1.read.well data frame by using the column indices corresponding to

these two variables with a negative sign.

names(hs1.read.well)

[1] "female" "id" "race" "ses" "schtyp" "prgtype" "read"

[8] "write" "math" "science" "socst" "prog"

hs1.drop <- hs1.read.well[ , -c(4, 12)]

names(hs1.drop)

[1] "female" "id" "race" 

Merge 

The merge function allows us to merge two data frames on a variable (or a list of variables). In this case the variable in common is id which has the same name in both data sets. Specifying T in the all argument indicates that we want to keep all the observations from each data set rather than only keeping the observations that came from both data sets.

hsdiss <- merge(hstest, hsdem, by="id", all=T)

If the variable that we were merging on had different names in each data frame, then we could use the by.x and by.y arguments. In the by.x argument we would list the name of the variable(s) in the data frame listed first in the merge function (in this case hstest), and in the by.y argument we would name the variable(s) in the data frame listed second (in this case hsdem).


Creating an indicator of which data set the observations came from is a little more complicated. We would first create an indicator variable called from, in each data frame to be merged. Then we merge the two data sets. Finally, we create a variable both which would indicate which data frame or both the observation came from. It is generally easier to note that when a data frame did not contribute to the observation in the combined data frame then the variables from that data frame will have missing values (NA's) for that observation.

from <- data.frame(rep(1, length(hsdem$id)))

dimnames(from)[[2]] <- "from"

hsdem.1 <- cbind(hsdem, from)

from <- data.frame(rep(1, length(hstest$id)))

dimnames(from)[[2]] <- "from"

hstest.1 <- cbind(hstest, from)

hsdiss.2 <- merge(hstest.1, hsdem.1, by.x="id", by.y="id", all=T, suffixes=c("test", "dem"))

attach(hsdiss.2)

hsdiss.2$both[!is.na(fromtest) & !is.na(fromdem)] <- "both"

hsdiss.2$both[is.na(fromtest)] <- "dem"

hsdiss.2$both[is.na(fromdem)] <- "test"
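Both ideas from this section, merging on differently named keys and working out where each row came from, fit in a short self-contained sketch. The data frames `tests` and `demog` and their columns are hypothetical. The provenance column uses the shortcut described above: a data frame that did not contribute to a row leaves NA in its own columns. This shortcut assumes the original columns have no missing values of their own.

```r
# Hypothetical inputs whose key columns have different names
tests <- data.frame(id = c(1, 2, 3), read = c(57, 68, 44))
demog <- data.frame(student = c(2, 3, 4), female = c(1, 0, 1))

# by.x names the key in the first data frame, by.y the key in the second;
# all = TRUE keeps rows that appear in only one of them
merged <- merge(tests, demog, by.x = "id", by.y = "student", all = TRUE)

# Provenance from the NA pattern
merged$source <- ifelse(is.na(merged$read), "demog only",
                 ifelse(is.na(merged$female), "tests only", "both"))
print(merged)
```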

String AS FACTORS

Correlation Analysis 

rcorr {Hmisc}: Matrix of Correlations and P-values

library(Hmisc)

rcorr(cbind(x,y,z,v))


 

Panel Data and Robust Standard Errors 

 

###  TEACHING EVALUATION OF PROFESSORS, Jan 24, 2010  

 

lm.1 <- lm(eval ~ beauty + gender + minority + native + tenure + division + credits, data=TeachingRatings)

summary(lm.1)

### Weighted LEAST SQUARES 

 

lm.2 <- lm(eval ~ beauty + gender + minority + native + tenure + division + credits, weights=students,data=TeachingRatings)

summary(lm.2)

   


###PANEL DATA MODELS 

 

library(plm)

### Convert variable from factor to integer and sort data

tr.clean$prof.2 <- with(tr.clean, as.integer(prof))
attach(tr.clean)

tr.sort2 <- tr.clean[order(prof.2),]
attach(tr.sort2)

### Declare data as Panel data

tr.3 <- plm.data(tr.sort2, index=c("prof.2"))
attach(tr.3)

cbind(coef(plm.pool), coef(plm.rand), coef(lm.2))

 

### ROBUST STANDARD ERRORS 

 

### The following didn't work.   

summary(lm.2)

# coeftest() is in lmtest; vcovHC() and vcovHAC() are in sandwich
library(lmtest)
library(sandwich)

coeftest(lm.2, vcov = vcovHC)
coeftest(lm.2, vcov = vcovHAC)

### DESIGN LIBRARY

library(Design)

### Use OLS

lm.3 <- ols(eval ~ beauty + gender + minority + native + tenure + division + credits,
  weights=students, data=TeachingRatings, x=TRUE, y=TRUE)

### Doesn't work --> summary(lm.3)

lm.3

robcov(lm.3, prof.2)

### Other options

robcov(lm.3, prof.2, method="efron")

adjfit <- robcov(lm.6, prof.2)

sqrt(diag(adjfit$var))
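For readers without the sandwich or Design packages, heteroskedasticity-robust standard errors can be computed by hand. This is a minimal base-R sketch of the basic White (HC0) estimator for an unweighted OLS fit on the built-in mtcars data; it is not the weighted or cluster-adjusted estimator used in the calls above, and the object names are illustrative.

```r
# Ordinary least squares on a built-in data set
ols_fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(ols_fit)   # n x p design matrix
e <- residuals(ols_fit)      # n residuals

bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X: row i of X is scaled by e[i]

# Sandwich formula: (X'X)^-1 X' diag(e^2) X (X'X)^-1
vcov_hc0 <- bread %*% meat %*% bread

robust_se    <- sqrt(diag(vcov_hc0))
classical_se <- sqrt(diag(vcov(ols_fit)))
print(cbind(classical_se, robust_se))
```

The coefficient estimates are unchanged; only the standard errors, and therefore the t-values, differ between the two columns.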

   

Weighted Arithmetic Mean 

Compute a weighted mean of a numeric vector.

Usage 

weighted.mean(x, w, na.rm = FALSE)

Arguments 

x
  a numeric vector containing the values whose mean is to be computed.
w
  a vector of weights the same length as x giving the weights to use for each element of x.
na.rm
  a logical value indicating whether NA values in x should be stripped before the computation proceeds.

Missing values in w are not handled.

Examples 

## GPA from Siegel 1994
wt <- c(5, 5, 4, 1)/15
x <- c(3.7,3.3,3.5,2.8)
xm <- weighted.mean(x,wt)
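The GPA example can be verified by hand: the weighted mean is sum(w * x) / sum(w), which here is (5*3.7 + 5*3.3 + 4*3.5 + 1*2.8) / 15 = 51.8 / 15. The sketch below repeats that arithmetic and also shows what "missing values in w are not handled" means in practice; the object names are illustrative.

```r
wt <- c(5, 5, 4, 1) / 15
x  <- c(3.7, 3.3, 3.5, 2.8)

# The definition written out, next to the built-in function
by_hand  <- sum(wt * x) / sum(wt)
built_in <- weighted.mean(x, wt)
print(c(by_hand = by_hand, built_in = built_in))

# A missing weight propagates: the result is NA
with_na_weight <- weighted.mean(x, c(5, NA, 4, 1))
print(with_na_weight)
```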

 

Weighted Cross tabs using Koppelman Intercity example 

If the weights are available for each observation as a variable(wt): 

.Table <- xtabs(wt~altnum+type, data=via.tot, subset=choice==1)
.Table

colPercents(.Table) # Column Percentages

.Test <- chisq.test(.Table, correct=FALSE)
.Test
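Putting the weight variable on the left-hand side of the xtabs formula makes each cell the sum of weights rather than a count of rows. A minimal sketch on a hypothetical data frame `trips` (values invented for illustration); prop.table is used for column percentages in place of colPercents, which comes with R Commander.

```r
# Hypothetical weighted observations
trips <- data.frame(
  mode    = c("car", "car", "rail", "rail", "air", "car"),
  purpose = c("work", "leisure", "work", "work", "leisure", "work"),
  wt      = c(1.5, 0.5, 2.0, 1.0, 3.0, 1.0)
)

# Left-hand side = weights: cells hold sums of wt
weighted_tab <- xtabs(wt ~ mode + purpose, data = trips)

# Empty left-hand side: cells hold plain row counts
unweighted_tab <- xtabs(~ mode + purpose, data = trips)

# Column percentages of the weighted table
col_pct <- 100 * prop.table(weighted_tab, margin = 2)

print(weighted_tab)
print(unweighted_tab)
print(round(col_pct, 1))
```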

 


 

Estimate Table 

Uses the data stored in the "ccl" object to create a formatted table. The default output is LaTeX, but since version 0.5 export to CSV is also possible, so the output can be imported into a spreadsheet program and edited for a word processor.

Usage 

esttab(t.value = FALSE, p.value = FALSE, round.dec = 3, caption = NULL, label = NULL, sig.levels = c(0.1, 0.05, 0.01), sig.sym=c("*","**","***"), filename=NULL, csv=FALSE, dcolumn=NULL, table="table",

table.pos="htbp", caption.top=FALSE, booktabs=FALSE, var.order=NULL, sub.sections=NULL,var.rename=NULL)

Arguments 

t.value
  if set to TRUE the table will contain t-values instead of the default standard errors
p.value
  if set to TRUE the table will contain p-values instead of the default standard errors
round.dec
  number of decimals to round to
caption
  to be used in the LaTeX output table
label
  to be used in the LaTeX output table
sig.levels
  to change the way the stars are calculated. The values must be given as a vector from largest to smallest p-value
sig.sym
  vector of symbols to depict significance levels in TeX tables. Insert the TeX command between the "". The vector corresponds to the sig.levels vector. Please note that due to the cat command the backslash needs to be inserted twice in order to appear in the TeX document (e.g. "\\alpha").
filename
  determines the filename of the output. Default is NULL, in which case output is printed to the screen.
csv
  for output to CSV (comma-separated text file) for direct import to a spreadsheet program. The default is TeX output.
dcolumn
  a string can be inserted that corresponds to a predefined column type in the TeX document's head.
table
  a string for choosing a different table type like sideways or tablex.
table.pos
  ... parameters insert 'NULL'.
caption.top
  if set to TRUE the caption will be inserted above the table.
booktabs
  if set to TRUE the \hline commands are replaced by their corresponding booktabs commands.
var.order
  by default the order of variables is determined by their appearance in the models. Providing a vector of variables here in a different order will change the order of variables in the output table. Note that '(Intercept)' is enclosed in parentheses.
sub.sections
  if one needs to subdivide the table into several sections using 'subtitles' this can be done here, by providing a vector of the form c(linenumber,"subtitle",2ndlinenumber,"2ndsubtitle") and so forth.
var.rename
  vector of names to replace variable abbreviations of the model with real names. The vector has to look like this: var.rename=c("old.name1","new.name1","old.name2","new.name2")

 

Example 

names(swm)   

lm.10 <- lm(ln.waste.g ~ nhd + hhld.members + kids.bin + ln.area + own + grade12.plus, data=swm, na.action=na.omit)

summary(lm.10)

lm.11 <- lm(ln.waste.g ~ nhd * hhld.members + kids.bin + ln.area + own + grade12.plus, data=swm, na.action=na.omit)

summary(lm.11)

estclear()

eststo(lm.10); eststo(lm.11)

esttab(t.value=TRUE, round.dec=3, csv=T)

For Standard errors:

esttab(t.value=F, round.dec=3, csv=T)

                                  ln.waste.g   ln.waste.g
(Intercept)                       4.145***     4.261***
                                  [8.136]      [8.046]
nhd[T.Naseerabad]                 0.591***     0.678*
                                  [5.003]      [1.796]
nhd[T.MC]                         0.771***     0.667
                                  [5.285]      [1.291]
nhd[T.PIA Colony]                 0.236*       -0.064
                                  [1.949]      [-0.168]
nhd[T.Nisar Road]                 1.212***     0.96***
                                  [8.454]      [2.75]
nhd[T.Valley Road]                1.065***     1.339***
                                  [6.912]      [3.698]
hhld.members                      0.142***     0.134***
                                  [8.698]      [3.425]
kids.bin[T.yes]                   -0.045       -0.042
                                  [-0.604]     [-0.558]
ln.area                           0.161**      0.152**
                                  [2.307]      [2.144]
own[T.yes]                        -0.101       -0.119
                                  [-1.219]     [-1.388]
grade12.plus                      0.02         0.014
                                  [0.261]      [0.179]
nhd[T.Naseerabad]:hhld.members                 -0.01
                                               [-0.191]
nhd[T.MC]:hhld.members                         0.02
                                               [0.236]
nhd[T.PIA Colony]:hhld.members                 0.047
                                               [0.838]
nhd[T.Nisar Road]:hhld.members                 0.044
                                               [0.877]
nhd[T.Valley Road]:hhld.members                -0.043
                                               [-0.834]
R^2                               0.254        0.258
adj.R^2                           0.248        0.248
N                                 1092         1092
t-values in brackets


 

Description Table 

Description 

Uses the data stored in the "dcl" object to create a standard formatted table. The default output is LaTeX; CSV output is optionally possible, so the output can be imported into a spreadsheet program and edited for a word processor.

Usage 

desctab(filename=NULL,caption = NULL, label = NULL,csv=FALSE, dcolumn=NULL,booktabs=FALSE)   

descsto(swm) 
