Stata Tutorial
Econometrics 676, Spring 2008
1 Introduction
1.1 Stata Windows and toolbar
Review: past commands appear here
Variables: variables list
Command: where you type your commands
Results: results are displayed here
1.1.1 Toolbar
log: start / stop a log file
viewer: open viewer window
results: open results window
graph: open graph window
data editor
data browser
more: continue when paused in long output
break: stop the current task
1.2 Stata File management
1.2.1 FILE EXTENSIONS
Data file    filename.dta
Do file      filename.do      (program file)
Log file     filename.smcl    (only readable in Stata)
             filename.log     (text file)
1.2.2 WORKING DIRECTORY
The working directory is displayed at the bottom left-hand corner of the window.
To change your working directory, use the cd command. To list the files in the working directory, use dir.
Example cd "c:\"
        dir
To open a file
use filename, clear                  opens the data file, discarding the data in memory
use varlist using filename, clear    for a subset of the data file
To save a data file
save, replace             overwrites current file
save filename, replace    saves file as filename. replace is optional but necessary if a file of that name already exists
Example save "c:\example1.dta", replace
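The open and save commands above can be tried end to end in a short do-file. This is a minimal sketch, assuming the auto.dta example dataset that ships with Stata (loaded with sysuse, which this handout does not cover) and a temporary file whose name is held in the hypothetical local macro example1. Run it as a do-file, since local macros do not persist between lines typed interactively.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* Reserve a temporary file name in the local macro example1 (hypothetical name)
tempfile example1

* Save the data in memory; replace is needed only if the file already exists
save "`example1'", replace

* Reopen only three variables from the saved file
use make price mpg using "`example1'", clear
describe
```

After the last use command only make, price and mpg are in memory, which is the quickest way to work with a few columns of a large file.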
1.2.3 MEMORY
STATA opens with a default memory of 1m.
In some cases you may get the message 'no room to add more observations'. This is because not enough memory has been assigned to STATA. To change the memory assigned to STATA:
set mem #m
Example set mem 16m
1.2.4 LOG FILES
All output appearing in the Results window can be captured in a log file.
The log file can be saved as a STATA formatted (SMCL) file or as a text file. By default, logs are written in SMCL (Stata Markup and Control Language).
To translate a log file created in SMCL to text, go to File > Log > Translate.
To start a log
log using filename             starts an smcl log
log using filename, replace    overwrites filename.smcl
log using filename.log         starts a text log
To pause and resume a log
log off    temporarily suspends log file
log on     resumes log file
These commands can be useful to create a log that contains only results and not intermediate programming.
To close a log
log close    closes current log file
You can add comments to your log as you work by entering them in the command line (or in your do-file) preceded by a *.
Example *unemployment rate
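The logging commands above fit together as follows. This is a sketch only: the file name mysession.log and the variable lprice are hypothetical, and the auto.dta dataset loaded with sysuse is assumed to be available because it ships with Stata.

```stata
* Start a text log (the .log extension selects plain text); replace overwrites it
log using mysession.log, replace

* unemployment rate (a comment line like this one is copied into the log)
sysuse auto, clear

* Suspend logging while doing intermediate work
log off
generate lprice = ln(price)

* Resume logging so that only the results are recorded
log on
summarize lprice

* Close the log when finished
log close
```

The generate step runs while the log is suspended, so mysession.log contains the summary table but not the intermediate command.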
1.2.5 CONTROLLING OUTPUT
-more- may appear in your Results window when you are trying to output a long listing.
To see the next line: press Enter
To see the next screen: press any key or click on -more-
To interrupt a STATA command at any time, use the Break button.
2 Manipulating Data
2.1 Descriptive Commands
2.1.1 Describe
There are various ways of examining a dataset in Stata, including describe, list, and summarize.
describe produces a summary of the contents of a dataset
d                   describes dataset in current memory
d using filename    describes a stored STATA format dataset
2.1.2 Summarise
summarize calculates and displays a variety of univariate statistics
su summarise whole dataset
su varlist summarise subset varlist
su varlist, d summarise with the detail option
2.1.3 List
list is the most detailed of the commonly used descriptive commands.
l    displays the values of variables
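The three descriptive commands can be compared on one dataset. The sketch below assumes the auto.dta example dataset bundled with Stata; the variable names make, price and mpg belong to that dataset, not to this handout. The in 1/5 qualifier, which restricts list to the first five observations, is also an addition.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* Variable names, storage types and labels
describe

* Mean, standard deviation, minimum and maximum of two variables
summarize price mpg

* Adds percentiles, skewness and kurtosis for one variable
summarize price, detail

* Actual values, limited to the first five observations
list make price mpg in 1/5
```

Restricting list with in keeps the output short, which avoids the -more- prompt on larger datasets.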
2.1.4 Graph
twoway plot_type var1 var2    draws scatterplots, line plots, etc.
Plot type: scatter, line, connected
Example twoway scatter lincome lincomea, t1("Graph 1")
2.1.5 SORT and BY Commands
sort varlist    arranges the observations of the current data into ascending order of the values of the variables in varlist
by varlist:     causes the command that follows to be repeated for each unique set of values of the variables in varlist
Example sort region
        by region: su income
(Note) Data must be sorted by varlist before you use the 'by' command.
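The sort-then-by pattern can be sketched on Stata's bundled auto.dta dataset, which is an assumption here, as are its variables foreign, rep78, price and mpg. The bysort prefix in the last command is not covered in this handout; it sorts and applies by in one step.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* Sort first, then repeat the command for each value of foreign
sort foreign
by foreign: summarize price

* bysort does the sorting and the grouping in a single command
bysort rep78: summarize mpg
```

Using bysort avoids the "not sorted" error that by raises when the sort step is forgotten.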
2.1.6 Cross Tabulation
tabulate produces one- and two-way tables of frequency counts
tab var1 var2 [weight] [if exp]
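A one-way table, a two-way table, and a two-way table restricted with if can be sketched as follows. The example assumes the auto.dta dataset bundled with Stata, and the cut-off of 5000 is arbitrary.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* One-way frequency table
tabulate foreign

* Two-way table: rows are rep78, columns are foreign
tabulate rep78 foreign

* The same table for a subset of observations
tabulate rep78 foreign if price > 5000
```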
2.2 Creating New Variables
generate can create a new variable that is an algebraic expression of other variables.
To change the contents of an existing variable you must use the replace command.
replace oldvar = expression
Example gen agerange = . if age<16
        replace agerange=1 if 16<=age & age<25
        replace agerange=2 if 25<=age & age<45
Example gen age16=0
replace age16=1 if age==16
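Values for a string variable are enclosed in double quotes ""
Example gen agegroup="young" if agerange==1
        replace agegroup="" if agerange~=1
(The new variable is called agegroup because a variable named age already exists, and gen cannot create a variable whose name is already in use.)
The default code for a missing value in STATA is a single period (.) or an empty string "" in the case of a string variable.
Example replace var = . if var == 99
        replace string = "" if string == "not answered"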
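The generate and replace steps above can be run on a real dataset. This sketch assumes the auto.dta dataset bundled with Stata; pricerange and pricelabel are hypothetical variable names, and the price cut-offs are arbitrary. The condition price < . in the last category is there because Stata treats a numeric missing value as larger than any number, so an open-ended upper range would otherwise pick up missing values.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* Build a numeric category variable from a continuous one
generate pricerange = .
replace pricerange = 1 if price < 5000
replace pricerange = 2 if price >= 5000 & price < 10000
replace pricerange = 3 if price >= 10000 & price < .

* Build a string variable; string values go in double quotes
generate pricelabel = "cheap" if pricerange == 1
replace pricelabel = "" if pricerange != 1

* Recode a sentinel value as missing (no observation has 99 here, so nothing changes)
replace rep78 = . if rep78 == 99
```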
3 Linear Regression
regress y X [, noconstant robust]    estimates the linear regression $y = c + X\beta + \varepsilon$
predict [type] newvar                calculates predictions, residuals and statistics after estimation
predict_type:
xb      linear prediction (default)
res     residual
stdp    standard error of prediction
Example reg lwage exper age kids
        reg lwage exper age kids, robust    (robust standard errors)
        reg lwage exper age kids, nocons    (no constant term)
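Estimation followed by predict can be sketched on Stata's bundled auto.dta dataset. The dataset and the model (price on mpg and weight) are assumptions chosen only to show the syntax, and pricehat, uhat and se_hat are hypothetical variable names.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* Ordinary least squares with a constant term
regress price mpg weight

* Linear prediction (xb is the default when no option is given)
predict pricehat

* Residuals and standard error of the prediction
predict uhat, residuals
predict se_hat, stdp

* Same model with heteroskedasticity-robust standard errors
regress price mpg weight, robust

* Same model without a constant term
regress price mpg weight, noconstant
```

predict always uses the most recent estimation results, so the three predict commands are placed directly after the first regress.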
3.1 Ivreg
Instrumental variables (two-stage least-squares) regression
Syntax ivreg depvar [varlist1] (varlist2 = varlist_iv) [if] [in] [weight] [, options]
Example ivreg y1 x1 x2 (y2 y3 = z1 z2 z3)
        ivreg y1 x1 x2 (y2 = z1 z2 z3) x3
        test x3=5
        test y2=x3-x2
        predict y1hat
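The ivreg syntax can be tried on Stata's bundled auto.dta dataset. This is a sketch of the syntax only: treating mpg as endogenous and using displacement and gear_ratio as instruments is an arbitrary assumption with no economic justification, and pricehat_iv is a hypothetical variable name. It also assumes your Stata release still accepts ivreg; newer releases document ivregress 2sls for the same estimator.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* mpg is treated as endogenous; displacement and gear_ratio act as instruments
ivreg price weight (mpg = displacement gear_ratio)

* Test a linear restriction on the estimated coefficients
test weight = 0

* Fitted values based on the IV estimates
predict pricehat_iv
```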
3.2 Hypothesis Testing
ttest var       t tests
test varlist    F tests
Example ttest income=180, level(99)
        ttest group1_income=group2_income
        ttest income, by(male)
        test exper age kids
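Both kinds of test can be sketched on Stata's bundled auto.dta dataset. The dataset, the hypothesised mean of 20 and the regression model are assumptions made only for illustration. Note that test works on the most recent estimation results, so a regress command must come first.

```stata
* Assumption: auto.dta is the example dataset bundled with Stata
sysuse auto, clear

* One-sample t test of mean mpg against 20, with a 99% confidence interval
ttest mpg == 20, level(99)

* Two-sample t test comparing mean mpg across the two groups of foreign
ttest mpg, by(foreign)

* F test that the coefficients on mpg and weight are jointly zero
regress price mpg weight length
test mpg weight

* Test of a single restriction
test mpg = 0
```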
4 Stata Resources
Stata Textbook Examples: Econometric Analysis of Cross Section and Panel Data by Jeffrey M. Wooldridge
http://www.ats.ucla.edu/stat/stata/examples/eacspd/
UCLA Stata Starter Kit