• No results found

Reproducible Research Using Stata

N/A
N/A
Protected

Academic year: 2021

Share "Reproducible Research Using Stata"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Reproducible Research Using Stata

L. Philip Schumm Ronald A. Thisted

Department of Health Studies University of Chicago

(2)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

do-file paper/report

data analysis writing

cut & paste re-enter by hand

I Inefficient and time-consuming

(3)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

do-file paper/report

data analysis writing

cut & paste re-enter by hand

I Inefficient and time-consuming

(4)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

do-file paper/report

data analysis writing

cut & paste re-enter by hand

I Inefficient and time-consuming

(5)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples A big improvement: Intermediary files

do-file paper/

report

data analysis writing

graphs & tables individual

results

I Not all results automatically transferred

I Can be difficult to manage

(6)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples A big improvement: Intermediary files

do-file paper/

report

data analysis writing

graphs & tables individual

results

I Not all results automatically transferred

I Can be difficult to manage

(7)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples A big improvement: Intermediary files

do-file paper/

report

data analysis writing

graphs & tables individual

results

I Not all results automatically transferred

I Can be difficult to manage

(8)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples A big improvement: Intermediary files

do-file paper/

report

data analysis writing

graphs & tables individual

results

I Not all results automatically transferred

I Can be difficult to manage

(9)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;

Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunks

I Literate programming (Knuth, 1992)

I tangling

I weaving

(10)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;

Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunks

I Literate programming (Knuth, 1992)

I tangling

I weaving

(11)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;

Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunks

I Literate programming (Knuth, 1992)

I tangling

I weaving

(12)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;

Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunks

I Literate programming (Knuth, 1992)

I tangling

I weaving

(13)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Dynamic do-files

A “dynamic” do-file

do-file data analysis stata2doc.py writing

(14)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Dynamic do-files

Comments, commands, and docstrings

// Here is an example dynamic do-file.

* here is the docstring for the first command sysuse auto

* weightsq equals weight squared gen weightsq=weight^2

reg mpg weight weightsq foreign

/* As you can see, commands don’t have to have docstrings. */

(15)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples -stata2doc- and

-s2d-Two stata commands: stata2doc and s2d

do-file data analysis stata2doc.py writing stata2doc.ado log file graphs scalars

(16)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples -stata2doc- and

-s2d-Syntax

stata2doc using do-file, [dirname(dirname) linesize(#) as(type) replace override options]

(17)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples -stata2doc- and

-s2d-Examples of -s2d- usage

. s2d w2coef=_b[weightsq] rsq=e(r2): reg mpg weight weightsq foreign <output omitted> . scalar li s2d_rsq = .69129599 s2d_w2coef = 1.591e-06 . s2d two = (1 + 1), noi s2d_two = 2

(18)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Final output

Putting it all together

do-file

data analysis

stata2rst.py documentreST

writing

stata2doc.ado

log file graphs scalars

(19)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,

HTML, LATEX, PDF, Open Office)

(20)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,

HTML, LATEX, PDF, Open Office)

(21)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,

HTML, LATEX, PDF, Open Office)

(22)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,

HTML, LATEX, PDF, Open Office)

(23)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Simple command

Simple command: do-file

/*

---A Simple Example

---This is a *very* simple example in which I shall demonstrate the following: 1) a simple command

2) graphs 3) substitution 4) tables

The Venerable Auto Data ---Let’s start by reading them in: */

(24)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Simple command

Simple command: reStructuredText

---A Simple Example

---This is a *very* simple example in which I shall demonstrate the following:

1) a simple command 2) graphs 3) substitution 4) tables

The Venerable Auto Data

---Let’s start by reading them in:

::

. sysuse auto (1978 Automobile Data)

(25)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Simple command

Simple command: PDF via L

A

TEX

A Simple Example

This is a very simple example in which I shall demonstrate the following: 1) a simple command

2) graphs 3) substitution 4) tables

The Venerable Auto Data

Let’s start by reading them in: . sysuse auto (1978 Automobile Data)

(26)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Graphs

Graph: do-file

* Now lets look at a boxplot comparing mpg between * domestic and foreign.

* Boxplot comparing domestic and foreign. gr box mpg, over(foreign) name(fig1)

(27)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Graphs

Graph: reStructuredText

Now lets look at a boxplot comparing mpg between domestic and foreign.

.. gr box mpg, over(foreign) name(fig1) .. figure:: fig1.pdf

:scale: 33

(28)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Graphs

Graph: PDF via L

A

TEX

Now lets look at a boxplot comparing mpg between domestic and foreign.

10 20 30 40 Mileage (mpg) Domestic Foreign

Figure 1: Boxplot comparing domestic and foreign.

(29)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Substitutions

Substitution: do-file

* Using a t-test to compare mpg between foreign and domestic * cars yields a p-value of |s2d_ttp|.

(30)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Substitutions

Substitution: reStructuredText

Using a t-test to compare mpg between foreign and domestic cars yields a p-value of |s2d_ttp|.

.. s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign) .. |s2d_ttp| replace:: 0.0005

(31)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Substitutions

Substitution: PDF via L

A

TEX

Using a t-test to compare mpg between foreign and domestic cars yields a p-value of 0.0005.

(32)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Tables

Table: do-file

* Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘, * and ‘‘foreign‘‘.

* Regression of mpg on several covariates. s2d, t: reg mpg weight weightsq foreign

(33)

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Tables

Table: reStructuredText

Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘, and ‘‘foreign‘‘.

.. s2d, t: reg mpg weight weightsq foreign

.. stata-table:: Regression of mpg on several covariates.

---mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---+---weight | -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567 weightsq | 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06 foreign | -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002 _cons | 56.53884 6.197383 9.12 0.000 44.17855 68.89913

(34)

---Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Tables

Table: PDF via L

A

TEX

Finally, we’ll try regressing mpg on weight, weightsq, and foreign. Table 1: Regression of mpg on several covariates.

mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]

weight -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567

weightsq 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06

foreign -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002

cons 56.53884 6.197383 9.12 0.000 44.17855 68.89913

References

Related documents

Survey, Years, Files Restricted Variables: Non-NCHS Data: Merge Variables F. Output: Overview: Examples/Table Shells Presentation of Results

In addition to the file formats, the major differences between the master files maintained by the NCCS and IRS are: (1) the NCCS categorizes the files according to whether

The program displays the prompt &#34;What (file of filenames) output file (* TERM *)? Using Sequences Finding and Copying Database Sequence Files 2-15.. 3) Press &lt;Return&gt;

Examples of standard Windows resources that can be managed using WMI include computer systems, disks, peripheral devices, event logs, files, folders, file sys- tems,

SSH Login nodes Job file Cluster LSF (batch manager) Output Files Queue VPN + SSH Internet VPN Campus Network On-campus: Off-campus: Submit job Create job.. Sep 13,

➡ Write into HDFS only successfully produced ROOT files avoiding useless load on the FS. ➡ Write 2 times the

The cfflib supports (a) reading and writing of the connectome metadata markup (meta.cml) and compressed connectome files, (b) basic Input/Output of the sup- ported file formats

If you must start from other users must quote characters as filenames, it names on output not find command in unix with examples pdf files or more commands pdf file.. The