
Dynamically Including Modular Analysis Files

There are a number of reasons why you might want to have your R source code located in separate files from your markup documents even if you compile them together with knitr.

First, it can be unwieldy to edit both your markup and long R source code chunks in the same document, even with RStudio’s handy knitr code folding and chunk management options. There are just too many things going on in one document.

Second, you may want to use the same code in multiple documents–an article and slide show presentation, for example. It is nice to not have to copy and paste the same code into multiple places. Instead it is easier to have multiple documents link to the same source code file. When you make changes to this source code file, the changes will automatically be made across all of your presentation documents. You don’t need to make the same changes multiple times.

Third, other researchers trying to replicate your work might only be interested in specific parts of your analysis. If you have the analysis broken into separate and clearly labeled modular files that are explicitly tied together in the markup file with knitr, it is easy for them to find the specific bits of code that they are interested in.

8.2.1 Source from a local file

In the early stages of your research you will usually want to run code stored in analysis files located on your computer. Doing this is simple. The knitr syntax is the same as for block code chunks. The only change is that instead of writing all of your code in the chunk, you save it to its own file and use the source command to access it.8 For example, in an R Markdown file we could run the R code in a file called MainAnalysis.R from our ExampleProject like this:

{r, include=FALSE}

# Run main analysis

source("/ExampleProject/Analysis/MainAnalysis.R")

Notice that we set the option include=FALSE. This will run the analysis and produce objects created by the analysis code that can be used by other code chunks, but the output will not show up in the presentation document’s text.
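To see how objects created by a sourced file become available to later chunks, here is a minimal, self-contained sketch. The temporary file stands in for an analysis file such as MainAnalysis.R. Its contents and the object name M1 are assumptions made for illustration, not the actual contents of the example project.

```r
# Stand-in for an analysis file such as MainAnalysis.R (hypothetical contents)
analysis_file <- tempfile(fileext = ".R")
writeLines(c(
  "# Fit a simple model using R's built-in cars data",
  "M1 <- lm(dist ~ speed, data = cars)"
), analysis_file)

# Run the file; the objects it creates are placed in the workspace
source(analysis_file)

# A later code chunk can now use the object created by the sourced file
speed_coef <- coef(M1)[["speed"]]
speed_coef
```

In a knitr document the source call would sit in a chunk with include=FALSE, and the final two lines would sit in a later chunk whose output you do want to show.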

Statistical Modelling and knitr 153

Sourcing a makefile in a code chunk

In Chapter 6 we created a GNU Makefile to organize our data gathering. You can run makefiles every time you compile your presentation document. This can keep your data, analyses, figures, and tables up-to-date. One way to do this is to run the GNU makefile in an R code chunk with the system command (see Section 4.5). Perhaps a better way to run makefiles from knitr presentation documents is to include the commands in a code chunk using the Bash engine. For example, a Sweave-style code chunk for running the makefiles in our example project would look like this:

<<engine='bash', include=FALSE>>=

# Change working directory to /ExampleProject/Data/GatherSource
cd /ExampleProject/Data/GatherSource/

# Run makefile
make cleanMerge all

# Change working directory to /ExampleProject/Analysis/
cd /ExampleProject/Analysis/

@

Please see page 108 for details on the make command arguments used here. You can of course also use R’s source command to run an R make-like data gathering file. Unlike GNU Make, this will rerun all of the data gathering files, even if they have not been updated. This may become very time consuming depending on the size of your data sets and how they are manipulated.
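As a rough illustration of the first option mentioned above, the following sketch shows how make could be called from an R code chunk with the system command. The helper name run_make is hypothetical and is not part of knitr or of the example project. The sketch assumes GNU Make is installed and reuses the directory and targets from the Bash chunk.

```r
# Hypothetical helper (not from knitr): run make in a given directory
run_make <- function(dir, targets = "") {
  # Skip with a message if make is not installed or the directory is missing
  if (!nzchar(Sys.which("make")) || !dir.exists(dir)) {
    message("Skipping make: make or ", dir, " not found")
    return(invisible(NA_integer_))
  }
  old_dir <- setwd(dir)    # move to the makefile's directory
  on.exit(setwd(old_dir))  # restore the working directory afterwards
  system(paste("make", targets))  # returns make's exit status
}

# Same directory and targets as in the Bash chunk
status <- run_make("/ExampleProject/Data/GatherSource", "cleanMerge all")
```

Restoring the working directory inside the helper plays the same role as the final cd command in the Bash chunk.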

One final note on including makefiles in your knitr presentation document source code: it is important to place the code chunk with the makefile before code chunks containing statistical analyses that depend on the data file it creates. Placing the makefile first will keep the others up-to-date.

8.2.2 Source from a non-secure URL (http)

Sourcing from your computer is fine if you are working alone and do not want others to access your code. Once you start collaborating and want other people to be able to reproduce your analyses, you need to use another storage method. The simplest method is to host the replication code in your Dropbox public folder. You can find the file’s public URL in the same way that you did in Chapter 5. Then use the source command the same way as we did before with the read.table command.9

Let’s look at a quick example of sourcing an R function that has been made available at a non-secure URL. Paul Johnson created a function called outreg for creating LaTeX tables from R objects. He has made the function available on his website at: http://pj.freefaculty.org/R/WorkingExamples/outreg-worked.R. You can directly load this function into your workspace with source.

# Load Paul Johnson outreg function

source("http://pj.freefaculty.org/R/WorkingExamples/outreg-worked.R")

Now you can use outreg like any other function you have loaded.

8.2.3 Source from a secure URL (https)

If you are using GitHub or another service that uses secure URLs to host your analysis source code files, you need to use the source_url command in the devtools package. For GitHub-based source code we find the file’s URL the same way we did in Chapter 5 (Section 5.3.4). Remember to use the URL for the raw version of the file. I have a short script hosted on GitHub for creating a scatterplot from data in R’s cars data set. The script’s shortened URL is http://bit.ly/UOtH4L.10 To run this code and create the scatterplot using source_url you simply type:

# Load devtools package

library(devtools)

# Run the source code to create the scatter plot

source_url("http://bit.ly/UOtH4L")

## SHA-1 hash of file is ff75a88b90decfcaefc9903bbc283e1fc4cd2339

9 You can also make the replication code accessible for download and either instruct others to change the working directory to the replication file or have them change the directory information as necessary. You will need to do this with GNU makefiles like those included with this book.

10 The original URL is at https://raw.github.com/christophergandrud/Rep-Res-Examples/master/Graphs/SimpleScatter.R. This is very long, so I shortened it using bitly (see http://bitly.com). You may notice that the shortened URL is not secure. However, it does link to the original secure https URL.

[Figure: scatterplot created from the cars data, with Speed (mph) on the horizontal axis and Stopping Distance (ft) on the vertical axis]

You can also use the devtools command source_gist in a similar way to source GitHub Gists. Gists are a handy way to share code over the internet. For more details see: https://gist.github.com/.

Similar to what we saw in Chapter 5 (Section 5.3.4), if you would like to use a particular version of a file stored on GitHub simply include that version’s URL in the source_url call. This can be useful for replicating particular results. Linking to a particular version of a source code file will enable replication even if you later make changes to the file. To access the URL for a particular version of a file first click on the file on GitHub’s website. Then click the History button. This will take you to a page listing all of the file’s versions. Click on the Browse Code button next to the version of the file that you want to use. Finally, click on the Raw button to be taken to the text-only version of the file. Copy this page’s URL and use it in source_url.

Also, just like with source_data, we can set the sha1 argument to tell source_url to make sure that the source code file it is downloading is the one we intended. This will work regardless of whether or not the file is stored on GitHub.
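A minimal sketch of this check, reusing the shortened URL and the SHA-1 hash printed in the earlier example. It assumes that devtools is installed, that an internet connection is available, and that the hosted file has not changed since that hash was recorded. The tryCatch wrapper is an addition made only so that the sketch reports a failed check rather than stopping.

```r
# URL and hash taken from the scatterplot example earlier in this section
script_url  <- "http://bit.ly/UOtH4L"
script_sha1 <- "ff75a88b90decfcaefc9903bbc283e1fc4cd2339"

# source_url stops with an error if the downloaded file's hash differs
# from the sha1 value supplied here
tryCatch(
  devtools::source_url(script_url, sha1 = script_sha1),
  error = function(e) {
    message("File was not sourced: ", conditionMessage(e))
  }
)
```

In a real replication document you would probably drop the tryCatch wrapper, so that a mismatched hash halts compilation instead of passing with only a message.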