6.1.2.1 Example makefile
The first thing we need to do is create a new file called Makefile⁶ and place it in the same directory as the data gathering files we already have. The makefile we are going to create runs prerequisite files in the alphanumeric order of their file names, so we need to name the files in the order that we want to run them. Now let’s look at the actual makefile:
################
# Example Makefile
# Christopher Gandrud
# Updated 1 July 2013
# Influenced by Rob Hyndman (31 October 2012)
# See: http://robjhyndman.com/researchtips/makefiles/
################

# Key variables to define
RDIR = .
MERGE_OUT = MergeData.Rout

# Create list of R source files
RSOURCE = $(wildcard $(RDIR)/*.R)

# Files to indicate when the RSOURCE file was run
OUT_FILES = $(RSOURCE:.R=.Rout)

# Default target
all: $(OUT_FILES)

# Run the RSOURCE files
$(RDIR)/%.Rout: $(RDIR)/%.R
	R CMD BATCH $<

# Remove Out Files
clean:
	rm -fv $(OUT_FILES)

# Remove MergeData.Rout
cleanMerge:
	rm -fv $(MERGE_OUT)

⁵ A file’s time stamp records the time and date when it was last changed.

Gathering Data with R 105
Ok, let’s break down the code. The first part of the file defines variables that will be used later on. For example, in the first line of executable code (RDIR = .) we create a simple variable⁷ called RDIR with a period (.) as its value. In Make and Unix-like shells, a period indicates the current directory. The next line defines a variable for the outfile created by running the MergeData.R file. This will be useful later when we create a target for removing this file to ensure that the MergeData.R file is always run.
The third executed line (RSOURCE = $(wildcard $(RDIR)/*.R)) creates a variable containing a list of the names of all files with the extension .R, i.e. our data gathering and merge source code files. This line has some new syntax, so let’s work through it. In Make (and Unix-like shells generally) a dollar sign ($) followed by a variable name substitutes the value of the variable in place of the name.⁸ For example, $(RDIR) inserts the period . that we defined as the value of RDIR previously. The parentheses are included to clearly demarcate where the variable name begins and ends.⁹

⁷ Simple string variables are often referred to as “macros” in GNU Make. A common
You may remember the asterisk (*) from the previous chapter. It is a “wildcard”, a special character that allows you to select file names that follow a particular pattern. Using *.R selects any file name that ends in .R.
Why did we also include the actual word wildcard? The wildcard function is different from the asterisk wildcard character. The function creates a list of files that match a pattern. In this case the pattern is $(RDIR)/*.R. The general form for writing the wildcard function is: $(wildcard PATTERN).
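The asterisk pattern behaves the same way in the shell, so it can be tried outside of Make. The following is a minimal sketch, not part of the example project: it assumes a scratch directory created with mktemp, and Notes.txt is a hypothetical extra file added only for contrast.

```shell
# Work in a scratch directory so that no real files are touched
RDIR=$(mktemp -d)
touch "${RDIR}/Gather1.R" "${RDIR}/Gather2.R" "${RDIR}/MergeData.R" "${RDIR}/Notes.txt"

# In the shell, ${RDIR} (or $RDIR) substitutes the variable's value.
# Make's $(RDIR) form would be command substitution here, so braces are used.
# The pattern *.R expands to every matching file name, much as
# $(wildcard $(RDIR)/*.R) does inside a makefile.
matched=$(cd "${RDIR}" && echo *.R)
echo "$matched"
```

This should print the three .R file names and leave out Notes.txt, because only names ending in .R match the pattern.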
The fourth line (OUT_FILES = $(RSOURCE:.R=.Rout)) creates a variable for the .Rout files that Make will use to tell how recently each R file was run.¹⁰ $(RSOURCE:.R=.Rout) is a substitution reference: it takes each file name in RSOURCE and replaces the file extension .R with .Rout.
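The same renaming can be sketched in the shell to see what the OUT_FILES variable ends up containing. This is an illustration under assumptions rather than what Make does internally: the RSOURCE list below is written out by hand, and the shell's ${name%suffix} expansion stands in for Make's substitution reference.

```shell
# Hand-written list standing in for the makefile's RSOURCE variable
RSOURCE="./Gather1.R ./Gather2.R ./Gather3.R ./MergeData.R"

OUT_FILES=""
for src in $RSOURCE; do
    # ${src%.R} strips a trailing .R; appending .Rout gives the outfile name
    OUT_FILES="$OUT_FILES ${src%.R}.Rout"
done
OUT_FILES=${OUT_FILES# }   # drop the leading space left by the loop
echo "$OUT_FILES"
```

The result is the list of outfile names that the makefile's all and clean targets work with.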
The second part of the makefile tells Make what we want to create and how to create it. In the line all: $(OUT_FILES) we are specifying the makefile’s default target. Targets are the files that you instruct Make to make. By convention the default target is named all; because it is the first target in the makefile, it is what Make tries to create when you enter the command make in the shell with no arguments. We will see later how to instruct Make to build different targets.
The next two executable lines ($(RDIR)/%.Rout: $(RDIR)/%.R and R CMD BATCH $<) run the R source code files in the directory. The first line specifies that each .Rout file is a target whose prerequisite is the .R file with the same name. The percent sign (%) is another wildcard. Unlike the asterisk, the text it matches in the target name is carried over to the prerequisite, so Gather1.Rout is paired with Gather1.R.
The dollar and less-than signs ($<) indicate the first prerequisite for the target, i.e. the .R file. R CMD BATCH is a way to call R from a Unix-like shell, run source files, and output the results to other files.¹¹ The outfiles it creates have the extension .Rout.
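The decision Make takes for each target/prerequisite pair can be imitated with a short shell loop. This is a sketch under assumptions, not Make itself: run_source and build_all are hypothetical helper names, the files live in a scratch directory, and touch stands in for R CMD BATCH so that R does not need to be installed.

```shell
# Scratch directory with two stand-in source files
RDIR=$(mktemp -d)
touch "$RDIR/Gather1.R" "$RDIR/Gather2.R"

# Stand-in for `R CMD BATCH $<`: report the run and create the outfile
run_source() {
    echo "R CMD BATCH $1"
    touch "${1%.R}.Rout"
}

# Rerun a source file only if its outfile is missing or older than the source
build_all() {
    for src in "$RDIR"/*.R; do
        out="${src%.R}.Rout"
        if [ ! -e "$out" ] || [ "$src" -nt "$out" ]; then
            run_source "$src"
        fi
    done
}

first=$(build_all)    # no outfiles exist yet, so both sources run
second=$(build_all)   # every outfile is up to date, so nothing runs
echo "$first"
```

The second call produces no output, which corresponds to Make reporting that there is nothing to be done.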
The next two lines specify another target: clean. When you type make clean into your shell Make will follow the recipe: rm -fv $(OUT_FILES). This removes (deletes) the .Rout files. The f argument (force) tells rm to ignore files that don’t exist and the v argument (verbose) tells rm to report each file it removes. When you delete the .Rout files, Make will run all of the .R files the next time you call it.
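The effect of the two rm arguments can be checked in isolation. A minimal sketch, assuming a scratch directory and a hypothetical file name DoesNotExist.Rout; note that -v is supported by the GNU and BSD versions of rm but is not required by POSIX.

```shell
RDIR=$(mktemp -d)
touch "$RDIR/Gather1.Rout"

# -f: no error for the missing file; -v: report each file actually removed
rm -fv "$RDIR/Gather1.Rout" "$RDIR/DoesNotExist.Rout"
status=$?
echo "exit status: $status"
```

Because of -f, the command succeeds even though one of the files was never there, which is why make clean does not fail when some outfiles are already gone.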
The last two lines help us solve a problem created by the fact that our simple makefile doesn’t push changes downstream. For example, if we make a change to Gather2.R and run make, only Gather2.R will be rerun. The new data frame will not be added to the final merged data set. To overcome this problem the last two lines of code create a target called cleanMerge, which removes only the MergeData.Rout file.

⁸ This is a kind of parameter expansion. For more information about parameter expansion see Frazier (2008).
⁹ Braces ({}) are also sometimes used for this.
¹⁰ The R outfile contains all of the output from the R session used while running the file. These can be a helpful place to look for errors if your makefile gives you an error like make: *** [Gather.Rout] Error 1.
¹¹ You will need to make sure that R is in your PATH. Setting this up differs across systems. On Mac and Linux, if you can load R from the Terminal by typing R, then R is in your PATH. The standard R installation usually sets this up correctly. There are different methods for changing the PATH on different versions of Windows.
Running the Makefile
To run the makefile for the first time simply change the working directory to where the file is and type make into your shell. It will create the final CSV data file and four files with the extension .Rout, indicating when the segmented data gathering files were last run.¹²
When you run make in the shell for the first time you should get the output:
## R CMD BATCH Gather1.R
## R CMD BATCH Gather2.R
## R CMD BATCH Gather3.R
## R CMD BATCH MergeData.R
If you run it a second time without changing the R source files you will get the following output:
## make: Nothing to be done for 'all'.
To remove all of the .Rout files set the make target to clean:
make clean
## rm -fv ./Gather1.Rout ./Gather2.Rout ./Gather3.Rout
## ./MergeData.Rout
## ./Gather1.Rout
## ./Gather2.Rout
## ./Gather3.Rout
## ./MergeData.Rout
¹² If you open these files you will find the output from the R session used when their
FIGURE 6.1
The RStudio Build Tab
If we run the following code:
# Remove MergeData.Rout and make all R source files
make cleanMerge all
then Make will first remove the MergeData.Rout file (if there is one) and then run all of the R source files as needed. MergeData.R will always be run. This ensures that changes to the gathered data frames are carried through to the final merged data set.
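Why removing MergeData.Rout is enough to force the merge can be seen by imitating Make’s time stamp comparison in the shell. This is a sketch under assumptions: stale is a hypothetical helper that lists the source files Make would rerun, the time stamps are set artificially with touch -t, and neither Make nor R is actually called.

```shell
RDIR=$(mktemp -d)
# Sources and outfiles that all look up to date (outfiles newer than sources)
for name in Gather1 Gather2 MergeData; do
    touch -t 202001010000 "$RDIR/$name.R"
    touch -t 202001020000 "$RDIR/$name.Rout"
done

# List the sources that would be rerun: outfile missing or older than source
stale() {
    for src in "$RDIR"/*.R; do
        out="${src%.R}.Rout"
        if [ ! -e "$out" ] || [ "$src" -nt "$out" ]; then
            basename "$src"
        fi
    done
}

touch "$RDIR/Gather2.R"                   # edit one gathering file
after_edit=$(stale | paste -sd ' ' -)     # only Gather2.R is out of date

rm -f "$RDIR/MergeData.Rout"              # what `make cleanMerge` does
after_clean=$(stale | paste -sd ' ' -)    # MergeData.R is now out of date too
echo "$after_edit"
echo "$after_clean"
```

After the edit alone only Gather2.R is listed, which is the downstream problem described above. Once MergeData.Rout is removed, MergeData.R joins the list, so the merged data set is rebuilt.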