
Debugging, condition handling, and defensive programming


9.2 Debugging tools

To implement a strategy of debugging, you’ll need tools. In this section, you’ll learn about the tools provided by R and the RStudio IDE.

RStudio’s integrated debugging support makes life easier by exposing existing R tools in a user-friendly way. I’ll show you both the R and RStudio ways so that you can work with whatever environment you use.

You may also want to refer to the official RStudio debugging documentation (http://www.rstudio.com/ide/docs/debugging/overview), which always reflects the tools in the latest version of RStudio.

There are three key debugging tools:

• RStudio’s error inspector and traceback() which list the sequence of calls that lead to the error.

• RStudio’s “Rerun with Debug” tool and options(error = browser) which open an interactive session where the error occurred.

• RStudio’s breakpoints and browser() which open an interactive session at an arbitrary location in the code.

I’ll explain each tool in more detail below.

You shouldn’t need to use these tools when writing new functions. If you find yourself using them frequently with new code, you may want to reconsider your approach. Instead of trying to write one big function all at once, work interactively on small pieces. If you start small, you can quickly identify why something doesn’t work. But if you start large, you may end up struggling to identify the source of the problem.

9.2.1 Determining the sequence of calls

The first tool is the call stack, the sequence of calls that lead up to an error. Here’s a simple example: f() calls g(), which calls h(), which calls i(), which adds together a number and a string, creating an error:

f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d
f(10)

When we run this code in RStudio, the error message appears in the console with two options to its right: “Show Traceback” and “Rerun with Debug”. Clicking “Show Traceback” displays the sequence of calls that led to the error.

If you’re not using RStudio, you can use traceback() to get the same information:

traceback()
# 4: i(c) at exceptions-example.R#3
# 3: h(b) at exceptions-example.R#2
# 2: g(a) at exceptions-example.R#1
# 1: f(10)

Read the call stack from bottom to top: the initial call is f(), which calls g(), then h(), then i(), which triggers the error. If you’re calling code that you source()d into R, the traceback will also display the location of the function, in the form filename.r#linenumber. These are clickable in RStudio, and will take you to the corresponding line of code in the editor.
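The same call stack can also be captured programmatically, which may help when errors need to be logged rather than inspected by hand. The following is a minimal sketch, not part of the original example: it assumes base R’s sys.calls() is called from a calling handler, and the names stack, outcome, and called are illustrative.

```r
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) "a" + d

stack <- NULL
outcome <- tryCatch(
  withCallingHandlers(
    f(10),
    # Runs while the failing calls are still on the stack
    error = function(e) stack <<- sys.calls()
  ),
  # Runs after the stack has unwound; returns the error message
  error = function(e) conditionMessage(e)
)

# Keep only the name of the function in each recorded call
called <- vapply(stack, function(cl) deparse(cl[[1]])[1], character(1))

# Contains f, g, h, i in calling order, surrounded by the
# tryCatch() and handler machinery that was also active
print(called)
```

Unlike traceback(), this lists the outermost call first, so it reads from top to bottom.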

Sometimes this is enough information to let you track down the error and fix it. However, it’s usually not. traceback() shows you where the error occurred, but not why. The next useful tool is the interactive debugger, which allows you to pause execution of a function and interactively explore its state.

9.2.2 Browsing on error

The easiest way to enter the interactive debugger is through RStudio’s “Rerun with Debug” tool. This reruns the command that created the error, pausing execution where the error occurred. You’re now in an interactive state inside the function, and you can interact with any object defined there. You’ll see the corresponding code in the editor (with the statement that will be run next highlighted), objects in the current environment in the “Environment” pane, the call stack in a “Traceback” pane, and you can run arbitrary R code in the console.

As well as any regular R function, there are a few special commands you can use in debug mode. You can access them either with the RStudio toolbar or with the keyboard:

• Next, n: executes the next step in the function. Be careful if you have a variable named n; to print it you’ll need to use print(n).

• Step into, s: works like next, but if the next step is a function call, it steps into that function so you can work through each line.

• Finish, f: finishes execution of the current loop or function.

• Continue, c: leaves interactive debugging and continues regular execution of the function. This is useful if you’ve fixed the bad state and want to check that the function proceeds correctly.

• Stop, Q: stops debugging, terminates the function, and returns to the global workspace. Use this once you’ve figured out where the problem is, and you’re ready to fix it and reload the code.

There are two other slightly less useful commands that aren’t available in the toolbar:

• Enter: repeats the previous command. I find this too easy to activate accidentally, so I turn it off using options(browserNLdisabled = TRUE).

• where: prints the stack trace of active calls (the interactive equivalent of traceback()).

To enter this style of debugging outside of RStudio, you can use the error option, which specifies a function to run when an error occurs. The function most similar to RStudio’s debug is browser(): this will start an interactive console in the environment where the error occurred. Use options(error = browser) to turn it on, re-run the previous command, then use options(error = NULL) to return to the default error behaviour.

You could automate this with the browseOnce() function as defined below:

browseOnce <- function() {
  old <- getOption("error")
  function() {
    options(error = old)
    browser()
  }
}
options(error = browseOnce())

f <- function() stop("!")
# Enters browser
f()
# Runs normally
f()
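A related pattern is to scope the error option to a single expression, so that the previous setting comes back without a manual reset. This is only a sketch: with_error_handler() is a hypothetical helper, not a base R or RStudio function, and it relies on options() returning the old values and on on.exit() running when the function exits.

```r
# Hypothetical helper: run `code` with a temporary error handler installed
with_error_handler <- function(handler, code) {
  old <- options(error = handler)     # options() returns the previous values
  on.exit(options(old), add = TRUE)   # put them back when this function exits
  code                                # lazily evaluated here, handler active
}

before <- getOption("error")
value <- with_error_handler(browser, sqrt(16))
after <- getOption("error")

# The error option is back to whatever it was before the call
identical(before, after)
```

Because the argument is evaluated lazily inside the helper, an error raised by the expression would reach the temporary handler rather than the default one.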

(You’ll learn more about functions that return functions in Chapter 10.)

There are two other useful functions that you can use with the error option:

• recover is a step up from browser, as it allows you to enter the environment of any of the calls in the call stack. This is useful because often the root cause of the error is a number of calls back.

• dump.frames is an equivalent to recover for non-interactive code. It creates a last.dump.rda file in the current working directory. Then, in a later interactive R session, you load that file, and use debugger() to enter an interactive debugger with the same interface as recover(). This allows interactive debugging of batch code.

# In batch R process ----
dump_and_quit <- function() {
  # Save debugging info to file last.dump.rda
  dump.frames(to.file = TRUE)
  # Quit R with error status
  q(status = 1)
}
options(error = dump_and_quit)

# In a later interactive session ----
load("last.dump.rda")
debugger()

To reset error behaviour to the default, use options(error = NULL). Then errors will print a message and abort function execution.
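To get a feel for what a dump contains without leaving the current session, the frames can be captured into an object instead of a file. The sketch below is an illustration under stated assumptions: risky() and the dump name batch_dump are made up, and dump.frames() is called from a calling handler rather than through options(error = ...), so that the script keeps running.

```r
risky <- function(x) {
  scale <- 2                      # a local we would like to inspect later
  stop("bad input: ", x)
}

outcome <- tryCatch(
  withCallingHandlers(
    risky("oops"),
    # Snapshot the frames while the failing call is still on the stack;
    # to.file = FALSE stores the dump as an object in the global environment
    error = function(e) dump.frames(dumpto = "batch_dump", to.file = FALSE)
  ),
  error = function(e) "failed"
)

dump <- get("batch_dump", envir = globalenv())
class(dump)                       # "dump.frames"

# Find the frame of risky() by checking which environment holds `scale`
has_scale <- vapply(
  dump,
  function(fr) is.environment(fr) && exists("scale", envir = fr, inherits = FALSE),
  logical(1)
)
any(has_scale)

# In an interactive session, debugger(dump) would let you browse these frames
```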

9.2.3 Browsing arbitrary code

As well as entering an interactive console on error, you can enter it at an arbitrary code location by using either an RStudio breakpoint or browser(). You can set a breakpoint in RStudio by clicking to the left of the line number, or pressing Shift + F9. Equivalently, add browser() where you want execution to pause. Breakpoints behave similarly to browser() but they are easier to set (one click instead of nine key presses), and you don’t run the risk of accidentally including a browser() statement in your source code. There are two small downsides to breakpoints:

• There are a few unusual situations in which breakpoints will not work: read breakpoint troubleshooting (http://www.rstudio.com/ide/docs/debugging/breakpoint-troubleshooting) for more details.

• RStudio currently does not support conditional breakpoints, whereas you can always put browser() inside an if statement.
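A conditional pause of the kind mentioned in the last point might look like the sketch below. The function safe_log() and its condition are hypothetical; the interactive() guard is an added assumption so that batch runs never stop at a prompt.

```r
safe_log <- function(x) {
  # Hypothetical condition: pause only when an input would produce NaN or -Inf.
  # The interactive() guard keeps batch runs from stopping at a prompt.
  if (interactive() && any(x <= 0)) {
    browser()
  }
  log(x)
}

safe_log(c(1, 10, 100))   # no pause: every input is positive
```

Calling safe_log(c(1, -1)) in an interactive session would open the browser inside the function, just before log() is evaluated.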

As well as adding browser() yourself, there are two other functions that will add it to code:


• debug() inserts a browser statement in the first line of the specified function. undebug() removes it. Alternatively, you can use debugonce() to browse only on the next run.

• utils::setBreakpoint() works similarly, but instead of taking a function name, it takes a file name and line number and finds the appropriate function for you.

These two functions are both special cases of trace(), which inserts arbitrary code at any position in an existing function. trace() is occasionally useful when you’re debugging code that you don’t have the source for.

To remove tracing from a function, use untrace(). You can only perform one trace per function, but that one trace can call multiple functions.
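The functions above can be tried without entering the browser at all. The following is a rough sketch with made-up names (half, area, trace_log): it checks the debug flag with isdebugged() and uses trace() to insert a logging expression rather than a browser() call.

```r
half <- function(x) x / 2

debug(half)
flagged <- isdebugged(half)     # TRUE: the next call would open the browser
undebug(half)
cleared <- !isdebugged(half)    # TRUE: the flag has been removed

area <- function(r) pi * r^2
trace_log <- character()

# Insert a hypothetical logging expression at the start of area();
# print = FALSE suppresses the default "Tracing ..." line on each call
trace(
  "area",
  tracer = quote(trace_log <<- c(trace_log, paste("r =", r))),
  print = FALSE
)

area(2)            # runs the tracer, then the original body
untrace("area")    # restore the original function
area(3)            # no longer logged

trace_log
```

The tracer is evaluated inside the traced function, which is why it can refer to the argument r directly.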

9.2.4 The call stack: traceback(), where, and recover()

Unfortunately the call stacks printed by traceback(), browser() + where, and recover() are not consistent. The following table shows how the call stacks from a simple nested set of calls are displayed by the three tools.

traceback()         where                     recover()
4: stop("Error")    where 1: stop("Error")    1: f()
3: h(x)             where 2: h(x)             2: g(x)
2: g(x)             where 3: g(x)             3: h(x)
1: f()              where 4: f()

Note that numbering is different between traceback() and where, and that recover() displays calls in the opposite order, and omits the call to stop(). RStudio displays calls in the same order as traceback() but omits the numbers.

9.2.5 Other types of failure

There are other ways for a function to fail apart from throwing an error or returning an incorrect result.

• A function may generate an unexpected warning. The easiest way to track down warnings is to convert them into errors with options(warn = 2) and use the regular debugging tools. When you do this you’ll see some extra calls in the call stack, like doWithOneRestart(), withOneRestart(), withRestarts(), and .signalSimpleWarning(). Ignore these: they are internal functions used to turn warnings into errors.

• A function may generate an unexpected message. There’s no built-in tool to help solve this problem, but it’s possible to create one:

message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}

f <- function() g()
g <- function() message("Hi!")
g()
# Hi!
message2error(g())
# Error in message("Hi!"): Hi!
traceback()
# 10: stop(e) at #2
# 9: (function (e) stop(e))(list(message = "Hi!\n",
#      call = message("Hi!")))
# 8: signalCondition(cond)
# 7: doWithOneRestart(return(expr), restart)
# 6: withOneRestart(expr, restarts[[1L]])
# 5: withRestarts()
# 4: message("Hi!") at #1
# 3: g()
# 2: withCallingHandlers(code, message = function(e) stop(e))
#      at #2
# 1: message2error(g())

As with warnings, you’ll need to ignore some of the calls on the traceback (i.e., the first two and the last seven).

• A function might never return. This is particularly hard to debug automatically, but sometimes terminating the function and looking at the call stack is informative. Otherwise, use the basic debugging strategies described above.

• The worst scenario is that your code might crash R completely, leaving you with no way to interactively debug your code. This indicates a bug in underlying C code. This is hard to debug. Sometimes an interactive debugger, like gdb, can be useful, but describing how to use it is beyond the scope of this book.

If the crash is caused by base R code, post a reproducible example to R-help. If it’s in a package, contact the package maintainer. If it’s your own C or C++ code, you’ll need to use numerous print() statements to narrow down the location of the bug, and then you’ll need to use many more print statements to figure out which data structure doesn’t have the properties that you expect.
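Returning to the first case in the list, the warning-to-error conversion can be sketched as follows. The function to_number() is a made-up example, and tryCatch() stands in for the interactive debugging tools so that the snippet runs to completion on its own.

```r
to_number <- function(x) as.numeric(x)

old <- options(warn = 2)           # promote every warning to an error
outcome <- tryCatch(
  to_number("abc"),                # would normally warn and return NA
  error = function(e) conditionMessage(e)
)
options(old)                       # restore the previous warning level

outcome                            # the text of the converted warning
```

Restoring the old value straight away keeps the stricter setting from leaking into unrelated code.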

In document Advanced R (Page 169-176)