Can interpreting be as fast as byte compiling?
+ Other developments in pqR
Radford M. Neal, University of Toronto
Dept. of Statistical Sciences and Dept. of Computer Science http://www.cs.utoronto.ca/~radford
http://radfordneal.wordpress.com http://pqR-project.org
R Summit Conference, Copenhagen, June 2015
Why interpret rather than byte compile in pqR?
Some pqR optimizations are only implemented in interpreted code, mainly those using the “variant result” mechanism. Some examples:
• Automatic parallelization with helper threads — eg, exp(colSums(M)).
• Merging of operations — eg, u <- 2*vec^2+1.
• Short-cut evaluations – eg, fast any(x>0) if an early element is positive.
• Avoidance of sequence creation — eg, for (i in 1:n) or vec[i:j].
These optimizations tend to help more with vector and matrix operations, and less with operations on scalars. Vector and matrix operations seem a natural focus for R.
The run-time context available to the interpreter helps when implementing these optimizations. Could they be done with the byte-code compiler too?
Maybe, but here I’ll consider the other option of getting rid of the compiler.
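For reference, the expressions named in the list above can be written out as ordinary R. This is only a sketch with made-up data (M, vec, x, and n are hypothetical); it runs in any R implementation, and whether any of the listed optimizations actually applies depends on the interpreter in use.

```r
# Hypothetical example data, not taken from the talk.
set.seed(1)
M   <- matrix(runif(20), nrow = 4)
vec <- runif(10)
x   <- c(3, -1, -2)
n   <- 5

# Listed as an example for automatic parallelization with helper threads.
col_exp <- exp(colSums(M))

# Listed as an example for merging of operations
# (multiply, power, and add over the same vector).
u <- 2 * vec^2 + 1

# Listed as an example for short-cut evaluation:
# here the first element of x already decides the answer.
has_pos <- any(x > 0)

# Listed as examples for avoiding sequence creation:
# 1:n is only iterated over, and 2:4 is only used as an index.
total <- 0
for (i in 1:n) total <- total + i
part <- vec[2:4]
```

The results are the same with or without such optimizations; only the time and memory needed to compute them should differ.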
Why compiled code might be faster than interpreted code
Of course, the point of byte compiling is to speed up execution.
Why might interpreting byte compiled code be faster than interpreting the abstract syntax tree of the program?
• Less general overhead for interpreting — fewer conditional branches, less fetching from memory (if code is more compact).
• Some operations done at compile time — eg, constant folding.
• Fast invocation of primitive operations — eg, + and <- — if they are assumed to not be redefined.
• Fast lookup of local variables.
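Two of the points above can be illustrated with a small sketch using the standard compiler package. The function names secs and weird are hypothetical, and the code does not show how any particular compiler version treats these cases internally; it only shows a foldable constant expression and why an assumption about + is needed at all.

```r
library(compiler)

# Constant folding: 60 * 60 * 24 has the same value on every call,
# so it could be computed once at compile time rather than per call.
secs   <- function(days) days * (60 * 60 * 24)
secs_c <- cmpfun(secs)              # byte-compiled version of secs
stopifnot(identical(secs(2), secs_c(2)))

# Redefinition: R lets a function rebind `+` locally, so a fast path
# for `+` is only valid if it is assumed that this has not happened.
weird <- function(a, b) {
  `+` <- function(x, y) paste(x, y)  # local redefinition of +
  a + b
}
print(weird(1, 2))                   # "1 2", not 3
```

The compiled and uncompiled versions must agree on results in both situations, which is why the fast invocation of primitives depends on the stated assumption.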
In R Core implementations after R-2.13.0 and before R-3.1.0, the byte code
compiler was also the focus of work on speed improvement. Several speed
improvements applied only to byte compiled code, but could have been put
in the interpreter as well (notably changes to subset assignment).
Speed of R Core & pqR versions with & without compiling
[Figure: relative elapsed time (log scale, axis ticks from 0.2 to 1.0) for the test programs prg-cv-basisfun, prg-em.run, prg-gp, prg-hmc, prg-kernel-PCA, prg-lm-nn, prg-matexp, prg-mlp, prg-near, and prg-Qlearn. R Core releases run left to right: R-2.11.1, R-2.12.2, R-2.13.2, R-2.14.2, R-2.15.3, R-3.0.3, R-3.1.3, R-3.2.1. pqR releases run right to left; their axis labels, read left to right, are 2015-06-24, 2014-11-16, 2014-06-19, 2014-02-23, 2013-07-22, 2013-06-28, R-2.15.0.]
Changes in the relative advantage of byte compiling
[Figure: time ratio, uncompiled/compiled (log scale, axis ticks from 1.0 to 3.5), for the test programs prg-cv-basisfun, prg-em.run, prg-gp, prg-hmc, prg-kernel-PCA, prg-lm-nn, prg-matexp, prg-mlp, prg-near, and prg-Qlearn. R Core releases run left to right: R-2.13.2, R-2.14.2, R-2.15.3, R-3.0.3, R-3.1.3, R-3.2.1. pqR releases run right to left; their axis labels, read left to right, are 2015-06-24, 2014-11-16, 2014-06-19, 2014-02-23, 2013-07-22, 2013-06-28.]
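An uncompiled/compiled time ratio of this kind can be measured for a single function roughly as follows. This is a minimal sketch, not the benchmark setup used for the plots: scalar_loop is a hypothetical toy function, and the ratio it yields will vary with the R version and machine.

```r
library(compiler)

# Turn off the JIT so the plain closure is not compiled behind the scenes;
# enableJIT returns the previous level so it can be restored afterwards.
old_jit <- enableJIT(0)

# Hypothetical scalar-heavy test function.
scalar_loop <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + i * 2
  s
}
scalar_loop_c <- cmpfun(scalar_loop)   # explicitly byte-compiled copy

n <- 2e6
t_uncompiled <- system.time(r1 <- scalar_loop(n))[["elapsed"]]
t_compiled   <- system.time(r2 <- scalar_loop_c(n))[["elapsed"]]

enableJIT(old_jit)                     # restore the previous JIT setting

ratio <- t_uncompiled / t_compiled     # values above 1 favour compiling
cat("uncompiled/compiled time ratio:", ratio, "\n")
```

A single short run like this is noisy; repeating the timing several times and using a longer-running function gives a more stable ratio.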