Summary - Methods for efficient resource utilization in statistical machine learning algorithms

This chapter provided an overview of the fundamentals of this thesis, including MBO (a machine learning algorithm with huge resource demands) and the R programming language (the de facto standard software environment for the development of statistical learning applications).

In the following chapters, different methods for efficient resource utilization in statistical machine learning algorithms based on the R programming language will be presented. While Chapter 3 and Chapter 4 focus on single-threaded machine learning applications and their resource bottlenecks induced by the R programming language, Chapter 5 and Chapter 6 focus on parallel machine learning applications and the optimization of the parallel variant of the described MBO approach.

Profiling of Machine Learning Algorithms

This chapter presents an analysis of the resource utilization of statistical learning algorithms implemented in the R programming language and is based on the papers by Kotthaus et al. [KKL+14; KKK+14]. GNU R is the most widely used programming language for statistical data analysis in general, and biostatistics in particular. While apparently not affecting its popularity, its lavish use of resources makes it unsuitable in an environment where high performance is required, or where computation and memory resources are scarce. Here, runtime performance and memory consumption are critical aspects that can lead to unacceptably long execution times. To solve this problem, it is important to find out where the bottlenecks are and thus where the most room for improvement lies.

One major hurdle for efficient R programs is that classical ahead-of-time compilation of R code is hindered by R’s highly dynamic nature. This leads some users to reimplement performance-critical parts of their algorithms in C or C++ to achieve a higher execution speed. However, translating larger parts of an R program to another language is a complex task, since R programs rely on functions from R libraries or on basic functions included in the R interpreter execution environment that might not be readily available in other environments.

In fact, one of the main drivers behind R’s popularity is the vast number of available open source software packages, a fact that has also proven to be a near-insurmountable obstacle for alternative statistical computation languages that have since been proposed [Tie18; RIn18]. In spite of R’s well-known performance issues, these languages have never gained any significant traction within the statistics community. In recent years, multiple approaches have been developed to improve the execution speed of R applications. There are projects with the goal of creating alternative, more efficient R implementations [Ber18; TDH12; KMM+14; KE18]. However, most of these are experimental and have only a few users due to compatibility problems with the available R software libraries. Other projects [WWP14; Nea18] attempt to provide a faster R by modifying the original GNU R, so that they stay compatible with the available R libraries without the need for reimplementation. All of these projects have usually shown improvements for simple R programs and micro-benchmarks, but they exhibit fairly mixed results when it comes to speeding up complex real-world applications like machine learning algorithms.

Optimizations for faster execution and more efficient resource utilization of statistical learning algorithms based on R can only be profitable if they cover two aspects: staying compatible with the available software packages that most R programs are based on, and simultaneously addressing the real resource bottlenecks of those algorithms. It is therefore indispensable to analyze the runtime and memory consumption characteristics of learning algorithms. By analyzing the bottlenecks that arise during execution, the optimization potential for resource utilization can be estimated and new, efficient optimizations can be developed. The R execution environment already includes profiling tools such as Rprof for analyzing bottlenecks, but the analysis is restricted to high-level characteristics only. In particular, internal functions of the R interpreter itself, such as memory management, are outside the scope of what can be profiled. For a precise analysis, new profiling mechanisms need to be developed to analyze the runtime and memory behavior of real-world R programs. The objective of this chapter is to analyze the resource utilization of statistical learning algorithms to support the development of new optimizations that enable these algorithms to scale to larger problem sizes. As a first step towards this goal, the most common classification algorithms are analyzed with respect to their resource requirements, to determine where the highest optimization potential lies. To accomplish this, an R profiling framework called traceR [tra18] is redesigned and enhanced. Although the analysis focuses on learning algorithms, the results also support the development of new optimizations for the R programming language in general.
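As a point of reference for the built-in tooling mentioned above, the following is a minimal sketch of how Rprof is typically used. The function workload and its parameters are hypothetical and only serve to generate measurable work; they are not taken from the benchmarks analyzed in this chapter. The sketch assumes an R build with profiling support enabled.

```r
# Hypothetical workload (an assumption for illustration): repeated
# linear algebra on random matrices to keep the interpreter busy.
workload <- function(reps = 100, n = 200) {
  for (i in seq_len(reps)) {
    m <- matrix(rnorm(n * n), nrow = n)
    s <- solve(crossprod(m) + diag(n))
  }
  invisible(NULL)
}

# Start the sampling profiler and write the samples to a temporary file.
profile_file <- tempfile(fileext = ".out")
Rprof(profile_file, interval = 0.01)
workload()
Rprof(NULL)  # stop profiling

# Aggregate the sampled call stacks per R function.
profile <- summaryRprof(profile_file)
head(profile$by.self)
```

The summary attributes the sampled time to functions visible at the R level, such as rnorm or crossprod. This is consistent with the limitation described above: the output gives a high-level view of where time is spent, not a detailed account of the interpreter's internal behavior.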

This chapter is structured as follows: Section 2.2 already gave an overview of the R language environment, including the R language characteristics and the execution model of R. First, related approaches to optimizing the R language are presented in Section 3.1. Section 3.2 then describes the profiling framework that serves as a basis for the performance analysis. An overview of the machine learning benchmarks and their input data sets used in the analyses is given in Section 3.3, followed by a detailed analysis of their runtime and memory behavior. This analysis serves as a starting point for developing approaches to overcome the identified bottlenecks. Finally, the results are summarized in Section 3.4.

3.1 Optimizations for R - Existing Approaches

A major hurdle for general speedups of R programs is that R is executed by interpretation instead of being compiled to machine code. Classic ahead-of-time compilation of R code is hindered by the fact that R is highly dynamic. Thus, information such as data types, which the compiler needs for optimizations, is only available at runtime. For example, when a function is declared in an R program, no data types need to be specified for the parameter list. When such a function is called, there are multiple ways to pass the same set of arguments. These features make R very convenient for the user, but very inconvenient for ahead-of-time compilation.
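The dynamic behavior described above can be illustrated with a small sketch. The function scale_values and its parameter names are hypothetical and chosen only for illustration.

```r
# Hypothetical function: the parameter list declares no data types.
scale_values <- function(values, factor = 2, offset = 0) {
  values * factor + offset
}

# The same set of arguments can be passed in several ways.
r1 <- scale_values(1:3, 10, 1)                              # positional
r2 <- scale_values(offset = 1, factor = 10, values = 1:3)   # named, any order
r3 <- scale_values(1:3, off = 1, fac = 10)                  # partial name matching
r4 <- do.call(scale_values, list(1:3, offset = 1, factor = 10))  # argument list

# All four calls are equivalent.
stopifnot(identical(r1, r2), identical(r1, r3), identical(r1, r4))

# The result type depends on the argument types of each individual call.
class(scale_values(1:3, 2L, 0L))   # "integer"
class(scale_values(1:3))           # "numeric"
class(scale_values(2 + 1i))        # "complex"
```

Since neither the argument types nor the way arguments are matched to parameters is fixed by the function definition, a compiler that only sees the definition of scale_values cannot specialize the arithmetic for a particular data type ahead of time.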

Other languages with a similarly dynamic nature like Matlab or Python have overcome such runtime issues by using Just-In-Time (JIT) based compilation approaches [AP01; BCF+09]. These approaches use knowledge gained at runtime to specifically compile fragments for the time-intensive parts instead of either compiling the entire program at once or interpreting the program statement-by-statement.

One popular runtime environment that provides just-in-time compilation is the Java Virtual Machine (JVM) [LYB+14]. Kotthaus et al. [KPM12] presented a concept for implementing an optimized version of R by targeting the JVM. Existing alternative R execution environments that also target the JVM are the fastR project [KMM+14; SWH+16] and Renjin [Ber18]. Other VM implementations have also been used to speed up R, e.g., the NQR project that targets the Parrot VM [KE18]. Furthermore, there are approaches that propose experimental specialized JIT compilers for R [TDH12; TDH14]. These approaches reimplement the original R interpreter, which is written in C, in another language such as Java or C++ and benefit from the optimizations available in their runtime environments. However, the reimplementations cannot yet guarantee full compatibility with existing R programs and libraries due to the complex and evolutionary development of the R language and its missing formal specification.

Other projects like pqR [Nea18] or Orbit VM [WWP14; WPW15] attempt to provide a faster R interpretation by modifying the original R interpreter instead of reimplementing GNU R, in order to stay compatible. The original GNU R execution environment also contains the option to compile R functions into byte code for faster evaluation, which provides some improvement in runtime, especially for programs that use loops [Tie01]. Furthermore, additional libraries such as the Intel MKL [Int18] or OpenBLAS [ZWW18] were developed to speed up arithmetic operations by taking advantage of specific processor architecture features. Such libraries are optimized implementations of the reference BLAS (Basic Linear Algebra Subprograms) library that is included in the R execution environment.
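The byte code option mentioned above can be sketched with the compiler package that ships with GNU R. The function sum_of_squares is a hypothetical loop-heavy example. Note that since R 3.4.0 the byte code JIT is enabled by default, so the sketch temporarily disables it to keep the uncompiled version interpreted; measured timings depend on the R version and the machine and are therefore not stated here.

```r
library(compiler)

# Temporarily disable the JIT so that the plain closure is not
# compiled automatically; enableJIT() returns the previous level.
old_level <- enableJIT(0)

# Hypothetical loop-heavy function used only for illustration.
sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i * i
  }
  total
}

# Explicitly compile the closure into byte code.
sum_of_squares_bc <- cmpfun(sum_of_squares)

# Both versions compute the same result.
stopifnot(identical(sum_of_squares(1000), sum_of_squares_bc(1000)))

# Compare the evaluation time of the interpreted and compiled versions.
time_interpreted <- system.time(sum_of_squares(1e6))
time_compiled    <- system.time(sum_of_squares_bc(1e6))

# Restore the previous JIT level.
enableJIT(old_level)
```

On typical installations the compiled version of such a scalar loop is expected to run faster, whereas code that spends most of its time in already vectorized or native functions would benefit far less from byte code compilation.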

All of the described optimization approaches have usually shown improvements for simple R programs or specific R functions, but they exhibit fairly mixed results when it comes to speeding up complex real-world applications like machine learning algorithms. One of the objectives of this thesis is to provide insights into the runtime and memory behavior of these algorithms on the original R execution environment. The hope is that both alternative R implementations and the original GNU R can use these results to develop optimizations that improve the runtime performance and resource utilization of real-world code.

Morandat et al. [MHO+12] already analyzed bottlenecks for R programs from different fields of statistics. Here, mostly artificial input data sets were used. However, as the characteristics of the input data sets vastly influence the runtime behavior of a program, only realistic data can yield results that are beneficial in practice. In this thesis, we focus specifically on machine learning algorithms combined with real-world input data sets from the UC Irvine machine learning repository (UCI) [BL18] in order to ensure a realistic scenario when analyzing the main reasons for the lavish use of resources of R. To this end, the R profiling framework traceR, presented in the next section, is redesigned and enhanced.
