
3.2 Full-stack live coding

3.2.3 Designing for reasonable performance

As physical computing machinery becomes increasingly complex, and the physical limitations of computation increasingly constraining, it becomes ever more challenging for a programmer to wring every last cycle from her hardware.

One approach is to keep throwing more hardware at the problem. The general idea is that it is cheaper (both socially and economically) to waste CPU cycles on higher-level but relatively inefficient programming environments than to waste valuable human cycles on lower-level but more efficient ones. The intimation is that an inefficient programming environment operating at a higher level of abstraction should display less cognitive dissonance (i.e. should be better focused on the domain problem) than a lower-level, but presumably more efficient, environment.

Unfortunately, where hardware is already size and heat/power constrained, throwing hardware at the problem often makes it increasingly intractable. It is watts, rather than pure numerical performance per se, that are now driving both the development of the next generation of exascale supercomputers[38] at one extreme, and IoT[31] (the Internet of Things) at the other. It seems clear that simply throwing more hardware at a problem that has hard physical limitations, such as heat/power and size, only exacerbates it.

If low-level system performance and efficiency goals cannot be adequately attained by throwing hardware at the problem, then perhaps it might be possible instead to rely on a sufficiently smart compiler: one capable of abstracting away low-level details without sacrificing low-level flexibility or performance. The form of a sufficiently smart compiler[33] changes over time, but at this moment in history large virtual machines like the JVM and the CLR are the usual manifestation.

Advocates of sufficiently smart compilers highlight that declarative solutions require less work by the programmer and ultimately perform better, as the compiler is better suited to managing the low-level imperative bookkeeping (state management, register allocation, parallelisation, etc.). Because declarative programming describes rather than instructs, it is argued that the compiler is better able to perform advanced optimizations. A garbage collector can better manage memory, a profiling JIT compiler can better optimize native code, and the compiler can parallelise more effectively using vector registers/opcodes.

It is difficult to argue against the sufficiently smart compiler; after all, no one wants to be distracted by irrelevant imperative bookkeeping! However, from a systems programming perspective, the sufficiently smart compiler breaks down in a couple of key areas. Smart language technologies, including garbage collection and JIT compilation, optimize for, and are themselves optimized for, the general case. If you fit the general case, well and good. If you don't fit the general case, then you are at best at a disadvantage, and at worst out of luck. In "Scalability! But at what COST?"[59], McSherry et al. discuss some of the hidden costs associated with smart distributed parallelisation, with a particular focus on graph processing.

"Modern JIT compilers, such as those in virtual machines like Oracle's HotSpot for Java or Google's V8 for JavaScript, rely on dynamic profiling as their key mechanism to guide optimizations. While these JIT compilers offer good average performance, their behaviour is a black box and the achieved performance is highly unpredictable."[74, p.41]

For the systems programmer, the inherent unpredictability of smart technologies is often unreasonable, in both senses of the word. It is not the average performance, but the worst-case performance, that is usually of primary concern in systems programming contexts. Additionally, the black-box nature of smart compilers makes reasoning about a source code's run-time performance challenging, often extremely so. A well-known aphorism is that "all problems in computer science can be solved by another level of indirection", to which is added the often-cited corollary "except of course for the problem of too many indirections" (see footnote 6).
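The gap between average and worst-case behaviour can be made concrete with a small sketch. The function name `summarise_latencies`, the sample timings and the 5 ms deadline are all hypothetical, chosen purely for illustration and not taken from the source. Python is used only as a convenient notation. The sketch assumes per-task timings have already been measured, and reports the mean alongside the worst case and the number of missed deadlines.

```python
from statistics import mean


def summarise_latencies(latencies_ms, deadline_ms):
    """Return (mean, worst, misses) for a list of per-task latencies.

    latencies_ms: measured run times in milliseconds (hypothetical data).
    deadline_ms:  the deadline each task must meet (an assumed value).
    """
    if not latencies_ms:
        raise ValueError("need at least one measurement")
    worst = max(latencies_ms)
    # Count every run that overran the deadline.
    misses = sum(1 for t in latencies_ms if t > deadline_ms)
    return mean(latencies_ms), worst, misses


# Illustrative timings: nine fast runs and one stall, of the kind a
# garbage-collection pause or JIT recompilation might plausibly cause.
samples = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 11.0]
avg, worst, misses = summarise_latencies(samples, deadline_ms=5.0)
print(f"mean={avg:.1f} ms  worst={worst:.1f} ms  missed deadlines={misses}")
```

With these made-up numbers the mean (2.0 ms) sits comfortably under the assumed deadline, yet one run in ten still overruns it. In a context where every deadline matters, it is that single overrun, not the mean, that determines whether the system is acceptable.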

The two most common approaches to circumventing these unreasonable performance issues, when adopting high-level languages for low-level systems programming, are either to adopt an FFI-style model (as discussed in the half-stack section) or alternatively to extend the high-level language down to meet low-level requirements.

There are many good reasons for wishing to move systems programming up the abstraction tree. As noted by Frampton et al., "The power of high-level languages lies in their abstraction over hardware and software complexity, leading to greater security, better reliability, and lower development costs"[27, p.81]. However, the abstractions required to achieve these goals are often antithetical to the goals of systems programming. Frampton et al. go on to say, "However, opaque abstractions are often show-stoppers, for systems programmers, forcing them to either break the abstractions, or more often, simply give up and use a different language"[27, p.81]. Many of the features that define higher-level languages (boxed data types, garbage collection, read/write barriers, array bounds checking) are exactly the features that are most intrusive for low-level programming. The trouble is that for systems programmers, what is considered "irrelevant bookkeeping" is very often at the very heart of the matter!
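As a rough illustration of what "boxed" means in practice, the sketch below contrasts two ways of holding the same four numbers. It uses Python rather than Java or any language discussed in the source, and describes the CPython implementation; the variable names are hypothetical. A list holds references to individually allocated float objects, whereas the standard library's `array` type stores raw machine doubles in one contiguous buffer.

```python
from array import array

values = [0.5, 1.5, 2.5, 3.5]

# Boxed: the list stores references; each element is a separate
# heap-allocated float object with its own object header.
boxed = list(values)

# Unboxed: one contiguous buffer of raw C doubles, no per-element object.
unboxed = array("d", values)

# The array's payload is exactly itemsize bytes per element.
payload_bytes = unboxed.itemsize * len(unboxed)
print(type(boxed[0]).__name__, unboxed.itemsize, payload_bytes)
```

The unboxed layout is the one a systems programmer typically needs when handing a buffer to hardware or to native code, which is why a high-level language that offers only the boxed form is so intrusive at this level.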

Some impressive attempts have been made to address these concerns. For example, the Jikes RVM (research virtual machine) org.vmmagic package, which incorporates compiler intrinsics and semantic regimes, provides one such low-level systems programming solution for Java[27]. However, these solutions generally provide only partial answers (org.vmmagic supports unboxed primitive types, for example, but not unboxed compound types), and at the cost of significant complexity: complexity for the language designer/implementer, and also for the end-user programmer, who is faced with a considerable stack of abstraction layers to be managed (see org.vmmagic semantic regimes). There is also a question mark over the true extensibility of these platforms. The org.vmmagic project strives to provide an extensible framework for low-level systems programming, but relies heavily on an extensive library of built-in compiler intrinsics. These intrinsics must be maintained and extended by the Jikes RVM core compiler team, making more work both for the system's developers, who must define new intrinsics to support new hardware features, and for end-user programmers, who must strive to understand the semantics of these new intrinsics and their relationship to the underlying hardware that they support.

Footnote 6: Although commonly attributed to either David Wheeler or Andrew Koenig, there appears to be no

Ultimately, smart compilers are necessarily designed for the general case, whereas systems programming is often about unique and specific cases. Smart compilers are relatively inflexible, whereas systems programming requires great flexibility. Smart compilers are difficult to reason about from a low-level systems perspective. And finally, smart compilers are complex to build, configure and operate, making them difficult to configure optimally and to port to new hardware platforms.

In document Extempore: The design, implementation and application of a cyber-physical programming language (Page 63-66)