GC algorithms, such as Immix [Blackburn and McKinley, 2008], require many such techniques.
Efficient garbage collectors must also be co-designed with concurrency. To make use of parallel hardware resources, the garbage collector itself needs to implement parallel and concurrent scanning and collection algorithms, which interact with the memory model. The JIT compiler must also help the GC by inserting GC-safe points (yieldpoints) into JIT-compiled machine code, where application threads and GC threads perform handshakes. Lin et al. [2015] provide an in-depth analysis of the challenges in the efficient implementation of yieldpoints.
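To make the yieldpoint handshake concrete, the following is a minimal sketch under stated assumptions. It is a toy Python model, not the design of any real VM or of the systems analysed by Lin et al. [2015]. It assumes a simple flag-polling yieldpoint, and all names (`SafepointCoordinator`, `yieldpoint`, `stop_the_world`, `resume_the_world`, `demo`) are hypothetical. A GC thread raises a flag and waits until every application thread has parked at a yieldpoint. Only then would it be safe to scan roots, after which it releases the threads.

```python
import threading
import time


class SafepointCoordinator:
    """Toy stop-the-world handshake via flag-polling yieldpoints (hypothetical design)."""

    def __init__(self, num_mutators: int) -> None:
        self.num_mutators = num_mutators
        self.cond = threading.Condition()
        self.gc_requested = False  # polled by every yieldpoint
        self.parked = 0            # mutators currently stopped at a yieldpoint
        self.epoch = 0             # bumped each time the GC resumes the world

    def yieldpoint(self) -> None:
        """What a JIT might emit at loop back-edges and method prologues."""
        # Fast path: one unsynchronised flag read. In a real VM this read is where
        # the memory model matters; here a stale read only delays parking until
        # a later yieldpoint, because the flag is re-checked under the lock.
        if not self.gc_requested:
            return
        with self.cond:
            if not self.gc_requested:
                return
            epoch = self.epoch
            self.parked += 1
            self.cond.notify_all()      # tell the GC thread one more mutator parked
            while self.epoch == epoch:  # block until the world is resumed
                self.cond.wait()

    def stop_the_world(self) -> None:
        """GC side: request a stop and wait until every mutator is parked."""
        with self.cond:
            self.gc_requested = True
            while self.parked < self.num_mutators:
                self.cond.wait()

    def resume_the_world(self) -> None:
        """GC side: release all parked mutators."""
        with self.cond:
            self.gc_requested = False
            self.parked = 0
            self.epoch += 1
            self.cond.notify_all()


def demo(num_mutators: int = 2) -> bool:
    """Run mutators, stop the world, and check that no mutator makes progress."""
    coord = SafepointCoordinator(num_mutators)
    done = threading.Event()
    counters = [0] * num_mutators

    def mutator(i: int) -> None:
        while not done.is_set():
            counters[i] += 1      # stand-in for application work
            coord.yieldpoint()

    threads = [threading.Thread(target=mutator, args=(i,)) for i in range(num_mutators)]
    for t in threads:
        t.start()
    coord.stop_the_world()
    snapshot = list(counters)     # all mutators parked: a GC could scan roots here
    time.sleep(0.01)              # give any unstopped mutator a chance to run
    world_stopped = counters == snapshot
    coord.resume_the_world()
    done.set()
    for t in threads:
        t.join()
    return world_stopped
```

In this sketch `demo()` should return `True`, because no counter can change while the world is stopped. Polling a flag is only one option. The point of the sketch is that the compiler must place yieldpoints often enough that `stop_the_world` does not wait indefinitely.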
Due to the difficulty of implementing garbage collection, language implementers often choose naïve garbage collection strategies in their initial VM design. Some implementations, such as Lua [Ierusalimschy et al., 1996], use a naïve non-generational mark-sweep GC, which does not perform as well as its generational counterpart. Some, such as CPython, use naïve reference counting², which performs worse than mark-sweep by 30% or more, as measured by Shahriyar et al. [2012]. Others simply use conservative collectors, such as the off-the-shelf Boehm-Demers-Weiser garbage collector [Boehm and Weiser, 1988], which cannot support copying collection. Using naïve GC algorithms degrades overall performance. What is worse, as Jibaja et al. [2011] pointed out in their paper, ‘retrofitting support for high-performance collectors is typically very hard, if not impossible’. This is because fixing the collector alone is not enough. It is more important, and more difficult, to fix the compiler, which is responsible for generating stack maps and write barriers. Although some language implementations, such as Mono [Mono], successfully migrated from conservative garbage collection to exact garbage collection, others, such as PHP, are stuck with naïve reference counting because its semantics depend on reference counting. Pyston [Pyston], a JIT-compiling Python implementation, attempted to use mark-sweep GC for better performance in its early versions, but switched back to naïve RC in version 0.5. The developers made this change to maintain compatibility with existing C extension modules written for the official CPython, which depend on naïve RC, and there were cases where they ‘wouldn’t be able to support the applications in their current form’ [Kevin Modzelewski, 2016].
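The following sketch shows why a conservative collector cannot copy objects, and why the fix lies in the compiler's stack maps. It is a toy model under stated assumptions: the heap address range, the `Frame` layout and the helper names are all hypothetical. Without a stack map, any word that looks like a heap address must be treated as a possible pointer. Such a word might be an integer, so the object it seems to reference cannot be moved. With a stack map, the collector knows exactly which slots are references, and it can update only those slots after copying.

```python
from dataclasses import dataclass, field

# Hypothetical heap address range, for illustration only.
HEAP_START, HEAP_END = 0x1000, 0x2000


@dataclass
class Frame:
    slots: list                                  # raw machine words in one stack frame
    ref_slots: set = field(default_factory=set)  # stack map: indices that hold references


def conservative_roots(frame: Frame) -> set:
    """Without a stack map, every word that looks like a heap address is a possible root.

    Such a word might really be an integer, so its apparent target must be pinned:
    rewriting the word after moving the object could corrupt program data.
    """
    return {i for i, w in enumerate(frame.slots) if HEAP_START <= w < HEAP_END}


def exact_roots(frame: Frame) -> set:
    """With a compiler-generated stack map, the collector knows exactly which slots are references."""
    return set(frame.ref_slots)


def update_roots_after_copy(frame: Frame, forwarding: dict) -> None:
    """A copying collector rewrites each exact root to its object's new address."""
    for i in exact_roots(frame):
        frame.slots[i] = forwarding.get(frame.slots[i], frame.slots[i])


if __name__ == "__main__":
    frame = Frame(slots=[0x1010, 0x1020, 42], ref_slots={0})
    print(sorted(conservative_roots(frame)))  # [0, 1]: slot 1 is an integer that looks like a pointer
    print(sorted(exact_roots(frame)))         # [0]
    update_roots_after_copy(frame, {0x1010: 0x1800})
    print([hex(w) for w in frame.slots[:2]], frame.slots[2])  # ['0x1800', '0x1020'] 42
```

In this model the ambiguous integer in slot 1 survives the copy unchanged only because the stack map excludes it. A collector could not recover that information later, because only the compiler knows which slots hold references.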
2.3 Consequences and Summary
As shown in the previous section, many challenging issues, such as the memory model, stack maps, write barriers and yieldpoints, arise when more than one of the three major concerns is handled in the same system. Language implementers who are not prepared for such complexity often choose naïve implementation strategies to get their language up and running without initially worrying about performance. As we mentioned in Chapter 1, CPython makes three such choices:

- It uses a slow but easy-to-implement interpreter as its sole execution engine, which avoids the complexity of JIT compiling.
- It uses a global interpreter lock (GIL) [Behrens, 2008] to prevent parallel execution, because the interpreter is not designed for concurrent execution.
- It uses a naïve reference counting GC algorithm, which further slows down performance.

At the time of writing, PyPy [Bolz et al., 2009] still has a GIL in its mainline code [PyPyGIL], although there is ongoing work on eliminating the GIL using software transactional memory. The official Ruby [Ruby] implementation also has a ‘global VM lock’, which is similar to CPython’s GIL.³ PHP’s confounding copy-on-write semantics also originate from the fact that it uses naïve reference counting. These early decisions get baked into the languages themselves and hinder their long-term development. Many CPython native modules now assume the presence of the GIL, which makes its removal even more difficult, so a GIL-free Python implementation remains a research topic. The copy-on-write semantics [Tozawa et al., 2009] of PHP persist to this day as a documented ‘feature’.

² Not all reference-counting-based GC algorithms are naïve. Shahriyar et al. [2014] developed a high-performance Reference Counting Immix (RC Immix) algorithm, which outperforms the tracing-based production collector in JikesRVM, namely Generational Immix. But the complexity of RC Immix cannot be overlooked.
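The following sketch suggests why PHP's copy-on-write semantics are tied to reference counting. It is a deliberately simplified model under stated assumptions. `RcArray`, `assign` and `write` are hypothetical names and do not reflect PHP's actual engine structures. Assignment shares the array and increments its count. A later write checks the exact count and copies the array only if another variable still shares it.

```python
class RcArray:
    """Simplified refcounted array value (hypothetical; not PHP's actual engine structures)."""

    def __init__(self, items) -> None:
        self.items = list(items)
        self.refcount = 1


def assign(arr: RcArray) -> RcArray:
    """Models `$b = $a;`: share the storage and bump the count instead of copying."""
    arr.refcount += 1
    return arr


def write(arr: RcArray, index: int, item) -> RcArray:
    """Models `$b[i] = x;`: separate the array first if another variable still shares it."""
    if arr.refcount > 1:
        arr.refcount -= 1         # this variable stops sharing the old storage
        arr = RcArray(arr.items)  # private copy, starting with refcount 1
    arr.items[index] = item
    return arr                    # caller stores the (possibly new) array back


a = RcArray([1, 2, 3])
b = assign(a)        # a and b share one array; refcount is 2
b = write(b, 0, 99)  # shared, so b gets its own copy before the write
print(a.items, b.items)  # [1, 2, 3] [99, 2, 3]
```

The copy-or-write-in-place decision reads an exact per-object count at the moment of the write. A tracing collector does not maintain such a count. One way to see the problem, then, is that semantics built on this check make it hard to replace naïve reference counting with a tracing collector.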
These cross-cutting concerns exist because of the tightly coupled nature of JIT compilation, concurrency and GC. This is why we propose ‘micro virtual machines’ to address these concerns in one carefully designed system. The details of the Mu design will be discussed in Part I. In the next chapter, we will look at existing systems that address the issue of language implementation.
³ See vm_core.h in the Ruby source code.
Chapter 3
Related work
In the preceding chapter, we introduced some of the difficulties in language development. This chapter discusses related work that attempts to address these difficulties, and explains why those solutions are insufficient to achieve the goal of managed language development.
This chapter is structured around two fundamental ways to implement a language. Section 3.1 discusses monolithic language implementations; Section 3.2 discusses existing multi-language platforms for language development; Section 3.3 summarises this chapter.
3.1 Monolithic Language Implementations
One way to implement a managed language efficiently is to build the virtual machine from scratch. Many real-world language implementations, including HHVM [Adams et al., 2014], SpiderMonkey [Mozilla], V8 [Google] and LuaJIT [Pall], were developed this way. Compared to naïve implementations, these carefully developed implementations have indeed significantly improved the performance of their languages.
However, this approach has several problems. One problem is the lack of code reuse. Since such virtual machines are written for one language, their core components, such as the JIT compiler, cannot be reused for other language implementations. As a notable example, both SpiderMonkey and V8 are written in C++, and both implement the JavaScript programming language using similar techniques. These techniques include JIT compilation, feedback-directed optimisation, specialisation, on-stack replacement (OSR) and generational GC. Yet no code is shared between these two projects. Given that programming language implementation is difficult, such an approach can only be afforded by those with sufficient expertise and engineering resources, usually large companies or organisations such as Mozilla, Google and Facebook. Another problem is that such implementations still tend to be naïve compared to mature systems. For example, despite its powerful tracing JIT compiler, LuaJIT [Pall] still uses a non-generational mark-sweep collector and does not have any language-level threading support.¹
¹ LuaJIT does plan to implement more sophisticated garbage collectors in a future version, LuaJIT 3.0 [LuaJITGC].