Design Principles - Micro Virtual Machines: A Solid Foundation for Managed Language Implementat

To achieve our design goals, we have a number of principles which underpin the design of Mu.

1. Mu aims to be as minimal as practical; any feature or optimisation that can be deflected to the higher layers will be, provided that the three major concerns can be efficiently addressed.

2. Mu’s client (the higher-level program that uses Mu; see Section 4.3) is trusted; improper use of Mu may result in undefined behaviour, which may have arbitrary consequences, including crashing the entire system.

3. We use the LLVM intermediate representation (LLVM IR) [Lattner and Adve, 2004] as a common frame of reference for our own IR, deviating only where we find compelling cause to do so.

4. We separate specification and implementation; Mu is an open specification against which clients can program and which different instantiations may implement.

Minimalism is the number one principle. Because we aim to support a wide range of dissimilar languages, the level of the micro virtual machine must be kept as low as possible, so that it does not impose intrusive design decisions that would harm the high-level language implementation. The minimalist design also makes the VM easier to implement correctly, which will facilitate the creation of a formally verified implementation.

However, for the convenience of language developers, minimalism will be compensated for by client libraries that sit above Mu, implementing higher-level features, conveniences, transformations, and optimisations common to more than one language. Different client libraries can be developed for different kinds of languages, such as object-oriented languages, functional languages, and so on. Mature projects, such as LLVM, already provide many optimisations ready for their clients to use. At the time of writing, Mu does not yet have such libraries. When they are developed, however, they will be strictly client libraries, excluded from the micro virtual machine itself.


The client is trusted. We fully trust the client, because the client has more knowledge than Mu about the concrete language. It understands all the requirements and constraints of the language semantics, such as array bounds checking, to enforce them correctly. Therefore, we trust the client to make the right decisions, and do not impose extraneous protection layers which may lead to unnecessary overhead.
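The division of responsibility for array bounds checking can be sketched as follows. This is a toy model, not Mu's API: `ToyMemory`, `load_unchecked`, and `client_array_get` are hypothetical names introduced only for illustration. The low-level layer performs no validation, and the client inserts the check that its own language semantics require.

```python
class ToyMemory:
    """Hypothetical stand-in for a low-level layer whose loads are unchecked."""

    def __init__(self, cells):
        self._cells = list(cells)

    def load_unchecked(self, base, index):
        # No validation here: the trusted caller is responsible for it.
        # An out-of-range index silently reads whatever cell is there.
        return self._cells[base + index]


def client_array_get(memory, base, length, index):
    """Hypothetical client code enforcing its own language's bounds rule."""
    if not 0 <= index < length:
        raise IndexError(f"index {index} out of range for length {length}")
    return memory.load_unchecked(base, index)


# A three-element array at base 0, followed by unrelated data (99).
memory = ToyMemory([10, 20, 30, 99])
in_range = client_array_get(memory, 0, 3, 2)   # checked access to element 2
stray = memory.load_unchecked(0, 3)            # unchecked: reads the neighbour
```

A client for a language without bounds checks could call the unchecked load directly and pay no overhead, which is the point of leaving the decision to the client.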

Moreover, according to Castanos et al. [2012], the optimisations that have the greatest impact on the performance of programs are usually language-specific. It is true that many language-neutral optimisations work across languages, such as common sub-expression elimination, loop-invariant code motion, dead code elimination, and many others provided by LLVM. Those optimisations may double or triple the throughput of programs. However, for higher-level languages, especially dynamic languages, the most important optimisations, such as specialisation, can achieve a 10x or, in certain cases, over 100x performance gain over naive implementations [Castanos et al., 2012]. Such optimisations depend on intimate knowledge of the semantics of the concrete language. Therefore, we give the client both the power and the responsibility for high-level optimisations. Mu trusts the client to make the right decisions, and assumes, for example, that the transformations made by the client’s optimiser are valid. When Mu API functions are invoked, Mu faithfully carries out the operations according to the specification without mandatory validation, especially during function redefinition and on-stack replacement, which we will introduce in Chapter 6.
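As a rough illustration of specialisation, the sketch below shows a generic addition for a hypothetical dynamic language and a client-generated variant specialised for integers. The names `generic_add` and `specialise_for_ints` are assumptions made for this example and are not part of Mu; Python is used only to show the shape of the transformation, not its performance effect.

```python
def generic_add(a, b):
    """Naive addition for a dynamic language: dispatches on types every call."""
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    raise TypeError("unsupported operand types")


def specialise_for_ints(generic):
    """Return a variant that assumes both operands are plain integers."""

    def specialised(a, b):
        # Guard: cheaply confirm the assumption the fast path relies on.
        if type(a) is int and type(b) is int:
            return a + b  # fast path; compiled code would need one add here
        # Assumption failed: fall back to the fully general routine.
        return generic(a, b)

    return specialised


add_int = specialise_for_ints(generic_add)
fast_result = add_int(2, 3)          # takes the guarded fast path
slow_result = add_int("a", "b")      # guard fails, falls back to generic_add
```

Whether the guard is sufficient depends entirely on the language's semantics, which is why the client, rather than the layer beneath it, is the party that can judge the transformation valid.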

We use LLVM IR as a frame of reference for our IR. Since its inception, LLVM has developed into a production compilation framework known for high performance. It would not be good practice to develop everything from scratch and disregard such a mature system. Therefore, we follow LLVM IR as closely as possible in designing a performant virtual machine, deviating only when we have a compelling reason to do so. For example, Mu is designed for garbage-collected languages while LLVM is not, so the Mu type system has traced reference types, which are absent from the LLVM type system.

It is worth noting that Mu is neither an extension to LLVM nor built upon LLVM. Mu is an independent project, and its design principles are very different from those of LLVM. LLVM IR was used only as a frame of reference for the design of Mu IR.

Mu is defined as a specification. This is the most important contribution of Mu. We define Mu as a specification for an abstract machine, which clearly defines the behaviour of Mu. Mu is not merely a collage of features. Although Mu incorporates many different ideas from related projects, including the static single information (SSI) form, garbage collection, the Swapstack primitive, and the C++11 memory model, which will be introduced later, the semantics of the Mu IR and the Mu API are carefully defined in one place. Therefore, unlike VMKit, the behaviour of Mu IR programs with respect to concurrency, execution, garbage collection, and their interactions can be reasoned about. This gives the client a dependable platform, and facilitates formal verification.


The specification allows many different compliant Mu implementations to co-exist, such as a reference implementation, a high-performance implementation, and a formally verified implementation. The JVM is defined similarly [Lindholm et al., 2014], and that specification has allowed many implementations.
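The relationship between one specification and several compliant implementations can be sketched with a deliberately tiny abstract machine. Everything here is hypothetical and unrelated to the actual Mu specification: `AbstractMachine`, `ReferenceMachine`, `FoldingMachine`, and the two-operation "program" format are invented for the example.

```python
from abc import ABC, abstractmethod


class AbstractMachine(ABC):
    """Hypothetical specification: the behaviour every implementation owes."""

    @abstractmethod
    def run(self, program, arg):
        """Apply each (op, operand) step of `program` to `arg`, in order."""


class ReferenceMachine(AbstractMachine):
    """Straightforward interpreter, written for clarity rather than speed."""

    def run(self, program, arg):
        value = arg
        for op, operand in program:
            if op == "add":
                value += operand
            elif op == "mul":
                value *= operand
            else:
                raise ValueError(f"unknown op {op!r}")
        return value


class FoldingMachine(AbstractMachine):
    """Alternative: folds the program into one step, scale * arg + offset."""

    def run(self, program, arg):
        scale, offset = 1, 0
        for op, operand in program:
            if op == "add":
                offset += operand
            elif op == "mul":
                scale *= operand
                offset *= operand
            else:
                raise ValueError(f"unknown op {op!r}")
        return scale * arg + offset


def client(machine):
    """Client code written against the specification only."""
    return machine.run([("add", 2), ("mul", 3), ("add", 1)], 5)


results = [client(ReferenceMachine()), client(FoldingMachine())]
```

The client function never names a concrete implementation, so either machine can be substituted as long as it honours the behaviour the abstract class describes.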

Mu is designed by a committee, but a rather small one. Our team at the Computer Systems research group at ANU holds weekly meetings to discuss the design, in order to ensure that the Mu design adheres to our design principles. We take input from the people working on different aspects of the project, including client development, high-performance Mu implementation, and formal verification. Sometimes a decision provokes heated debate, and compromises must be made, because a subtle change in one part of the system may have a chain of implications in other parts. Therefore, in this thesis, we will describe not only the details of the Mu design, but also the reasons behind the seemingly arbitrary design decisions.

In document Micro Virtual Machines: A Solid Foundation for Managed Language Implementation (Page 42-44)