Our project translates WyIL code into efficient C code that uses less memory and runs faster than the naive C code our compiler produces without any optimisation. Two additional tasks must be undertaken during code generation, as follows.
2.4.1 Bounded Integer
An arbitrary-precision integer requires more memory and computation than a fixed-size integer. For example, BigInteger in Java has a variable-length representation and its arithmetic is implemented in a slower software layer, whereas a fixed-size integer, such as int16_t (a signed 16-bit integer), has an exact size and its arithmetic runs directly on fast hardware.
In our project we use static bound analysis to find the ranges of integer variables and substitute arbitrary-precision integers with a variety of fixed-size types whenever possible.
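The substitution step can be illustrated with a minimal sketch. It assumes the bound analysis yields a closed interval for each variable and that both bounds fit in 64 bits; the names Bound, IntType, select_int_type and int_type_name are hypothetical and are not taken from our implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of a bound analysis: a closed interval [lo, hi]. */
typedef struct {
    int64_t lo;
    int64_t hi;
} Bound;

/* Candidate C integer types, ordered from narrowest to widest. */
typedef enum { TY_INT8, TY_INT16, TY_INT32, TY_INT64 } IntType;

/* Return the narrowest signed fixed-size type that holds every value in b. */
IntType select_int_type(Bound b) {
    assert(b.lo <= b.hi); /* an empty interval is not expected here */
    if (b.lo >= INT8_MIN && b.hi <= INT8_MAX) return TY_INT8;
    if (b.lo >= INT16_MIN && b.hi <= INT16_MAX) return TY_INT16;
    if (b.lo >= INT32_MIN && b.hi <= INT32_MAX) return TY_INT32;
    return TY_INT64;
}

/* Name of the C type that a code generator could emit for t. */
const char *int_type_name(IntType t) {
    switch (t) {
    case TY_INT8:  return "int8_t";
    case TY_INT16: return "int16_t";
    case TY_INT32: return "int32_t";
    default:       return "int64_t";
    }
}
```

A variable whose interval cannot be bounded at all would keep its arbitrary-precision representation; that case is outside this sketch.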
2.4.2 Memory Reduction
Unnecessary array copying makes a program inefficient, and memory leaks prevent it from scaling. Our naive translation of WyIL to C implements value semantics by copying an array at each assignment or function call. These copies are not always needed, and excessive copying wastes execution time and resources. In addition, memory leaked from heap-allocated arrays accumulates, causing thrashing and preventing the program from scaling up to larger problem sizes.
In our project, we design a macro system to detect unnecessary array copies and minimise memory usage whilst maintaining memory safety.
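To make the cost of value semantics concrete, the following sketch contrasts a naive call that copies its array argument with a call where the copy is elided because the callee only reads the array. It is an illustration under stated assumptions, not our macro system: copy_array, the FREE_IF_OWNED macro and the convention of pairing each array variable x with a flag x_owned are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated copy of an array. */
int64_t *copy_array(const int64_t *src, size_t n) {
    int64_t *dst = malloc(n * sizeof *dst);
    if (n > 0) {
        if (dst == NULL) abort(); /* out of memory */
        memcpy(dst, src, n * sizeof *dst);
    }
    return dst;
}

/* Hypothetical macro: free array a only if the flag a_owned is set,
 * then clear the pointer and the flag so it cannot be freed twice. */
#define FREE_IF_OWNED(a)       \
    do {                       \
        if (a##_owned) {       \
            free(a);           \
            (a) = NULL;        \
            a##_owned = false; \
        }                      \
    } while (0)

/* The callee only reads xs, so it never needs a private copy. */
int64_t sum(const int64_t *xs, size_t n) {
    assert(xs != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) total += xs[i];
    return total;
}

/* Naive translation: copy the argument at the call, then release it. */
int64_t call_sum_naive(const int64_t *xs, size_t n) {
    int64_t *tmp = copy_array(xs, n);
    bool tmp_owned = true; /* the caller owns the fresh copy */
    int64_t total = sum(tmp, n);
    FREE_IF_OWNED(tmp);    /* without this the copy would leak */
    return total;
}

/* Optimised translation: the copy is elided because sum is read-only. */
int64_t call_sum_optimised(const int64_t *xs, size_t n) {
    return sum(xs, n);
}
```

Both calls return the same result; the optimised version avoids one allocation, one copy and one deallocation per call.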
2.4.3 System Architecture
Our WyIL-to-C backend comprises a code generator and three static analysers (integer bound, copy elimination and deallocation analysers). The backend operates on the Whiley intermediate language (WyIL), which is generated from high-level Whiley source code, and produces optimised C code.
[Figure: Whiley Program → Whiley Compiler → WyIL Code → Whiley-to-C Backend → C Code → Execution. Within the backend, the Code Generator invokes the Static Bound Analyser (optimise integer types), the Copy Elimination Analyser (optimise copies) and the Deallocation Analyser (add macro).]
Figure 2.1: System architecture (dashed boxes: our project)
As shown in Figure 2.1, the code generator converts the WyIL code into efficient C code. It interacts with the bound analyser to make use of fixed-size integer types, and with the copy elimination and deallocation analysers to minimise memory usage in the generated C code by reducing the number of array copies and deallocating unused arrays. Our project goal is to implement a large subset of Whiley in C, with parallelism where possible and useful.
Related Work
In this chapter we first survey static (bound) analysis techniques to find a suitable tool for estimating integer intervals and choosing bounded integer types. We then examine related work on memory management and design principles for reducing memory usage. Lastly, we review important work on static and dynamic analysis for eliminating unnecessary array copies and improving program efficiency.
3.1 Static Analysis
Static analysis validates the consistency between software specifications and program behaviour using mathematical methods. For example, the bounds consistency technique is widely used to solve finite constraint domain problems (Marriott and Stuckey, 1998).
However, features of object-oriented programming languages, such as side effects and non-deterministic results, make it a grand challenge (Hoare, 2003) to create a compiler that uses automated mathematical and logical reasoning to statically verify specifications and detect errors at compile time.
Some automatic static analysers use different approaches to find software defects at an early compilation stage, improving program correctness and producing high-quality software. The Extended Static Checker for Java (ESC/Java) (Flanagan et al., 2002) uses an automatic theorem prover to analyse a program and find common Java run-time errors (e.g. array index out of bounds or null dereference). ESC/Java can also analyse concurrent Java programs and issue warnings about potential run-time race conditions and deadlocks. However, ESC/Java requires programmers to annotate their programs with specifications, so the annotation burden and excessive warning messages can be inconvenient.
Boogie, which was originally developed as part of the Microsoft Spec# system (Mike Barnett, 2005) to verify C# programs, acts as an intermediate verification language (Leino, 2008) in which a Boogie program is transformed into verification conditions. Using an automatic theorem prover (e.g. the Z3 satisfiability modulo theories solver (de Moura and Bjørner, 2008)), it can statically prove the correctness of a program against its pre- and post-conditions, and it can point out the possible cause of an error if verification fails. Using Boogie can avoid expensive run-time checks and improve the efficiency of program execution, because conditions that have been statically verified at compile time can be removed from the run-time code. Furthermore, writing a Boogie verification resembles writing a program: for example, we can write frame conditions as modifies and ensures clauses to restrict which variables a function can change, and write complex formulas in pre- and post-conditions. Apart from Spec#, Boogie serves as a verification back-end for a variety of programming languages, including Java byte-code with BML (Mallo, 2007), Dafny (Leino, 2010), Eiffel (Tschannen et al., 2011) and C (Vanegue and Lahiri, 2013). Whiley can also use Boogie as a verification back-end (Utting et al., 2017).
Static analysis using abstract interpretation can approximate the abstract semantics of a program without executing it, allowing the compiler to detect errors and find applicable optimisations. For example, Microsoft Research's Clousot (Manuel Fahndrich, 2010) can statically check the absence of run-time errors and infer facts to discharge assertions. In our project, the amount of WyIL code is much larger than the corresponding high-level, human-readable Whiley source code, because every complicated operation in Whiley is broken down into a series of three-address forms in WyIL to preserve the semantics. We use abstract interpretation-based static analysis to analyse such a large amount of WyIL code because it runs quickly while still producing precise results.
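As an illustration of the idea, the sketch below applies abstract interpretation over the interval domain to a small counting loop, i = 0; while (i < n) { i = i + 1; }, whose body is a single three-address statement. It is a minimal example under stated assumptions rather than our analyser: the type Interval, the helper functions and analyse_counting_loop are hypothetical names, n is assumed to be a known constant, and INT64_MIN and INT64_MAX are used as stand-ins for minus and plus infinity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Closed interval [lo, hi]; INT64_MIN and INT64_MAX stand for -inf and +inf.
 * An interval with lo > hi is treated as empty (unreachable code). */
typedef struct {
    int64_t lo;
    int64_t hi;
} Interval;

static bool interval_is_empty(Interval a) { return a.lo > a.hi; }

/* Add two bounds; infinities absorb and overflow saturates. */
static int64_t bound_add(int64_t a, int64_t b) {
    if (a == INT64_MIN || b == INT64_MIN) return INT64_MIN;
    if (a == INT64_MAX || b == INT64_MAX) return INT64_MAX;
    if (b > 0 && a > INT64_MAX - b) return INT64_MAX;
    if (b < 0 && a < INT64_MIN - b) return INT64_MIN;
    return a + b;
}

/* Abstract addition: [a.lo + b.lo, a.hi + b.hi]. */
static Interval interval_add(Interval a, Interval b) {
    if (interval_is_empty(a)) return a;
    if (interval_is_empty(b)) return b;
    return (Interval){bound_add(a.lo, b.lo), bound_add(a.hi, b.hi)};
}

/* Join: the smallest interval containing both a and b. */
static Interval interval_join(Interval a, Interval b) {
    if (interval_is_empty(a)) return b;
    if (interval_is_empty(b)) return a;
    return (Interval){a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

/* Meet: the intersection of a and b, which may be empty. */
static Interval interval_meet(Interval a, Interval b) {
    return (Interval){a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi};
}

/* Widening: any bound that is still growing jumps to infinity,
 * which guarantees that the fixpoint iteration terminates. */
static Interval interval_widen(Interval old, Interval next) {
    if (interval_is_empty(old)) return next;
    if (interval_is_empty(next)) return old;
    return (Interval){next.lo < old.lo ? INT64_MIN : old.lo,
                      next.hi > old.hi ? INT64_MAX : old.hi};
}

/* Compute an interval for i at the head of the loop
 *     i = 0; while (i < n) { i = i + 1; }
 * without executing it. */
Interval analyse_counting_loop(int64_t n) {
    assert(n > INT64_MIN); /* so that n - 1 cannot overflow */
    const Interval init = {0, 0};
    const Interval one = {1, 1};
    const Interval guard = {INT64_MIN, n - 1}; /* i < n */
    Interval head = init;
    for (;;) {
        /* Effect of one iteration: filter by the guard, then i = i + 1. */
        Interval body = interval_add(interval_meet(head, guard), one);
        Interval next = interval_join(init, body);
        Interval widened = interval_widen(head, next);
        if (widened.lo == head.lo && widened.hi == head.hi) {
            /* head is stable; next is one narrowing step that
             * recovers the precision lost by widening. */
            return next;
        }
        head = widened;
    }
}
```

For n = 10 the analysis stabilises after two iterations and reports that i lies in [0, 10] at the loop head, a range narrow enough to justify an 8-bit integer type for i.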