6.4 Automatic Proofs by Boogie
7.1.3 Function Body
The code generator maps each WyIL code in a function body to its code type and then translates it into a sequence of C statements. The following list describes the code types that are crucial for generating and optimising code.
• code == arraygenerator( a = (value, size) ) An array generator statement creates an array variable a of the given size and initialises each array element with the given value.
For a one-dimensional array, we assume by default that the array stores signed 64-bit integers. We represent a one-dimensional array as a pointer together with an extra size variable that keeps track of its length, and create it with the _NEW_1DARRAY macro below. The macro also checks the result of the memory allocation to ensure that the array pointer is not NULL.
// Create an array of the given type and size and fill it with the given value.
// 'a##_size' pastes the array name onto '_size', e.g. '_6' becomes '_6_size'.
#define _NEW_1DARRAY(a, value, size, type) ({ \
    a##_size = (size); \
    a = (type*)malloc(a##_size * sizeof(type)); \
    if (a == NULL) { \
        fputs("fail to allocate the memory at _NEW_1DARRAY\n", stderr); \
        exit(-2); \
    } \
    /* Initialise each item of array 'a' */ \
    for (size_t i = 0; i < a##_size; i++) { \
        a[i] = (value); \
    } \
})
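To show how the macro might be exercised, the sketch below assumes GCC or Clang (the macro relies on the statement-expression extension) and wraps it in hypothetical helpers, make_grid, grid_get and grid_set, which are not part of the code generator. They allocate a width × height grid as one flat array and use the row-major index i * width + j, the layout adopted for two-dimensional arrays in the next paragraph.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same shape as _NEW_1DARRAY above: allocate `size` items of `type`
   into `a`, record the length in `a##_size`, and fill with `value`. */
#define _NEW_1DARRAY(a, value, size, type) ({ \
    a##_size = (size); \
    a = (type*)malloc(a##_size * sizeof(type)); \
    if (a == NULL) { \
        fputs("fail to allocate the memory at _NEW_1DARRAY\n", stderr); \
        exit(-2); \
    } \
    for (size_t i = 0; i < a##_size; i++) { \
        a[i] = (value); \
    } \
})

/* Hypothetical helper: a width x height grid stored as one flat array. */
int64_t* make_grid(size_t width, size_t height, int64_t value, size_t* out_size) {
    int64_t* g = NULL;
    size_t g_size = 0;
    _NEW_1DARRAY(g, value, width * height, int64_t); /* sets g and g_size */
    *out_size = g_size;
    return g;
}

/* Row-major read: item (i, j) lives at flat index i * width + j. */
int64_t grid_get(const int64_t* g, size_t width, size_t i, size_t j) {
    return g[i * width + j];
}

/* Row-major write using the same index mapping. */
void grid_set(int64_t* g, size_t width, size_t i, size_t j, int64_t v) {
    g[i * width + j] = v;
}
```

For a 4 × 3 grid the size variable holds 12, and item (1, 2) is stored at flat index 6.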
For a two-dimensional array, we first flatten it into a 1D array whose size variable holds the total number of items, i.e. width × height, and then populate the array's values. We therefore access the item at row i and column j with a[i * width + j] instead of a[i][j].
In doing so, all array elements are allocated in contiguous memory, which improves data locality. Because every sub-array must have the same length, arrays whose sub-arrays differ in length are not supported in our project.
• code == assign( a = b ) An assignment statement assigns value b to variable a. For an integer-typed assignment, we do not need to make an explicit copy, because primitive integers are stored on the stack and C assigns them by value.
For an array assignment a = b, our naive code without optimisation always makes a copy of the right-hand side variable b and assigns the copy to the left-hand side variable a. In addition, the size of b is propagated to the size variable of a.
// Make a copy of array 'b', store it in 'a' and propagate the size of 'b'
#define COPY(a, b, type) ({ \
    a##_size = b##_size; \
    a = (type*)malloc(a##_size * sizeof(type)); \
    if (a == NULL) { /* check the new buffer, not the source */ \
        fputs("fail to malloc at COPY macro\n", stderr); \
        exit(-2); \
    } \
    memcpy(a, b, b##_size * sizeof(type)); \
})
Copying the right-hand side variable in every assignment slows down program execution. We therefore use the copy elimination and de-allocation analysers (see Section 7.2) to identify and remove unnecessary copies and so improve efficiency.
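A minimal sketch of the naive translation of an array assignment, again assuming GCC or Clang statement expressions; assign_array is a hypothetical wrapper, not generated code. Because COPY allocates a fresh buffer, a later update to b is not visible through a. This is the value semantics that any removal of copies has to preserve.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* COPY as above: give `a` its own buffer holding b's items and size. */
#define COPY(a, b, type) ({ \
    a##_size = b##_size; \
    a = (type*)malloc(a##_size * sizeof(type)); \
    if (a == NULL) { \
        fputs("fail to malloc at COPY macro\n", stderr); \
        exit(-2); \
    } \
    memcpy(a, b, b##_size * sizeof(type)); \
})

/* Hypothetical wrapper for the naive translation of `a = b`.
   The caller owns (and must free) the returned copy. */
int64_t* assign_array(int64_t* b, size_t b_size, size_t* a_size_out) {
    int64_t* a = NULL;
    size_t a_size = 0;
    COPY(a, b, int64_t); /* a = b with value semantics */
    *a_size_out = a_size;
    return a;
}
```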
• code == binOp( a = (b, c) ) A binary operator applies operator binOp to variables b and c and stores the result in variable a.
a = b + c; // add a = (b, c)
a = b * c; // mul a = (b, c)
Common operators include addition (+), subtraction (-), multiplication (*), division (/) and remainder (%).
// Detect overflow in the addition 'a = b + c'
#define INT_ADD_OVERFLOW(a, b, c) ({ \
    if (__builtin_add_overflow(b, c, &a)) { \
        fputs("Detected an add overflow\n", stderr); \
        exit(-2); \
    } \
})
Whiley integers are unbounded, but we represent them as fixed-width C integers, so arithmetic operations may overflow. We therefore use GCC built-in functions (Stallman and the GCC Developer Community, 2003) to check whether an operation overflows and stop the program with a run-time error if it does. By default, the overflow check is disabled, because we declare all integer variables as signed 64-bit integers, whose range (−2^63 to 2^63 − 1) is large enough for normal arithmetic operations on a 64-bit operating system.
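As a sketch of the same check without terminating the program, the hypothetical helpers checked_add and checked_mul below return false when the GCC/Clang built-ins __builtin_add_overflow and __builtin_mul_overflow report that the exact result does not fit in an int64_t, leaving it to the caller to decide how to report the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checked addition: stores b + c in *a and returns true,
   or returns false if the result overflows int64_t. */
bool checked_add(int64_t b, int64_t c, int64_t* a) {
    return !__builtin_add_overflow(b, c, a);
}

/* Hypothetical checked multiplication with the same contract. */
bool checked_mul(int64_t b, int64_t c, int64_t* a) {
    return !__builtin_mul_overflow(b, c, a);
}
```

Given the int64_t range above, INT64_MAX + 1, INT64_MIN − 1 and INT64_MAX × 2 all overflow, whereas 40 + 2 does not.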
• code == label( blklab ) A label statement specifies a block label, composed of an identifier and a block number (e.g. blklab1), which marks the location of a block in the source code.
• code == if( OP(a, b) goto blklab ) An IF statement compares the values of variables a and b with operator OP and jumps to block label blklab when the condition holds (is true). Common comparison operators OP include eq (==), neq (!=), lt (<), le (<=), gt (>) and ge (>=).
// if( ge(x, 10) goto blklab1 )
if(x>=10){goto blklab1;}
...
blklab1:; // Block label that 'goto' branches to
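The label and IF translations combine as in the hypothetical function at_least_ten below, a sketch rather than actual generator output: it branches to blklab1 and returns 1 when x >= 10, and otherwise falls through to the next statement and returns 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical translation of  if( ge(x, 10) goto blklab1 )  followed by
   two return blocks, written in the generator's goto style. */
int64_t at_least_ten(int64_t x) {
    if (x >= 10) { goto blklab1; } /* condition met: jump to the label */
    return 0;                      /* condition false: fall through */
blklab1:;
    return 1;
}
```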
• code == loop([a, ...], [codes]) A loop repeatedly executes a list of codes until one of its loop conditions, each comparing the value of a loop variable such as a, no longer holds. We translate it into an infinite while loop with one or more conditional checks that decide whether to continue or exit the loop, as shown below:
// loop ([i, 10, sum], [ sum = sum + i, i = i + 1 ])
while(true){
    if(i > 10){goto blklab1;} // loop condition
    sum = sum + i;
    i = i + 1;
}
blklab1:; // Loop exit label
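Wrapping the loop listing above in a hypothetical function, sum_up_to_ten, makes it runnable. The sketch initialises sum to 0, an assumption since the listing does not show the initial values, and returns the sum of i, i + 1, ..., 10 (or 0 when i > 10).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper around the generated loop shape. */
int64_t sum_up_to_ten(int64_t i) {
    int64_t sum = 0; /* assumed initial value */
    while (true) {
        if (i > 10) { goto blklab1; } /* loop condition */
        sum = sum + i;
        i = i + 1;
    }
blklab1:; /* loop exit label */
    return sum;
}
```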
A loop may contain one or more inner loops; our code generator therefore translates each inner loop recursively and places it within the outer loop, forming a hierarchy of loop nests.
• code == invoke( a = func(b, c, ...) ) A function call passes one or more parameters b, c, ... to the called function func and, if a return value is required, assigns the result to variable a.
Our naive code always copies an array parameter first and then passes the copy to the called function, so that no change the function makes to its parameters affects the original values at the caller site. In this way, our naive code conforms to the immutable value semantics of functional programming languages, and function calls have no side effects.
// a = func(b)
// Make a copy of 'b' at the call to 'func'
// Pass the array size 'a_size' to 'func' by reference
a = func(COPY(b), b_size, &a_size);
However, copying array parameters adds overhead when arrays are large and slows down execution. Deallocating array parameters is a further problem: freeing them too rarely leads to memory leaks, while freeing them more than once causes the worse problem of double freeing.
Our copy elimination and deallocation analysers work together to determine which parameter copies are needed and which part of the code is responsible for deallocating them (see Section 7.2).
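A sketch of the naive call-by-copy convention, assuming a hypothetical callee set_first that overwrites b[0] and returns b in the style of Listing 7.2, and a hypothetical call site call_with_copy. Here the caller owns the returned buffer and must free it; this is one possible assignment of deallocation responsibility chosen for the sketch, not the one computed by the analysers in Section 7.2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callee: updates b[0] and returns b, reporting the size of
   the returned array through ret_size. */
int64_t* set_first(int64_t* b, size_t b_size, int64_t num, size_t* ret_size) {
    b[0] = num;
    *ret_size = b_size;
    return b;
}

/* Naive call site: hand the callee a private copy of b so that the
   caller's array is never modified. The caller frees the result. */
int64_t* call_with_copy(const int64_t* b, size_t b_size, int64_t num,
                        size_t* a_size) {
    int64_t* b_copy = malloc(b_size * sizeof(int64_t));
    if (b_copy == NULL) {
        fputs("fail to malloc the parameter copy\n", stderr);
        exit(-2);
    }
    memcpy(b_copy, b, b_size * sizeof(int64_t));
    return set_first(b_copy, b_size, num, a_size);
}
```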
• code == assert( expr ) An assert statement contains a block of bytecodes that evaluate a condition expr. If the assertion fails, the generated code reports an error and stops the program, which ensures safety.
// assert ( expr )
{ // Beginning of assertion block
    if(expr){goto blklab0;} // If expr evaluates to true, go to blklab0
    fprintf(stderr,"fail"); // expr evaluates to false: report the error
    exit(-1); // Stop the program
    blklab0:;
} // End of assertion block
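Listing 7.3 calls an ASSERT macro whose definition is not shown. One hypothetical way to package the assertion block above as a reusable macro is sketched below; it uses the GCC/Clang local-label extension __label__ so that blklab0 stays private to each expansion and the macro can appear several times in one function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical ASSERT in the goto style of the assertion block above.
   __label__ (GCC/Clang extension) makes blklab0 local to this block. */
#define ASSERT(expr) do { \
    __label__ blklab0; \
    if (expr) { goto blklab0; } /* condition holds: skip the error */ \
    fprintf(stderr, "fail\n"); \
    exit(-1); \
  blklab0:; \
} while (0)
```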
• code == return( a ) A return statement passes variable a back to the caller when the invoked function finishes. When a is an array, its size is stored in a separate variable a_size; because a C function can return only a single value, a_size cannot be passed back together with a. To address this issue, we handle an array return with the following workaround:
// 'a' is an array returned by function 'func'
// 'a_size' is updated by function 'func' and the change is visible in method 'main'
int64_t* func(int64_t* b, size_t b_size, size_t* a_size){
    ...
    *a_size = 10; // Update the size of array 'a'
    return a; // Return array
}
// Method 'main'
int main(){
    int64_t* a;
    size_t a_size = 0;
    int64_t* b;
    size_t b_size;
    ...
    // Pass 'a_size' as a call-by-reference parameter
    a = func(b, b_size, &a_size);
    assert(a_size == 10);
}
The size variable a_size is passed to the called function func as a call-by-reference parameter (a pointer), so that func can update its value before returning. After the call, both the output array and its size have been set by func, and both changes are visible at the caller site.
1 // Function 'func' may change 'b' array and may return 'b' array
2 // If not, return new array 'c'
3 function func(int[] b, int num) -> int[]:
4     int[] c = [0;3] // c[0] = 0
5     if num > 10:
6         b[0] = num
7         return b
8     else:
9         return c
10 // Method 'main'
11 method main(System.Console sys):
12     int[] b = [2;3] // b[0] = 2
13     int[] tmp = func(b, 11) // function call
14     b = tmp // b[0] = tmp[0] = 11
15     assert b[0] == 11
16     sys.out.println(b[0])
17     b = func(b, 65536) // function call
18     sys.out.println(b[0])
19     assert b[0] == 65536
Listing 7.1: Example Whiley program
1 // function func(int[] b, int num) -> int[]:
2 int64_t* func(int64_t* b, size_t b_size, int64_t num,
3               size_t* ret_size){
4     int64_t* _6 = NULL; size_t _6_size = 0;
5     int64_t* c = NULL; size_t c_size = 0;
6     //arraygen %6 = [0; 3] : int[]
7     _NEW_1DARRAY(_6, 0, 3, int64_t); // _6_size = 3;
8     //assign c = %6 : int[]
9     c = COPY(_6, int64_t); // c_size = _6_size;
10    //ifle %1, 10 goto blklab0 : int
11    if(num<=10){goto blklab0;}
12    //update b[0] = num
13    b[0] = num;
14    //return b
15    *ret_size = b_size;
16    return b;
17 blklab0:;
18    //return c
19    *ret_size = c_size;
20    return c;
21 }
Listing 7.2: Naive C code of function func (comments: WyIL code)
Example 7.1 We illustrate the procedure of generating naive code from a WyIL file with the example program in Listing 7.1. Function func takes array b and integer num as inputs and checks the value of num to decide whether to return array b with b[0] updated, or a new array c.
In line 13, method main calls func on array b and assigns the return value to array tmp, and then overwrites array b with array tmp. In line 17, we make another function call to update array b with a larger value (65536).
Function func Takes array b, its size b_size and integer num as arguments.
In addition, an extra call-by-reference size variable ret_size is passed to function func to keep track of the actual size of the returned array. We declare all the local variables and parameters as follows:
• All integer-typed variables are signed 64-bit integers (int64_t);
• All integer-array-typed variables are signed 64-bit integer pointers (int64_t*);
• All array size variables have the size type (size_t), as it can represent the size of any object in a program;
• The return array size argument is declared as a size-typed pointer (size_t*) instead of a value, so that function func can modify it directly and make the update visible to the caller.
Whiley intermediate code assigns the target of every assignment to a fresh variable, since it follows static single assignment form (Pearce and Groves, 2015a). Thus, the arraygen code in line 7 stores the newly created array in a temporary variable _6. The assignment in line 9 then writes temporary array _6 to target variable c, and because each assignment has value semantics, it makes an extra copy in line 9.
The return of function func depends on the value of the passed num, which determines whether array b or c is passed back. Before each return statement, we update the call-by-reference size argument ret_size with the size variable of the returned array.
Method main Creates a new array using the _NEW_1DARRAY macro, makes two function calls on func and assigns the results back to variable b. As in function func, we use the same rules to declare the types of all local variables. In the naive (unoptimised) code, all the copies are needed to ensure that an assignment does not change its right-hand side variable and that a function call does not affect the values of its arguments at the caller site, so that both function calls and assignments are free of side effects.
1 int main(int argc, char** args){
2     int64_t* _5 = NULL; size_t _5_size = 0;
3     int64_t* b = NULL; size_t b_size = 0;
4     int64_t* _8 = NULL; size_t _8_size = 0;
5     int64_t* tmp = NULL; size_t tmp_size = 0;
6     int64_t* _18 = NULL; size_t _18_size = 0;
7     //arraygen %5 = [2; 3] : int[]
8     _NEW_1DARRAY(_5, 2, 3, int64_t); // _5_size = 3;
9     //assign b = %5 : int[]
10    b = COPY(_5, int64_t); // b_size = _5_size;
11    //invoke (%8) = (b, 11) func : function(int[],int)->(int[])
12    _8 = func(COPY(b, int64_t), b_size, 11, &_8_size);
13    //assign tmp = %8 : int[]
14    tmp = COPY(_8, int64_t); // tmp_size = _8_size;
15    //assign b = tmp : int[]
16    b = COPY(tmp, int64_t); // b_size = tmp_size;
17    //assert b[0] == 11
18    ASSERT(b[0] == 11);
19    //sys.out.println(b[0])
20    printf("%"PRId64"\n", b[0]);
21    //invoke (%18) = (b, 65536) func : function(int[],int)->(int[])
22    _18 = func(COPY(b, int64_t), b_size, 65536, &_18_size);
23    //assign b = %18 : int[]
24    b = COPY(_18, int64_t); // b_size = _18_size;
25    //assert b[0] == 65536
26    ASSERT(b[0] == 65536);
27    //sys.out.println(b[0])
28    printf("%"PRId64"\n", b[0]);
29    //return
30    exit(0);
31 }
Listing 7.3: Naive C code of method main (comments: WyIL code)
Listing 7.3 shows the naive code of the main method. In the first function call (line 12), array variable b is explicitly copied and passed to function func. Primitive-typed arguments (e.g. num and b_size) need no COPY macro, because C always passes arguments by value, so func automatically receives its own copies. The function result is then assigned to a fresh variable _8, which does not appear earlier, due to the static single assignment (SSA) form of the intermediate code. In line 22, we have another function call and therefore make another copy of array b.
In lines 10, 14, 16 and 24, the assignments require copies of their right-hand side variables. Together with the two copies made at the function calls, the main method therefore contains six copies.