Functions and Structure - Lossy Compression applied to the Worst Case Execution Time Problem


5.1.3 Functions and Structure

Once the basic usage and representation of variables have been stated, it is necessary to examine how variables are modified. Computer systems provide a basic set of operations which are used to manipulate variables. These operations are defined as mathematical operators; hence examining the properties of those operators yields information on how modifications should occur.

Computer programs, regardless of programming language or any other implementation method, must be expressible as a sequence of mathematical operators. A program that is not cannot run on the given hardware, and is therefore not useful. Hence, if the effects of these operators can be represented sufficiently accurately, it is possible to analyse any program. This low level form of analysis gives a common base across all programs, and provides a relatively small set of operations which need analysis. It can also take into account any optimisations which a compiler may make.

Using low level information differs from current approaches, which use high level information [100]. The stated advantage of high level representations is that information such as constant loop bounds can be extracted easily, by identifying common ways of implementing such bounds and finding these in the code. This in turn aids in proving the existence of loop bounds on more complicated structures. However, as already stated, it should be possible to assume that loop bounds exist, and hence there is no explicit requirement to prove the existence of loop bounds, only to determine the values of the loop bounds that exist. In turn, this makes many of the reasons for using high level information redundant; whilst it may be easier to prove the existence of a loop bound from high level information, the number of times any loop can execute given a range of inputs is a constant regardless of representation.

One side effect of discarding high level information is that this also discards information on the structure of the program. From the point of view of a loop bound analysis, the most problematic piece of structure discarded is the loops themselves. Once a low level approach is used, code is represented as basic blocks, and loops may not be obvious. Fortunately, if one is able to translate between the low and high level representations, a loop bound can be constructed from bounds on the number of times that the basic blocks inside that loop may execute.

Many of the mathematical operators available are expressible in terms of addition and multiplication, as applied to the real numbers. In both cases, the operation forms a well defined mathematical group with the real numbers (excluding zero in the case of multiplication). A mathematical group guarantees the following properties for a binary operation ·:

1. Closure: For all x, y in the group, x · y is also in the group

2. Associativity: For all x, y, z in the group, x · (y · z) = (x · y) · z

3. Identity Element: There exists an identity element i such that x · i = x

4. Inverses: For each element x, there exists x⁻¹ such that x · x⁻¹ = i

In computer systems, reals are approximated as floating point numbers, and under addition and multiplication these properties are preserved up to rounding error. The approximation of integers as modular integers does not preserve them, because in computer systems an integer overflow is treated as an error condition rather than as a valid result. For addition, this invalidates the property of closure. Fortunately, in practice this should not be a problem, as an overflow normally indicates an error in the program.
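The loss of closure can be illustrated with a minimal sketch. It assumes 32-bit signed integers, and the helper names `checked_add` and `wrapping_add` are hypothetical, introduced only for this illustration: the first treats overflow as an error, so some pairs of valid inputs have no valid result, while the second wraps around and is closed.

```python
# Assumed width for this sketch: 32-bit signed integers.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def checked_add(x: int, y: int) -> int:
    """Add two 32-bit signed values, raising OverflowError instead of wrapping."""
    result = x + y  # Python integers are unbounded, so this is the exact sum
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{x} + {y} leaves the 32-bit range")
    return result


def wrapping_add(x: int, y: int) -> int:
    """Add modulo 2**32 and reinterpret the result as a signed value."""
    return (x + y + 2**31) % 2**32 - 2**31


# Closed under wrapping addition, but the result is no longer the true sum.
wrapped = wrapping_add(INT32_MAX, 1)

# Not closed when overflow is an error: two valid inputs, no valid output.
try:
    checked_add(INT32_MAX, 1)
    overflowed = False
except OverflowError:
    overflowed = True
```

An analysis that treats overflow as a program error can therefore reason about `checked_add` as ordinary integer addition on every path that does not fail.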

In addition, multiplication also loses the group property when applied to modular integers: the nonzero elements of Z_x only form a group under multiplication if x is a prime number, because otherwise not all elements have inverses. This is almost never the case because, as stated before, the modular integers in a computer system are mapped to Z_{2^n}, the integers modulo 2^n (typically with n = 32 or 64); these integers therefore only form a group under multiplication for n = 1. Therefore, for any modular integer variable that can hold more than two values, the property of Inverses cannot be assumed, and any analysis must take this into account.
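The claim about inverses can be checked by brute force for small moduli. The following is a sketch only; the function names are hypothetical, and the check covers just the Inverses property for the nonzero elements, not the full set of group axioms.

```python
def has_inverse(a: int, m: int) -> bool:
    """True if some b in the integers modulo m satisfies a * b == 1 (mod m)."""
    return any((a * b) % m == 1 for b in range(m))


def all_nonzero_invertible(m: int) -> bool:
    """Check the Inverses property for every nonzero element modulo m."""
    return all(has_inverse(a, m) for a in range(1, m))


# Moduli below 20 for which every nonzero element has an inverse.
invertible_moduli = [m for m in range(2, 20) if all_nonzero_invertible(m)]

# The same check for the moduli 2^n with n = 1 .. 5.
powers_of_two = [all_nonzero_invertible(2**n) for n in range(1, 6)]

# Nonzero elements of the integers modulo 8 that have no inverse.
no_inverse_mod_8 = [a for a in range(1, 8) if not has_inverse(a, 8)]
```

Here `invertible_moduli` contains exactly the primes below 20, `powers_of_two` is true only for n = 1, and the elements without an inverse modulo 8 are the even ones.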

In practice, the loss of inverses results in a potential loss of accuracy when applying multiplication to ranges of integer values. This can be seen when multiplying two ranges of numbers together; if x ∈ [1, 2] and y ∈ [3, 4], then xy ∈ [3, 8]. If x and y are real numbers, then xy can be any number in the range [3, 8]. This is not true in the case that x and y are integers, as in this case xy cannot be 5 or 7.

There are three views on dealing with this problem. The first is that multiplication introduces additional constraints with regard to divisibility, and that these constraints must be represented. The downside to this option is a significant increase in the complexity of the representation, and additional complexity when determining what values a variable can hold. The second view is that a program is unlikely to undo work that it has already done. Taking the second view, these constraints do not have to be represented, as the divisibility of a number obtained by multiplication is unlikely to be checked. However, without these additional constraints invalid results may be introduced. For example, if a memory address were computed by multiplying a range of values x ∈ [1, 4] by a single value y = 4, then without the constraint of divisibility by y any value in the range [4, 16] could be considered. However, as a memory address is likely to refer to some form of memory structure, memory addresses not divisible by 4 are likely to be invalid, and even nonsensical should they be evaluated. Finally, a third option would be a compromise, which stores only a limited amount of divisibility information.
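The two examples in the text can be reproduced with a short sketch. Intervals are represented here as `(low, high)` tuples, and the function names are hypothetical; this illustrates the over-approximation itself, not the representation used by the analysis.

```python
from itertools import product


def interval_mul(x, y):
    """Multiply closed intervals given as (low, high) tuples."""
    corners = [a * b for a, b in product(x, y)]
    return min(corners), max(corners)


def integer_products(x, y):
    """Enumerate the products actually reachable with integer operands."""
    return sorted({a * b
                   for a in range(x[0], x[1] + 1)
                   for b in range(y[0], y[1] + 1)})


# x in [1, 2], y in [3, 4]: the interval claims more than integers can reach.
low, high = interval_mul((1, 2), (3, 4))
reachable = integer_products((1, 2), (3, 4))
unreachable = [v for v in range(low, high + 1) if v not in reachable]

# Memory address example: x in [1, 4] multiplied by the single value y = 4.
address_interval = interval_mul((1, 4), (4, 4))
valid_addresses = integer_products((1, 4), (4, 4))
```

The interval result is [3, 8] with 5 and 7 unreachable. For the address example the interval [4, 16] holds thirteen integers, of which only the four multiples of 4 are valid.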

For both addition and multiplication, it is possible to infer additional properties of the value of the output. For example, adding positive numbers results in a number strictly greater than either input. Given that such properties can be inferred as mathematical fact and give useful information about the value of the output, this information should be kept. In particular, these additional properties may not be obtainable by other means; for example, adding the ranges [1, 2] and [2, 4] gives a value in the range [3, 6]; without exploiting the properties of addition, it would be impossible to prove that the result is greater than the second operand, as the ranges intersect.
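The difference between comparing ranges and exploiting the property of addition can be sketched as follows. The tuple representation and function names are assumptions made for illustration only.

```python
def interval_add(x, y):
    """Add closed intervals given as (low, high) tuples."""
    return x[0] + y[0], x[1] + y[1]


def greater_by_ranges(a, b) -> bool:
    """Range-only test: a > b is provable only if a lies entirely above b."""
    return a[0] > b[1]


def sum_exceeds_other_operand(x) -> bool:
    """Property of addition: x + y > y whenever x is strictly positive."""
    return x[0] > 0


x, y = (1, 2), (2, 4)
total = interval_add(x, y)

# The ranges [3, 6] and [2, 4] intersect, so the range-only test fails.
provable_from_ranges = greater_by_ranges(total, y)

# The property of addition succeeds, because x is known to be positive.
provable_from_property = sum_exceeds_other_operand(x)

# Exhaustive confirmation over the integer values in both ranges.
confirmed = all(a + b > b
                for a in range(x[0], x[1] + 1)
                for b in range(y[0], y[1] + 1))
```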

Other functions may not provide as many properties for free. For example, the modulo operator does not preserve the properties of groups. As the modulo function gives the remainder of division, many inputs map to the same output, and an inverse for the modulo operator can no longer be defined. Hence information known about the inputs may not give as great a return in information on the output as with addition and multiplication. Similar problems can be observed in other non-injective (many-to-one) functions (e.g. trigonometric sin and cos). However, such functions do have the benefit that the output of the function is more restricted than the input (e.g. for positive y, the function “%y” is guaranteed to give an output less than y).
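Both sides of this trade-off can be seen in a small sketch, assuming a positive integer modulus. The function names are hypothetical. Note that Python's `%` returns a non-negative result for a positive modulus even when the left operand is negative, which differs from languages such as C.

```python
def mod_output_interval(y: int):
    """Interval containing x % y for any integer x, assuming y > 0."""
    if y <= 0:
        raise ValueError("this sketch only covers a positive modulus")
    return 0, y - 1


def preimages(value: int, y: int, xs):
    """All inputs in xs that map to value under % y."""
    return [x for x in xs if x % y == value]


# The output is tightly bounded whatever is known about the input.
bounds = mod_output_interval(4)
within_bounds = all(bounds[0] <= x % 4 <= bounds[1] for x in range(-100, 100))

# Several inputs share one output, so the input cannot be recovered from it.
same_output = preimages(1, 4, range(0, 12))
```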

In a similar vein, bitwise logical operators lack the piecewise continuous nature of the other functions described so far. Whilst bitwise operators are many-to-one, as is modulo, they give additional information about not only their output but also their input. As bitwise operators are only defined for integers, one can infer not only that the output of the function is an integer, but also that the input of the function is an integer.
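As a rough illustration, Python's own bitwise AND shows both kinds of information. The helper `accepts_bitwise` and the mask value are hypothetical, chosen only for this sketch.

```python
def accepts_bitwise(value) -> bool:
    """True if value can be used as an operand of bitwise AND."""
    try:
        value & 1
    except TypeError:
        return False
    return True


MASK = 0x0F  # hypothetical 4-bit mask

# Information on the input: only integer operands are accepted.
int_ok = accepts_bitwise(6)
float_ok = accepts_bitwise(6.0)

# Information on the output: the result never leaves [0, MASK].
bounded = all(0 <= x & MASK <= MASK for x in range(-50, 50))

# No continuity: adjacent inputs can produce distant outputs.
jump = (7 & 8, 8 & 8)
```

Here `float_ok` is false because Python raises `TypeError` for a floating point operand, and `jump` shows the output moving from 0 to 8 as the input moves from 7 to 8.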

Hence to summarise, functions provide two forms of information. The first form of information relates to the outcome of the function; depending on the function and inputs, properties can be inferred on the output. The second form of information is on the input to the function: given that functions may not make sense for all input values, these restrictions can be reasonably placed upon the input values. The functions discussed in this section have been primarily concerned with the values of variables and finding properties on the outputs; the next section discusses functions used to control program flow, and how the evaluation of these functions may result in assumptions being placed on inputs.
