

5.1.4 Control and Compression

In addition to pure mathematical operators, many functions are used to express the control flow of the program; examples are the less-than function, or simply casting a value to a boolean. This class of functions returns a boolean result, which is used to determine which path to take through the program.

In the event that it is possible to prove that the output of a comparison is fixed, regardless of input, then the analysis of these functions is trivial. Otherwise, it is necessary to consider each possibility. In this case the main problem to overcome is ensuring that the inferred properties of the relative values of variables remain consistent for the remainder of that path's analysis. This results in multiple paths to consider, each with differing inferred properties of variables, and hence the main problem when performing code analysis: state explosion.
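The split into paths can be illustrated with a minimal sketch. It assumes, purely for illustration, that a path state is a program counter plus a set of inferred properties stored as (left, operator, right) triples; the names PathState and fork_on_comparison are hypothetical and do not come from the analysis described here.

```python
from dataclasses import dataclass

# Negation of each comparison, used to label the branch that is not taken.
NEGATION = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "==": "!=", "!=": "=="}


@dataclass(frozen=True)
class PathState:
    pc: int                              # program counter of the next statement
    properties: frozenset = frozenset()  # inferred facts, e.g. ("x", "<", "y")


def fork_on_comparison(state, lhs, op, rhs, true_pc, false_pc):
    """Return one successor state per outcome that is still possible."""
    taken = (lhs, op, rhs)
    not_taken = (lhs, NEGATION[op], rhs)
    if taken in state.properties:
        # The outcome is already known to be true: trivial case, one path.
        return [PathState(true_pc, state.properties)]
    if not_taken in state.properties:
        # The outcome is already known to be false: trivial case, one path.
        return [PathState(false_pc, state.properties)]
    # Otherwise both outcomes must be considered, and each path records
    # the property it assumed so that later analysis stays consistent.
    return [
        PathState(true_pc, state.properties | {taken}),
        PathState(false_pc, state.properties | {not_taken}),
    ]
```

Each unresolved comparison doubles the number of states in this sketch, which is the state explosion referred to above. Only syntactically identical properties are recognised as already proven; a real analysis would also consult the possible values of the operands.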

[Figure 5.1 (image not reproduced). Node labels: Variables a, b; Variables none; Variables a, b, c; Variables a, b, d; Variables a, b; Variables none]

Figure 5.1: A simple CFG, illustrating that all variables, no matter their age, will eventually be discarded

To counter state explosion, it is necessary to implement a method of compression which can merge multiple paths back together. As stated before, variables which have been marked as irrelevant have no impact on the control flow of a program, because they are not used again. Similarly, properties inferred by an analysis may not be used again, in which case these properties can be removed as well. Hence, the value to the analysis of properties inferred on variables which have been marked irrelevant is zero, and these can be discarded without penalty. Similarly, variables which have fallen out of scope have no impact on the control flow of a program, and hence data on any of the properties of these variables can also be discarded.
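Discarding properties of irrelevant variables might look as follows. This is a sketch only: it assumes properties are (left, operator, right) triples over variable names, and the function name discard_irrelevant is hypothetical.

```python
def discard_irrelevant(properties, relevant_vars):
    """Keep only properties whose operands are both still relevant.

    properties    -- set of (lhs, op, rhs) triples over variable names
    relevant_vars -- variables that are still in scope and used again
    """
    return {
        (lhs, op, rhs)
        for (lhs, op, rhs) in properties
        if lhs in relevant_vars and rhs in relevant_vars
    }
```

For example, with properties {a < b, a < c, c = d} and only a and b still relevant, the sketch retains a < b alone.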

Following on from this, one observation is that all variables will eventually fall out of scope, and therefore become irrelevant. Hence continuing to execute a particular path of the program will eventually lead to fewer relevant variables, and therefore less information will be discarded when lossy compression is employed. Even if new variables are introduced on the path, these new variables will eventually be discarded, as illustrated in Figure 5.1. Unfortunately, continuing to execute a particular path of the program without performing merges will lead to further state explosion. A balance can be achieved by evaluating the path of the program which has the most information attached to it until this is no longer the case, and then checking for relevant merges. After any applicable merges have been applied, evaluation can resume on the next path, as determined by the amount of information known about that path. The amount of information known about a path under evaluation is found trivially by counting the number of properties that have been inferred on variables on that path.
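The path selection rule above can be sketched with a priority queue ordered by the number of inferred properties. The Worklist class and its method names are hypothetical, and the sketch covers only the choice of which path to evaluate next, not the merge check that follows it.

```python
import heapq
from itertools import count


class Worklist:
    """Pending path states, ordered so the most informed path comes first."""

    def __init__(self):
        self._heap = []
        self._tie = count()  # tie-breaker, so property sets are never compared

    def push(self, pc, properties):
        props = frozenset(properties)
        # heapq is a min-heap, so the property count is negated.
        heapq.heappush(self._heap, (-len(props), next(self._tie), pc, props))

    def pop_most_informed(self):
        _, _, pc, props = heapq.heappop(self._heap)
        return pc, props

    def __len__(self):
        return len(self._heap)
```

A driver loop would pop a state, evaluate it while it remains the most informed, push its successors, and then look for states with matching program counters to merge.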

Property 1    Property 2    Merged Property

None          Any           None
=             ≠             None
=             <, ≤          ≤
=             >, ≥          ≥
<             >             ≠
<             ≥             None
>             ≤             None
≤             >, ≥          None
≥             <, ≤          None

Table 5.1: A commutative operator table for merging properties

An effect similar to variables being marked as irrelevant also occurs on assignment: inferred properties can be lost. This will most typically occur on reassigning a variable, although not all properties need be lost when this happens. For example, when subtracting a known positive number y from a variable x, properties stating that a third variable z is equal to or less than x are no longer valid for x − y and would have to be rechecked from the values that x − y can take. However, variables which are greater than x will remain greater, and hence the property of being greater than x − y remains valid. This follows from the transitive nature of the properties in question, as x − y < x < z, and hence x − y < z.
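The reassignment example can be sketched as a filter over the properties that mention x. The sketch assumes properties are (left, operator, right) triples, that y is known to be positive, and that arithmetic does not wrap around; the function name after_subtracting_positive is hypothetical.

```python
UPPER_BOUND_OPS = {"<", "<="}    # x < z, x <= z: z bounds x from above
FLIPPED_UPPER_OPS = {">", ">="}  # z > x, z >= x: the same facts, written the other way


def after_subtracting_positive(properties, x):
    """Properties that still hold once x is replaced by x - y, with y > 0."""
    kept = set()
    for lhs, op, rhs in properties:
        if x not in (lhs, rhs):
            kept.add((lhs, op, rhs))  # does not mention x: unaffected
        elif lhs == x and rhs != x and op in UPPER_BOUND_OPS:
            kept.add((lhs, op, rhs))  # x - y < x <= z, so the bound still holds
        elif rhs == x and lhs != x and op in FLIPPED_UPPER_OPS:
            kept.add((lhs, op, rhs))  # z >= x > x - y, so the bound still holds
        # Everything else (equalities, lower bounds on x) must be rechecked.
    return kept
```

The sketch keeps the retained properties in their original form rather than strengthening them; for instance z ≥ x could be tightened to z > x − y, but the weaker property is still valid.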

A final observation is that even with compression, the same code may be evaluated multiple times with different properties of variables. In this case it is possible to have foreknowledge of the properties that may be tested by the code, as these will have been revealed by the previous evaluation. Hence any property that is not tested may be discarded early.

As already stated, the representation chosen for the values of variables has a simple merging operation. Hence the main task in merging states is to merge properties correctly. Fortunately, this is relatively trivial: properties inferred by operators governing the path through the control flow diagram will typically use the operators =, ≠, <, ≤, >, ≥. As can be seen in Table 5.1, it is possible to merge these properties without losing information.
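Table 5.1 can be encoded directly as a lookup keyed on unordered pairs, which makes the operator commutative by construction. Two points in the sketch below are assumptions rather than entries in the table: merging a property with itself returns that property, and any pair the table does not list falls back to None, which is the conservative choice. The names MERGE_TABLE and merge_property are hypothetical.

```python
# Operators are spelled in ASCII: "==" for =, "!=" for ≠, "<=" for ≤, ">=" for ≥.
MERGE_TABLE = {
    frozenset({"==", "!="}): None,
    frozenset({"==", "<"}): "<=",
    frozenset({"==", "<="}): "<=",
    frozenset({"==", ">"}): ">=",
    frozenset({"==", ">="}): ">=",
    frozenset({"<", ">"}): "!=",
    frozenset({"<", ">="}): None,
    frozenset({">", "<="}): None,
    frozenset({"<=", ">="}): None,
}


def merge_property(p1, p2):
    """Merge the properties two paths hold between the same pair of variables."""
    if p1 is None or p2 is None:
        return None  # first row of Table 5.1: None merged with anything is None
    if p1 == p2:
        return p1    # assumption: identical properties survive unchanged
    # Assumption: pairs not listed in Table 5.1 conservatively yield None.
    return MERGE_TABLE.get(frozenset({p1, p2}))
```

For example, if one path knows x = y and the other knows x < y, the merged state knows x ≤ y, which is exactly the disjunction of the two.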

Information Type       Value                                                  Merge Strategy

Variables
  Possible Values      High - Unsound if possible values not considered       Merge by encompassing both ranges
  Excluded Values      Medium - Prunes search space                           Merge by extending list, discarding if list becomes too long
  Usage                Low - May be used to infer additional properties       Merge by assuming worst case

Program Control
  Program Counter      Critical - Required to be accurate                     Only merge when identical
  Properties           Medium - Prunes search space                           Merge by using lossless merge operator
  Discarded Variables  Medium - Prunes search space                           Delete variables once no longer used
  Combinations         Medium - Prunes search space                           Merge by merging variables and properties in program states with the same program counter

Table 5.2: The types of information involved in Loop Bound Analysis

Given that it is possible to merge properties without losing information, it remains to see what information is lost when merging sets of potential values of variables and properties. The answer is combinations: when merging the potential values of multiple variables, combinations of variables that could not normally occur can be considered in the analysis. To an extent this may be possible to counteract given that the properties describing the relationships between variables remain, but in general this should lead to a more pessimistic analysis.
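The loss of combinations can be seen in a small sketch. It assumes each state holds a program counter and one closed integer range per variable, and that both states track the same variables; merge_states and admits are hypothetical names.

```python
def merge_states(s1, s2):
    """Merge two states by widening each variable's range to cover both."""
    if s1["pc"] != s2["pc"]:
        return None  # program counters must be identical to merge
    merged = {}
    for name, (lo1, hi1) in s1["ranges"].items():
        lo2, hi2 = s2["ranges"][name]  # assumes s2 tracks the same variables
        merged[name] = (min(lo1, lo2), max(hi1, hi2))
    return {"pc": s1["pc"], "ranges": merged}


def admits(state, assignment):
    """True if every variable's value lies inside the state's range for it."""
    return all(
        lo <= assignment[name] <= hi
        for name, (lo, hi) in state["ranges"].items()
    )
```

Merging a state where x = 0 and y = 0 with one where x = 1 and y = 1 gives x in [0, 1] and y in [0, 1]. The merged state admits x = 0, y = 1, a combination that neither original path could produce, unless a retained property such as x = y rules it out.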

5.1.5 Summary

The previous sections have discussed the value and overall merge strategies for the different types of information involved in loop bound analysis. These are summarised in Table 5.2. Having given an overall outline, the next section will give examples of how this approach may be applied to actual programs.