
In this section we summarise the results of all experiments. These were conducted on an otherwise idle Intel Xeon W3520 workstation running at 2.67 GHz. All times are given in milliseconds and were measured using system calls that return the current time. The cutoff time was set to one hour. We measure the performance of our tool using the following criteria:

Analysis time. This is the time our tool requires to analyse the program, including the control flow analysis, type inference, and the calculation of assertions and fail edges. Since the control flow graph is potentially larger when the maximum length of the execution points is increased, we expect the analysis time to be longer in that case.

Transformation time. This is the time required to take the current program and, given the […] called from within this program. We expect the transformation time to depend on the size of the control flow graph.

CFG size. This is the number of nodes in the control flow graph. As the execution point depth is increased, the CFG is also expected to become larger. The size of the CFG also depends on the size of the program.

Number of dynamic checks in the specialised functions. This is the estimated number of type checks that the standard interpreter would perform when executing the code. For example, in the statement f(x, y), a check is made to see whether f is a callable function. If f is a standard library function that expects x and y to have particular types, a further check is made for each argument, so this statement requires 3 type checks. We estimate the number of dynamic checks by accumulating this count over every instruction associated with a node in the CFG.

Number of fail edges. This is the number of failing assertions inserted, i.e., the number of edges in the control flow graph beyond which the program is guaranteed to fail. Failing assertions raise a controlled exception and introduce negligible runtime overhead, since each of them is typically executed only once, if ever.

Number of inserted checks. This is the number of type checks inserted into the specialised code. Type-checking assertions can introduce runtime overhead, so the fewer of them that need to be inserted, the better.
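The estimate of dynamic checks described above can be sketched with Python's `ast` module. This is a simplified illustration under stated assumptions, not our tool's implementation: it only counts calls (one check that the callee is callable, plus one per argument when the callee is a type-checked library function), whereas our estimate accumulates over every instruction associated with every CFG node. The set `TYPED_LIBRARY_FUNCTIONS` and the function names are hypothetical, and which library functions count as "typed" is an assumption.

```python
import ast

# Hypothetical set of library functions whose arguments are type-checked;
# the choice of functions here is an assumption for illustration only.
TYPED_LIBRARY_FUNCTIONS = {"len", "abs", "divmod", "pow"}


def checks_for_call(call, typed=TYPED_LIBRARY_FUNCTIONS):
    """Estimate the type checks for one call expression:
    1 for "is the callee callable?", plus 1 per argument
    when the callee is a type-checked library function."""
    count = 1
    if isinstance(call.func, ast.Name) and call.func.id in typed:
        count += len(call.args) + len(call.keywords)
    return count


def estimate_dynamic_checks(source, typed=TYPED_LIBRARY_FUNCTIONS):
    """Sum the per-call estimate over every call in the source.
    (Only calls are counted in this sketch.)"""
    tree = ast.parse(source)
    return sum(
        checks_for_call(node, typed)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
    )
```

With `typed={"f"}`, the statement `f(x, y)` from the example above is estimated at 3 checks, matching the count given in the text.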

We present the full results of our benchmarks in Figure 6.20. An important result is that failing assertions are inserted into the original code only if that code contains latent type errors. The results in Figure 6.20 show that, for most Python modules, the performance of the analyser is adequate. In fact, we are able to analyse a program with more than 30,000 nodes in its CFG in under half an hour. Inevitably, this analysis and transformation step increases the initialisation time of a program, just as a JIT compiler would. Most of the runtime of our tool is spent in the control flow analysis and type inference stages. We expect that the algorithms used could be reimplemented more efficiently, and in a faster programming language, since Python is around 80× slower than C [4].
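As a check on the half-hour figure, the largest run in the table (meteor at a maximum execution point length of 4, with 30,945 CFG nodes) took 1,510,357 ms of analysis time:

```latex
\frac{1\,510\,357\ \text{ms}}{60\,000\ \text{ms/min}} \approx 25.2\ \text{min} < 30\ \text{min}
```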

Another important result is that, as the maximum execution point depth is increased, the number of fail edges increases and the number of inserted checks decreases relative to the original number of checks. Ideally, our type checking mechanism would not insert any type checks at all, since these increase the computation required to run the code. Assertions that always fail, on the other hand, impose no real computational expense on the program, as they are typically executed only once, if ever. These results are therefore positive: they show that if more computation is dedicated to the analysis phase, the modified program has fewer type checks to perform at runtime.
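The contrast between the two kinds of assertion can be illustrated with a hypothetical, hand-written Python function before and after transformation. This is not output produced by our tool; the names `scale_original`, `scale_transformed` and `PreemptiveTypeError` are invented for this sketch. The inserted check verifies a type that the analysis is assumed to be unable to prove, so it runs on every call; the fail edge replaces a branch that is guaranteed to fail with a controlled exception that is reached only on the failing path.

```python
class PreemptiveTypeError(TypeError):
    """Hypothetical controlled exception raised at a fail edge."""


def scale_original(values, factor):
    total = 0
    for v in values:
        total += v * factor
    if total > 100:
        return "large: " + total  # latent type error: str + int always fails
    return total


def scale_transformed(values, factor):
    # Inserted check: assume the analysis could not prove that factor is
    # numeric, so its type is checked at runtime on every call.
    if not isinstance(factor, (int, float)):
        raise PreemptiveTypeError("factor must be an int or float")
    total = 0
    for v in values:
        total += v * factor
    if total > 100:
        # Fail edge: every path beyond this edge fails, so a controlled
        # exception is raised instead of running the doomed statement.
        raise PreemptiveTypeError("'large' branch always concatenates str and int")
    return total
```

On the non-failing path both versions return the same result; the only extra work in `scale_transformed` is the `isinstance` check, which is why reducing the number of inserted checks matters more than the number of fail edges.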

Columns: max. execution point length, analysis time (ms), transformation time (ms), CFG size, dynamic checks, fail edges, inserted checks.

erasefile, 23 lines of code

1 59 0 38 12 0 1

2 89 1 54 18 1 0

3 100 1 62 22 1 0

4 101 1 62 22 1 0

erasefile2, 24 lines of code

1 50 0 43 17 1 0

2 52 0 43 17 1 0

3 53 0 43 17 1 0

4 53 0 43 17 1 0

fixpoint, 24 lines of code

1 82 0 40 11 0 3

2 155 1 67 21 1 2

3 249 2 94 31 2 1

4 360 3 121 41 3 0

introduction, 21 lines of code

1 197 1 67 39 0 3

2 278 2 97 57 1 2

3 414 3 127 75 2 1

4 526 4 157 93 3 0

stackoverflow, 48 lines of code

1 295 2 157 82 3 0

2 931 3 157 82 3 0

3 298 2 157 82 3 0

4 294 3 157 82 3 0

pidigits-python3-2, 40 lines of code

1 257 2 131 62 1 0

2 257 2 131 62 1 0

3 257 2 131 62 1 0

4 258 2 131 62 1 0

mandelbrot-python3-3, 46 lines of code

1 312 2 128 59 2 0

2 313 2 128 59 2 0

3 312 2 128 59 2 0

4 316 2 128 59 2 0

fasta, 96 lines of code

1 358 2 154 87 0 0

2 631 4 208 120 0 0

3 631 4 208 120 0 0

4 634 5 208 120 0 0

meteor, 206 lines of code

1 9764 16 813 409 1 10

2 42822 44 1719 869 1 35

3 247689 210 6357 3215 1 179

4 1510357 1294 30945 15587 1 1043
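The trend described earlier can be checked directly against the table. The sketch below copies the fixpoint and introduction rows and computes inserted checks as a fraction of the estimated dynamic checks for each maximum execution point length; the helper name `inserted_ratio` is hypothetical. The trend is clearest in these two benchmarks, while in several others the counts do not change with depth.

```python
# (max. execution point length, dynamic checks, fail edges, inserted checks),
# copied from the fixpoint and introduction rows of the results table.
RESULTS = {
    "fixpoint": [(1, 11, 0, 3), (2, 21, 1, 2), (3, 31, 2, 1), (4, 41, 3, 0)],
    "introduction": [(1, 39, 0, 3), (2, 57, 1, 2), (3, 75, 2, 1), (4, 93, 3, 0)],
}


def inserted_ratio(rows):
    """Inserted checks as a fraction of estimated dynamic checks, per depth."""
    return [(depth, inserted / dynamic) for depth, dynamic, _fail, inserted in rows]


for name, rows in RESULTS.items():
    for depth, ratio in inserted_ratio(rows):
        print(f"{name} depth {depth}: {ratio:.1%} of checks inserted")
```

For fixpoint, the ratio falls from about 27% at depth 1 to 0% at depth 4, while the number of fail edges rises from 0 to 3.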

Our tool is implemented as a prototype, and its performance and scalability should therefore not be used to judge the suitability of preemptive type checking. We believe preemptive type checking can be implemented successfully for languages similar to Python, or for new languages designed with this type checking mechanism in mind. Our particular tool can still be used on medium-sized scripts or on critical parts of larger programs, as long as only a limited subset of Python is used. From the performance figures in the table, we note that our tool is quite usable for programs of up to 200 lines of code.

Looking at the table in Figure 6.20, we note that the analysis time appears to be a function of the CFG size and the number of lines of code. Setting aside the control flow analysis step, the main reason our analysis does not scale well is that a global analysis is very expensive; if we could ignore global variables, our tool would be much faster. We could mitigate this problem by performing an escape analysis for every global variable and discounting the large parts of the control flow graph where that global variable is not reassigned.

We also ran a profiler on our analyser to determine where most of the time is spent. Around 10 to 20 percent of the time appears to be spent calculating the hash codes of objects such as instructions, execution points and edges, because we rely on set and dictionary operations for most of our algorithms. This is exactly the kind of application where a system that generates optimised hashing operations [47] would improve performance.
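The profiling result suggests a simple mitigation that is different from, and much less general than, the optimised hashing of [47]: compute each object's hash once, at construction, and return the cached value afterwards. The sketch below assumes immutable analysis objects; `ExecutionPoint` and its fields `call_stack` and `node_id` are hypothetical stand-ins, not the data structures of our tool. Without the explicit `__hash__`, a dataclass declared with `frozen=True` would recompute a hash over all of its fields on every call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExecutionPoint:
    """Hypothetical immutable analysis object with a cached hash."""
    call_stack: tuple
    node_id: int
    # Excluded from __init__, repr and equality; filled in by __post_init__.
    _hash: int = field(init=False, repr=False, compare=False)

    def __post_init__(self):
        # Frozen dataclasses forbid normal assignment, so bypass it once.
        object.__setattr__(self, "_hash", hash((self.call_stack, self.node_id)))

    def __hash__(self):
        # Because __hash__ is defined explicitly, dataclass keeps this one.
        return self._hash
```

Caching is only safe because the objects are immutable: a mutable object whose hash changed after insertion would no longer be found in the sets and dictionaries that contain it.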