Chapter 4 Implementation of an Optimisation Generator
4.7 Performance Analysis
4.7.3 Removing Pathologies
It is well known that the size of BDDs used in model checking can be highly influenced by the ordering of the variables within their construction. It was consequently investi- gated as a potential cause of the pathological cases discovered within Rosser’s runtime performance. Several simple ordering algorithms have been investigated with the aim of reducing the negative influence of these pathological performance cases on the runtime performance of optimizations generated by Rosser.
[11] observed that variable ordering significantly alters the size of the BDD. He showed that for a boolean function, one variable ordering may yield a BDD that is exponential in the number of variables, while a different ordering may yield a BDD of polynomial size. Since the time complexity of operations over BDDs are parameterised by their size, the ordering is influential in the efficiency of the Rosser system. [7] proved that finding an optimal variable ordering is an NP-Complete problem, which is why most systems for variable ordering use a heuristic based approach.
Bits within BDD variables can be composed through interleaving them or se- quencing them. In a sequential ordering, all bits of the first variable are placed first, followed by all bits of the second variable, and so on. Interleaving places the first bit of every variable, then the second bit of every variable, and so on. These compositions can be generalised, so that the composition of multiple variable can be substituted as a variable, for example with variables x1,x2,x3,x4 an ordering could be produced that
interleavesx1 andx2 and then sequences this with x3 and x4.
A desirably property of an ordering would be that it can be computed from some property of the TRANS specification being compiled. The problem of variable ordering within the JEDD code generated by Rosser is quite a domain specific problem. Our methodology follows from past applications of BDDs to pointer analysis that makes use
< x : V1 , n : N1 , n2 : N2 , e : ET1 > foo = 1 B ;
foo = foo { n , n2 , e } > < m e t h . E d g e s { src , dest , et }; foo = foo { x } > < m e t h . C o n l i t { c };
< x : V1 , x1 : V2 > c l o n e = ( n = >)( n2 = >)( e = >)( x = > x1 ) foo ; Figure 4.10: JEDD snippet used in ordering examples
of a simple static ordering for a given analysis, as described in [50].
Implementation and Orderings
The ordering of variables within BDDs is modelled within the JEDD relational algebra system as the ordering of physical domains. Rosser generates multiple physical domains for certain logical domains, one for each variable that is typed by the given logical do- main. These are numbered sequentially, for example V1,V2... denotes physical domains associated with the Value logical domain, Figure 4.11 gives a complete listing and thus is useful when reading the examples, see Section 4.4 for explanation of the different logical domains. Within the following algorithms physical domains are either interleaved or sequentially ordered. All example orderings are given with respect to the snippet of JEDD code in Figure 4.10.
The approach of testing out different orderings allowed a resolution to the prob- lem of the performance pathologies within the Rosser system. The different ordering approaches tried are explained in the sections below. Figure 4.12 gives a comparison of performances with different ordering schemes used. Within this table the averages are measured as slowdown over the time it took Soot to perform these optimizations. A higher number is worse. TheAverage without outliers column shows the average slow- down overSootfor optimizing when methods whose slowdown was over 2x the standard deviation are removed. Whilst comparing the overall performance should always include
Domain Physical Domains Value V1,V2 ... Node N1,N2 ... ET ET1,ET2 ... OP OP1, OP2 ... Call C1, C2 ...
Figure 4.11: Relations between names of logical and physical domains
Ordering Average Slowdown Average Slowdown Pathological Cases
without Pathologies. Default 18.01 2.14 2 Interleaved 26.32 5.4 5 Sequentially Ordered 7.50 2.62 4 with Grouping Sequential Ordering 2.31 1.78 1 of interleaved Groups
Figure 4.12: Performance Comparison with different BDD Orderings
outliers within analysis, this column is useful for identifying whether an ordering has performed badly in a few instances or overall, and thus how to refine the given ordering.
Interleaved
The set of physical domains, as ordered by their first use in the declaration of a relation, are all interleaved. This resulted in worse overall performance, and more pathologies. J e d d . v (). s e t O r d e r ( new I n t e r l e a v e ( V1 . v () , N1 . v () , N2 . v () , ET1 . v () , V2 . v ( ) ) ) ;
Sequentially Ordering with Grouping
In this approach all the domains are sequentially ordered, rather than interleaved. Physical domains that below to the same logical domain are adjacent to each other in the ordering. For example all the physical domains belonging to the Value logical domain are adjacent within the sequence. Overall performance significantly improved, and even though there were slightly more pathological cases, they were faster than the default case. J e d d . v (). s e t O r d e r ( new S e q u e n c e ( V1 . v () , V2 . v () , N1 . v () , N2 . v () , ET1 . v ( ) ) ) ;
Figure 4.14: Example of Sequentially Ordering with Grouping
Sequential Ordering of Interleaved Groups
This is a similar case to the previous one, except that physical domains within the groups are interleaved, rather than being totally sequential. This resulted in a reduced number of pathologies, a better average case, and the single remaining pathology being less extreme in terms of performance.
J e d d . v (). s e t O r d e r ( new S e q u e n c e (
new I n t e r l e a v e ( V1 . v () , V2 . v ()) , new I n t e r l e a v e ( N1 . v () , N2 . v ()) , ET1 . v ( ) ) ) ;
Ordering Improvements
Since the benefits of choosing the Sequential Ordering of Interleaved Groups applies to both the average slowdown, and the average slowdown without pathologies it seems to resolve more than just the pathological cases of a few methods in this benchmark. So it is likely to be an improvement to other cases as well. Future performance analysis of other benchmarking suites can also be conducted more easily after this work, since the implementation described in this chapter is able to generate the different BDD Ordering sequences through a commandline option.
Existing work on general ordering heuristics could be applied within Rosser, with the hope of finding a suitable ordering approach that would work generally. Different ordering heuristics could be experimentally evaluated within the existing Rosser system in order to decide which was the most suitable for BDDs given a representative set of optimizers generated from TRANS.
The JEDD also allows more exotic ordering schemes to be used, for example reversing the bits within an ordering scheme, or, performing a bit shift on existing orders. These have not been experimented with so far due to the complexity of understanding their impact on the size of BDDs.
4.8
Conclusions
The Rosser system described here applies compiler optimizations specified formally to Java programs within standard program development environments: the optimizations are mechanically translated into running code, and applied to given object programs within theSootenvironment using a simple model checker for matching side conditions of optimizations to object code.
There is of course a performance price to be paid by not programing the op- timizer directly. This cost is minimised by actually compiling the optimizations into JEDDrather than interpreting them, and the benefits of a declarative approach outweigh the performance cost, as sophisticated optimizations are often applied only when the code is ready for release—which is usually not a good time to find that the optimizer has introduced new bugs. The use of a formal notation has other benefits: it aids the interactive development of new optimizations and the explanation of the optimizations to third parties.
By using theJavalanguage as the basis of the implementation questions, rather than a toy language all the difficulties of practical compiler development have been undertaken. By compiling TRANS Rosser offers a superior level of performance to specification interpretation systems. Not only does the Rosser system provide a Domain Specific Language for program transformation, its implementation layers on top of the work of the existingJEDD system in order to minimise implementation effort.
If program transformations are to be used as the basis of building practical compilers that can be used in professional programming, questions over the performance of such DSLs must be addressed. This chapter tackles these questions head on—Rosser and analysis of Rosser demonstrate that their usage can be practical.
This work also has a great deal of synergy with Chapter 5 where an approach for finding and fixing bugs automatically is developed along with a tool that implements this approach. The benefits of performance lessons from the Rosser system can be applied in such circumstances.