Section 10.3 Future Work - Optimised Redundant Cell Collection for Graph Reduction

One drawback of the present simulator is its use of the simple EASE interpreter. The language is very restrictive, and programming large examples in EASE proves difficult. Preliminary work has been carried out to investigate implementing the collector on a practical functional programming system. A promising approach appears to be offered by FLIC [PEY-87b, PAR-91], an intermediate code that can be generated from HASKELL [HUD-90] programs. A separate FLIC reducer has been developed [PAR-91], whose storage management interface matches the one given above very closely. In this way, large applications programmed in HASKELL could be run with the collector.

The analysis and results given above show how costly it is to retrieve overflow data from the main heap. A possible avenue of research would be to explore an asymmetric division of the heap for the purpose of storing overflow data. Such a system would employ two free lists: one covering cells in (say) the top tenth of the heap, and the other covering cells from elsewhere. Allocation of cells from the free lists would be biased according to the use of the cell. Overflow data would normally be allocated in the top partition of the heap, which would allow much smaller search spaces to be employed. Normal cell allocation would be biased towards the lower heap partition. When memory is nearly exhausted, cells could be drawn for any purpose from either free list. Overflow count locations would need a mark bit indicating which free list was used for the overflow allocation.
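The two-free-list arrangement can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the collector's design: the class and method names are hypothetical, the top tenth of the heap is taken as the overflow partition, the returned partition tag stands in for the mark bit, and "memory nearly exhausted" is simplified to "the preferred free list is empty".

```python
LOW, HIGH = 0, 1  # partition tags; the tag stands in for the per-allocation mark bit


class AsymmetricHeap:
    """Hypothetical heap split into two partitions, each with its own free list.

    Assumption: the top 1/high_divisor of the heap (a tenth by default) is the
    partition preferred for overflow data; the rest is preferred for graph cells.
    """

    def __init__(self, size: int, high_divisor: int = 10):
        self.size = size
        self.boundary = size - size // high_divisor  # first address of the top partition
        # Each free list is a stack of addresses; pop() yields the lowest address first.
        self.free = {
            LOW: list(range(self.boundary - 1, -1, -1)),
            HIGH: list(range(size - 1, self.boundary - 1, -1)),
        }

    def allocate(self, overflow: bool) -> tuple[int, int]:
        """Return (address, partition). Overflow data prefers HIGH, other cells LOW.

        Simplification: "memory nearly exhausted" is modelled as "the preferred
        free list is empty", at which point the other list is used instead.
        """
        preferred = HIGH if overflow else LOW
        other = LOW if preferred == HIGH else HIGH
        for part in (preferred, other):
            if self.free[part]:
                return self.free[part].pop(), part
        raise MemoryError("both free lists are empty")

    def release(self, address: int) -> None:
        # A cell always returns to the free list of the partition it lies in.
        part = HIGH if address >= self.boundary else LOW
        self.free[part].append(address)

    def search_range(self, partition: int) -> range:
        """Addresses a search for overflow data tagged with `partition` must scan."""
        if partition == HIGH:
            return range(self.boundary, self.size)
        return range(0, self.boundary)
```

With a 100-cell heap, the first `allocate(overflow=True)` yields address 90 in the top partition, and `search_range(HIGH)` then covers only 10 cells rather than the whole heap. Overflow data that spills into the lower partition must be searched over the remaining 90 cells, which corresponds to the extreme operating conditions discussed next.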

Asymmetric heap division should improve average performance, whilst also slightly improving response times in extreme operating conditions (when overflow data is held in the larger partition). The cost would be an extra mark bit per cell and the small collector overhead of maintaining two free lists.

The simulation results show how effective even a small cache is in reducing the number of searches of the main heap. The present cache control algorithm is very simple, and more sophisticated caching algorithms need to be appraised. Any such appraisal must take account of the extra collector processing overhead needed to implement the more elaborate cache controller.
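As one candidate for a more sophisticated cache controller, the following Python sketch places a least-recently-used (LRU) cache in front of the main-heap search and counts how many heap searches are still needed. LRU is named here only as an example replacement policy, not as the present algorithm, and `heap_search` is a hypothetical callback standing in for the collector's search of the main heap.

```python
from collections import OrderedDict
from typing import Callable


class LRUOverflowCache:
    """Hypothetical LRU cache in front of an expensive main-heap search."""

    def __init__(self, capacity: int, heap_search: Callable[[int], int]):
        self.capacity = capacity
        self.heap_search = heap_search  # assumed: scans the heap for `key`'s overflow data
        self.entries = OrderedDict()    # key -> cached value, least recently used first
        self.heap_searches = 0          # number of misses that fell through to the heap

    def lookup(self, key: int) -> int:
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: mark as most recently used
            return self.entries[key]
        self.heap_searches += 1            # miss: fall back to the expensive heap search
        value = self.heap_search(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value
```

Each hit costs a dictionary lookup plus a reordering step. This is the kind of extra collector overhead that would have to be weighed against the heap searches saved.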

The important connexion between functional languages and parallel architectures has been discussed, but to date the research has concentrated on simulating a uniprocessor machine, and little has been said about how the collection scheme would be extended to a multiprocessor environment. In principle, little change to the proposed architecture is envisaged. In a loosely coupled system, each collector would be associated with a mutator and a heap. The interaction between a collector and any mutator need not differ significantly from that outlined in chapters 6 and 7, since the mechanism for signalling to the mutator that it is safe to proceed with graph reduction could also be used to control the behaviour of remote processors. Collectors would, however, need to communicate with one another to indicate which parts of the graph are being scanned by Tarjan searches and similar operations. This would ensure that nodes are traced correctly within their S.C.C. structures, which may span more than one heap. Simulating such a parallel regime would help in optimising a final design, and would also support work on minimising this class of interaction.
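For reference, the sketch below gives the textbook sequential form of Tarjan's strongly-connected-component search in Python, together with a hypothetical helper that reports which S.C.C.s contain nodes owned by more than one heap. Those are the components for which, in a loosely coupled system, collectors would have to coordinate their searches. The `owner` map and `cross_heap_sccs` are illustrative assumptions; no inter-collector protocol is modelled.

```python
def tarjan_scc(graph: dict[str, list[str]]) -> list[list[str]]:
    """Textbook Tarjan S.C.C. search over an adjacency dict (sequential, recursive)."""
    index_of, lowlink = {}, {}
    on_stack, stack, sccs = set(), [], []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index_of:           # unvisited: recurse, then propagate lowlink
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:             # back edge into the current search path
                lowlink[node] = min(lowlink[node], index_of[succ])
        if lowlink[node] == index_of[node]:    # node is the root of an S.C.C.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            sccs.append(component)

    for node in graph:
        if node not in index_of:
            visit(node)
    return sccs


def cross_heap_sccs(graph: dict[str, list[str]], owner: dict[str, int]) -> list[list[str]]:
    """Hypothetical: S.C.C.s whose nodes live in more than one processor's heap."""
    return [c for c in tarjan_scc(graph) if len({owner[n] for n in c}) > 1]
```

For example, with `graph = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"], "e": ["c"]}` and `c` on heap 0 but `d` on heap 1, only the cycle through `c` and `d` is reported as spanning two heaps.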

Once the basic techniques for extending the system to a parallel architecture have been developed, other improvements could be investigated. Weighted reference counting [THO-81, BEV-85] or external node pointers [HUG-85] could enhance the system on a distributed machine. Recursively structured control mechanisms [WIS-88] between collectors could also prove useful in a multiprocessor environment. After these techniques have been analysed and appraised with reference to the work presented in this thesis, the simulator could be extended to incorporate some of them.
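The appeal of weighted reference counting on a distributed machine is that copying a reference needs no message to the processor that owns the cell. The Python sketch below shows the textbook idea under stated assumptions: an initial weight of 2^16, hypothetical `Cell` and `Ref` types, and the invariant that a cell's count equals the sum of the weights of its outstanding references. When a reference of weight 1 must be copied, indirection cells are a common remedy; this sketch instead simply asks the owner for more weight, which costs one message.

```python
from dataclasses import dataclass

INITIAL_WEIGHT = 1 << 16  # assumed weight given to a freshly created reference


@dataclass
class Cell:
    """A cell on its owning processor; count == sum of outstanding reference weights."""
    name: str
    count: int = 0
    messages: int = 0   # messages received from remote holders of references
    freed: bool = False


@dataclass
class Ref:
    target: Cell
    weight: int


def new_ref(cell: Cell) -> Ref:
    cell.count += INITIAL_WEIGHT  # done locally by the owner at creation time
    return Ref(cell, INITIAL_WEIGHT)


def copy_ref(ref: Ref) -> Ref:
    """Duplicate a reference by splitting its weight; normally no message to the owner."""
    if ref.weight == 1:
        # Simplification (not the indirection-cell remedy): request more weight.
        ref.target.count += INITIAL_WEIGHT
        ref.target.messages += 1
        ref.weight += INITIAL_WEIGHT
    half = ref.weight // 2
    ref.weight -= half
    return Ref(ref.target, half)


def drop_ref(ref: Ref) -> None:
    """Discard a reference: one message tells the owner to subtract its weight."""
    cell = ref.target
    cell.count -= ref.weight
    cell.messages += 1
    ref.weight = 0
    if cell.count == 0:
        cell.freed = True  # no outstanding references remain: the cell is garbage
```

Copying a reference twice leaves the owner's count untouched. Only the three later `drop_ref` calls generate messages, and the cell is freed when the last one arrives.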

A potentially fruitful area of research would be to investigate compile-time garbage collection techniques that could reduce the load on the run-time reference counting collector. Hughes [HUG-91] suggests that both compile-time garbage collection and a technique of destructive allocation could reduce the load on the run-time garbage collector, by removing the need to maintain reference counts in certain parts of the machine's memory. Hughes [HUG-91] also points out that no compile-time garbage collector is likely to replace a run-time collector, but that employing compile-time optimisation may still be worthwhile.


At present, the main scheduler module stacks only one mutator interrupt. Stacking requests more deeply might improve average collector latency, but its effect on the safety of the collector is unclear. The technique would need further investigation, both to quantify any benefits and to ensure that the safety of the collector is not compromised.
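The design parameter in question can be modelled as a small bounded store of pending mutator interrupts. In this hypothetical Python sketch, `max_depth=1` corresponds to the present scheduler and larger values to deeper stacking. The text does not specify a servicing order, so first-in, first-out is assumed. The sketch only counts refused requests and says nothing about collector safety, which remains the open question.

```python
from collections import deque


class InterruptStack:
    """Hypothetical bounded store of pending mutator interrupts (FIFO assumed)."""

    def __init__(self, max_depth: int = 1):
        self.max_depth = max_depth  # 1 models the present single stacked interrupt
        self.pending = deque()
        self.refused = 0            # requests the mutator would have to retry or wait on

    def request(self, interrupt) -> bool:
        """Stack an interrupt if there is room; otherwise record a refusal."""
        if len(self.pending) >= self.max_depth:
            self.refused += 1
            return False
        self.pending.append(interrupt)
        return True

    def service(self):
        """Collector takes the oldest pending interrupt, or None if there is none."""
        return self.pending.popleft() if self.pending else None
```

For a burst of three requests arriving before the collector can respond, a depth of one refuses two of them, whereas a depth of three accepts all three.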

The current simulation work has concentrated mainly on implementing the collector, in order to demonstrate its functionality and viability. Only preliminary work has been carried out to optimise its performance. Building on the analysis of the behaviour of the collector's various functions, further work on shortening the critical (and average) paths through these procedures may prove valuable in improving the collector's overall performance.

The final stage in implementing such an extended collector architecture would be to design and build a hardware implementation and run it with a practical functional machine. This would represent the culmination of the research, and could provide a commercial product for use with functionally programmed machines.

(
  (program (fact 8))
  ((fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
)

Figure A.1 Factorial (8).
