9. Conclusions
9.8 Further Work
This dissertation has concentrated on parallel software and parallel computers. Other types of planning also involve target systems of a parallel nature which are susceptible to discrete event modelling and which exhibit contention, inter-dependence, uncertainty and variable behaviours, so similar techniques can be applied to them. Examples include the planning of projects, the design of the flow of work through a factory and the design and tuning of computer systems. There is scope for further work in comparing the methods and techniques developed for other fields with the methods and techniques developed here.
Within the parallel computing field, this research has confined its attention to DM MIMD systems in which processors are connected by links. This is because very large machines can be constructed cost-effectively on this basis. Further research might investigate the application of the tuning methods developed here to other types of parallel architecture. It is possible that the methods which are most appropriate for tuning will be different for different architectures or that the limitations will be different. For example, the performance issues associated with shared memory machines are often better tackled when developing the hardware and system software and this changes the nature of the tuning problem quite radically.
It has been assumed in this research that processors and their connections are utterly reliable. In practice, processors and communications are subject to software and hardware failures. This is especially true for large machines and long programs. Mechanisms at hardware level, system software level and application software level to make the system fault-tolerant are likely to affect performance, as are the processes of recovery from failures. Performance tools can be applied to the tuning of such mechanisms. However, it is also necessary to take such mechanisms into account in the design and use of performance tools and the interpretation of their results.
Each parallel program to be tuned has been assumed to have exclusive access to the parallel machine on which it executes. However, if multiple sub-programs contend for multiple processors, very complex behaviour can result. This is particularly important for real-time systems, where signals or interrupts can occur at arbitrary times, and for parallel systems shared between a number of users, where a parallel system might service a workload containing an arbitrary mix of batch programs and requests from interactive users.
Another important restriction on this work has been that most of the tuning methods considered, with the exception of automatic load balancing, have been manual, rather than automatic. It would also be possible to use optimisation algorithms, such as genetic algorithms, to control the parameters of model simulations and thus automatically tune the program design. This is likely to be a promising approach where each tuning alternative can be expressed in a simple form, such as a table of numbers, and where, as with very complex programs, diagnostic techniques are less easy to apply. It is likely to be less successful where tuning involves human judgement and insight, such as in the re-engineering of computations. Overall, it is to be expected that automatic and manual approaches will complement one another, so further research is needed to investigate how the complementary approaches can best be exploited together.
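The idea of an optimisation algorithm driving a model simulation can be illustrated with a minimal sketch. It is not taken from the tools described in this dissertation: the function `simulate` is a hypothetical, deliberately crude stand-in for a model simulation (it predicts run-time as the load of the busiest processor and ignores communication), and the tuning alternative is assumed to be expressible as a table giving the processor on which each task is placed. The names `evolve`, `grain` and `placement` are likewise assumptions made for illustration.

```python
import random


def simulate(placement, grain, n_procs):
    """Toy stand-in for a model simulation (an assumption, not a real tool).

    Predicted run-time is the total grain size on the busiest processor.
    """
    load = [0.0] * n_procs
    for task, proc in enumerate(placement):
        load[proc] += grain[task]
    return max(load)


def evolve(grain, n_procs, pop_size=20, generations=100, seed=0):
    """Search for a task-to-processor table with a simple genetic algorithm.

    Requires at least two tasks and a population of at least four.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n = len(grain)

    def cost(p):
        return simulate(p, grain, n_procs)

    # Initial population: random placement tables.
    pop = [[rng.randrange(n_procs) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]  # elitism: the best half is kept
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] = rng.randrange(n_procs)  # point mutation
            children.append(child)
        pop = survivors + children
    best = min(pop, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    grain_sizes = [5, 3, 8, 2, 7, 4, 6, 1]  # hypothetical grain sizes
    table, predicted = evolve(grain_sizes, n_procs=3)
    print(table, predicted)
```

Because the search only ever sees the number returned by `simulate`, any diagnostic insight into why one table outperforms another is lost, which is consistent with the expectation that automatic and manual approaches will complement one another.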
Another area with scope for further investigation is that of model analysis techniques. Two simple techniques, critical path analysis and critical load analysis, were chosen for investigation on grounds of their low cost and compatibility with performance visualisation tools. However, it is possible that other techniques will lead to insights which are useful for understanding behaviour and for tuning. An interesting trade-off to investigate would be that of cost of use against the usefulness, generality and validity of results.
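As a reminder of why critical path analysis is cheap, the following sketch computes the longest path through a task graph in a single pass over a topological ordering. It is a generic textbook formulation rather than the implementation used in this research; the task names, the durations and the function name `critical_path` are hypothetical, and communication costs and processor contention are ignored.

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Return (length, path) for the longest path through a task DAG.

    durations: task -> execution time (hypothetical grain sizes)
    deps:      task -> set of tasks that must finish first
    """
    graph = {t: deps.get(t, ()) for t in durations}
    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor that determines each task's start time
    for t in TopologicalSorter(graph).static_order():
        start, prev[t] = 0.0, None
        for p in graph[t]:
            if finish[p] > start:
                start, prev[t] = finish[p], p
        finish[t] = start + durations[t]
    # Walk back from the task that finishes last.
    last = max(finish, key=finish.get)
    path, t = [], last
    while t is not None:
        path.append(t)
        t = prev[t]
    return finish[last], path[::-1]


if __name__ == "__main__":
    durations = {"a": 2, "b": 3, "c": 1, "d": 4}
    deps = {"c": {"a", "b"}, "d": {"c"}}
    print(critical_path(durations, deps))  # (8.0, ['b', 'c', 'd'])
```

The cost is linear in the number of tasks and dependencies, which is the kind of low cost of use that would have to be weighed against the generality and validity of results when comparing it with more advanced techniques.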
The issue of tuning through modifying the properties of the system software has not been addressed here. In particular, performance improvements can sometimes be made by altering scheduling policies and the routes by which messages are transported, but these are not generally under user control and cannot be altered without considerable trouble to the programmer. However, research into this issue would provide useful input into the design of future system software. This research is particularly important for real-time systems, in view of the problem of anomalies discussed in this dissertation.
It has been found that programs with deterministic, data-independent traces have the best potential for risk-free tuning. In particular, DAG programs are somewhat exceptional in the ease with which model tuning can be applied. It would therefore be useful to research the extent to which algorithms in general can be implemented in parallel in the form of DAG programs or can be implemented to have deterministic and data-independent traces. This is particularly pertinent in that a number of tools for graphical prototyping of DSP systems are currently available. Furthermore, the increasing popularity of research into the use of Field Programmable Gate Arrays, which are “programmed” in languages such as VHDL that resemble the DAG paradigm, makes this an interesting line of research.
In this research, there has been an emphasis on programs with data-independent, deterministic traces, because it was argued that massively parallel programs were more likely to be designed, built, debugged, maintained and re-used on this basis. This focus has led to some parts of the problem space receiving much less attention. In particular, there was no detailed and thorough investigation of the interactions between large or unbounded variations in grain sizes, data-dependent and non-deterministic traces and problems with imperfect tools. Furthermore, it has been argued that the techniques discussed are not applicable to load balancing programs, especially where the grain sizes are data-dependent. It is much harder to reason about whether such a system can achieve a specified run-time than is the case for a system with a fixed trace, so there is scope for further work to investigate this issue. There is also scope for research into how appropriate specifications of performance might be formulated if such a system is not required to achieve a guaranteed run-time.
Finally, this dissertation has not substantially addressed the issue of how performance tools should be designed, although details of the VPB tools have been discussed. This area is intensively addressed by academic and industrial research, so there is probably little room for ideas concerning new tools unless completely different concepts in performance engineering can be developed. This dissertation has instead focused on the issue of methods by which performance tools can be put to use. However, in the area of tool design, there is scope for further research on how performance monitoring and modelling tools should be integrated with other tools for the design and development of parallel software and how they should be integrated with system software. The HAMLET [HAMLET95] project is an example of such research in which model simulation and monitoring tools are integrated with computer-aided software engineering tools. Other areas of current and future research are the integration of model simulation and monitoring tools with compilers, profiling tools, tools to identify parallelism in serial code, debuggers and code browsers to support software maintenance.
9.9 Summary
A tuning method has been defined on the basis of a combination of techniques:
□ performance monitoring;
□ analysis on the basis of optimistic assumptions;
□ simulation of a pessimistic, upper bound model through synthetic workload and discrete event techniques.
The method is most applicable to programs with deterministic and data-independent traces. If grain sizes have large uncertainties or variations, it may not be possible to adequately resolve risks. Residual risks remain from anomalies and the possibility that assumptions made in the optimistic model analysis and the pessimistic model simulation are violated.
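The effect of grain size uncertainty on the resolution of risk can be illustrated with a small sketch. It is an interpretation offered for illustration, not the procedure defined in this dissertation: each task is assumed to have a grain size known only as an interval, the optimistic and pessimistic estimates are taken from a toy model that predicts run-time as the load of the busiest processor, and the decision rule and all names (`classify`, `grain_bounds`, `placement`) are hypothetical. Anomalies, which can invalidate such bounds, are not modelled.

```python
def classify(measured, grain_bounds, placement, n_procs):
    """Compare a tuning alternative with the measured run-time.

    measured:     run-time of the current program from monitoring
    grain_bounds: list of (lo, hi) grain size intervals, one per task
    placement:    processor index for each task in the alternative
    """
    def busiest(pick):
        # Toy model: run-time is the load of the busiest processor.
        load = [0.0] * n_procs
        for (lo, hi), proc in zip(grain_bounds, placement):
            load[proc] += pick(lo, hi)
        return max(load)

    optimistic = busiest(min)   # every grain at its smallest
    pessimistic = busiest(max)  # every grain at its largest
    if pessimistic < measured:
        verdict = "improvement expected"
    elif optimistic >= measured:
        verdict = "no improvement expected"
    else:
        verdict = "unresolved"
    return optimistic, pessimistic, verdict


if __name__ == "__main__":
    # Narrow intervals: the pessimistic bound is below the measured run-time.
    print(classify(10.0, [(2, 3), (4, 5), (3, 4)], [0, 1, 0], 2))
    # One wide interval: the bounds straddle the measured run-time.
    print(classify(10.0, [(2, 9), (4, 5), (3, 4)], [0, 1, 0], 2))
```

In the second call a single task with a wide grain size interval is enough to leave the comparison unresolved, which is the situation described above in which risks cannot be adequately resolved.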
This research is of most benefit to developers of parallel software, especially if appropriate tools are available. There are secondary benefits to developers of hardware, system software and software development environments to support parallel systems. The next phase of this research should apply these techniques to large numbers of software development projects. Further lines of research into tuning methods have been discussed:
□ tuning of other types of system modelled by discrete event techniques;
□ tuning of other parallel systems;
□ tuning of fault tolerant systems;
□ tuning of systems supporting multiple programs;
□ automated tuning;
□ use of more advanced model analysis techniques;
□ tuning of system software;
□ development of the methods developed here to tackle programs with non-deterministic and data-dependent traces, complex behaviour and unbounded grain size variations;