7.4. The Parallel System Simulation Design
7.4.3. Functional Design of the Simulation
7.4.3.3. Timing Data
To predict the performance of the parallel machine model, it was necessary to include timings for two aspects of its operation:
a) the length of time each process takes to execute,
b) the length of time a data packet takes to move from one processing element to the receiving elements.
If these two sets of timings were known, all the other behavioural aspects of the system, such as traffic on the buses and the work load on each processing element, could be calculated. In addition, one of the stated simulation requirements was that process execution times should be separable into "true" processing, ie rewriting, and data packet construction and decomposition, in order to check that this form of process spawning was not creating unacceptable overheads.
Measurements of actual process execution times proved impossible with the initial resources available. The original computer used for the development of the system was the Sun 3/60 workstation, and attempts to use the system clock to measure process times failed because the granularity of the clock was too coarse: it measured only in intervals of approximately 16 ms, and typical process times were considerably smaller than this.
To obtain estimates of process times that were as accurate as possible, Assembler listings of the C interpreter code were obtained. The clock cycles taken by each function were enumerated and summed, and these values were inserted into the interpreter code. Thus whenever an interpreter function was called the timing count for that
The first version of the full simulation relied on these calculated process timings. However, two sources of inaccuracy were identified: first, the laborious nature of the Assembler inspection task is likely to have led to human errors, and secondly, full data for the MC68020 processor (as used by the Sun 3/60) were not available, so timings were based on the instruction set for the MC68000.
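The cycle-counting approach described above can be illustrated by a minimal sketch. The function names and cycle costs below are hypothetical and chosen purely for illustration; they are not taken from the interpreter or from any Assembler listing. Each interpreter function charges a fixed, hand-summed cost to a running total when it is called.

```c
#include <stdint.h>

/* Hypothetical per-function cycle costs, of the kind that might be
   summed by hand from an Assembler listing. Values are illustrative. */
enum { CYCLES_MATCH = 120, CYCLES_REWRITE = 340 };

/* Running total of estimated clock cycles for the current process. */
static uint32_t cycle_count = 0;

static void charge(uint32_t cycles) { cycle_count += cycles; }

/* Stand-ins for interpreter functions: each charges its fixed cost on entry. */
static void match_rule(void)   { charge(CYCLES_MATCH);   /* matching work would go here */ }
static void rewrite_term(void) { charge(CYCLES_REWRITE); /* rewriting work would go here */ }

/* Run a toy process and return its estimated cycle total. */
uint32_t run_process(int n_matches, int n_rewrites)
{
    cycle_count = 0;
    for (int i = 0; i < n_matches; i++) match_rule();
    for (int i = 0; i < n_rewrites; i++) rewrite_term();
    return cycle_count;
}

/* Convert a cycle total to microseconds for a clock rate given in MHz. */
double cycles_to_us(uint32_t cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}
```

With these illustrative costs, a process making two matches and one rewrite is charged 580 cycles, which at an assumed clock rate of 20 MHz corresponds to 29 microsecs. Any error in a hand-summed cost is repeated on every call, which is one reason the inaccuracies noted above matter.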
At this stage in the project it was decided to experiment with porting the system to a Transputer-based system, namely an IBM AT clone with a Transputer card holding a T414b-15 and 2 Mbytes of memory [INMOS 89]. The 3L C compiler for the Transputer had recently become available and this was used to recompile the parallel system software [3L Parallel C 88]. The main advantage of this was the prospect of using the Transputer system clock, which operates at a granularity of 1 microsec. Clock-reading instructions were inserted in place of the calculated measurements, and all results presented in Chapter 8 refer to this version of the software. The estimated timings based on the Assembler inspection for the Sun PLL system have not been used, but the work involved in the task provided additional insight into the detailed operation of the rewrite interpreter, eg the computational overheads due to function calling and recursion, which are considered in detail in Chapter 9.
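As an illustration of how clock readings can separate "true" processing from packet handling, the following sketch brackets each phase of work with two clock readings and accumulates the difference under the appropriate heading. It is a sketch under stated assumptions: the clock is passed in as a function pointer, and the fake clock and work functions are hypothetical stand-ins so that the example runs anywhere; the actual Transputer clock-reading call is not shown.

```c
#include <stdint.h>

/* Hypothetical reader for a clock that counts in microseconds. */
typedef uint32_t (*clock_fn)(void);

typedef enum { PHASE_REWRITE, PHASE_PACKET } phase;

typedef struct {
    uint32_t rewrite_us; /* "true" processing, ie rewriting */
    uint32_t packet_us;  /* packet construction and decomposition */
} process_times;

/* Time one phase of work and add the elapsed time to the right total. */
void time_phase(process_times *t, phase p, clock_fn now, void (*work)(void))
{
    uint32_t start = now();
    work();
    uint32_t elapsed = now() - start; /* unsigned subtraction tolerates one wrap-around */
    if (p == PHASE_REWRITE) t->rewrite_us += elapsed;
    else                    t->packet_us  += elapsed;
}

/* Fraction of the total process time spent on packet handling. */
double packet_overhead(const process_times *t)
{
    uint32_t total = t->rewrite_us + t->packet_us;
    return total == 0 ? 0.0 : (double)t->packet_us / (double)total;
}

/* Stand-ins used only to make the sketch runnable: a fake clock and
   two pieces of "work" that advance it by fixed, illustrative amounts. */
static uint32_t fake_time_us = 0;
uint32_t fake_clock(void)   { return fake_time_us; }
void     do_rewrite(void)   { fake_time_us += 42; }
void     build_packet(void) { fake_time_us += 7; }
```

Keeping the two totals separate makes it possible to report the packet handling overhead as a fraction of the whole process time, which is the check called for by the simulation requirements.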
The second type of timing data needed by the simulation was the length of time a data packet took to transfer on a bus from one processing element to another. This value had to take into account the setting up time for a bus as well as the data transfer itself. Any delay due to contention for a bus also had to be quantified and added. Unlike the execution times, these figures could not be obtained from measurements made with the system clock, so the timing data for packet transfer had to be calculated.
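One simple way to quantify a contention delay is sketched below. This is an assumed model, not the controller logic of the simulation: a bus is taken to be busy until a recorded time, a transfer requested earlier than that time waits until the bus is free, and the wait is reported as the contention delay. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed bus model: the bus is busy until free_at_ns. */
typedef struct {
    uint64_t free_at_ns;
} bus_state;

/* Schedule a transfer requested at request_ns that occupies the bus for
   transfer_ns (setting up time plus data transfer). Stores the contention
   delay in *delay_ns and returns the time at which the transfer completes. */
uint64_t schedule_transfer(bus_state *bus, uint64_t request_ns,
                           uint64_t transfer_ns, uint64_t *delay_ns)
{
    uint64_t start = request_ns > bus->free_at_ns ? request_ns : bus->free_at_ns;
    *delay_ns = start - request_ns;        /* zero when the bus was already free */
    bus->free_at_ns = start + transfer_ns; /* bus stays busy until completion */
    return bus->free_at_ns;
}
```

For example, if a first transfer requested at 1000 ns occupies the bus for 2000 ns, a second transfer requested at 1500 ns must wait 1500 ns before it can start.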
It has been shown in Chapter 5 that the time of packet transfer depended on its size and the number of receiving processing elements. However, as has been discussed, the software did not actually construct the data packet, using instead the movement of process structures to model the information flows in the system. At this stage, rather than alter the model to work on the basis of transfer of "real" data packets, additional functions were added to calculate the size of a data packet using information from the group of processes that it would represent in a true realisation of the design.
In Chapter 6 the details of the data items in the data packet are discussed. The tagging of each item with a three bit tag has been assumed, giving data item sizes ranging from eight bits for a user variable to nineteen bits for an integer or floating point number. The introduced variables and pointers were designed with sixteen bit representations. In order to simplify the calculations it was decided to use the figure of sixteen bits to express the size of any data item in the packet. It was important that the size of the data packet should not be underestimated, and it was felt that this approximation was unlikely to lead to an underestimate. Thus the size of the data packet was calculated and, as the number of receiving processing elements was known, it was possible to include the time of transfer of an individual packet. Timings used for the passage of the data packet were based on the estimates given in Section 6.5.5, ie the time to complete a broadcast was calculated at (500*n + 250*m) nanosecs, where n is the number of recipient processing elements and m is the size of the data packet in words. Delays due to bus contention were known from the stored information within the controller on bus utilisation.
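The calculation described above can be written out as a short sketch. The function names are hypothetical, and the word width of sixteen bits is an assumption made here for illustration (the text gives the item size but not the word width); the constants 500 and 250 are those of the broadcast formula quoted above.

```c
#include <stdint.h>

enum {
    BITS_PER_ITEM    = 16,  /* simplifying figure used for every data item */
    WORD_BITS        = 16,  /* ASSUMPTION: one bus word is sixteen bits wide */
    NS_PER_RECIPIENT = 500, /* per-recipient term of the broadcast formula */
    NS_PER_WORD      = 250  /* per-word term of the broadcast formula */
};

/* Packet size in words for a given number of data items, rounding up. */
uint32_t packet_words(uint32_t n_items)
{
    uint32_t bits = n_items * BITS_PER_ITEM;
    return (bits + WORD_BITS - 1) / WORD_BITS;
}

/* Broadcast time in nanoseconds: 500*n + 250*m, where n is the number of
   recipient processing elements and m is the packet size in words. */
uint64_t broadcast_ns(uint32_t n_recipients, uint32_t m_words)
{
    return (uint64_t)NS_PER_RECIPIENT * n_recipients
         + (uint64_t)NS_PER_WORD * m_words;
}
```

Under these assumptions, a packet of ten data items occupies ten words, and broadcasting it to three processing elements takes 500*3 + 250*10 = 4000 nanosecs, before any contention delay is added.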