6.2 Example GRM Framework Model
6.2.4 Conclusions
Although the detailed outputs from the overall simulation have not been exercised exhaustively, the integration exercise of inter-connecting tested blocks has fulfilled the central purpose of configuring this particular example model, by enabling detailed examination of the operation of the GRM Framework principles and the GRM Bus. In summary, this example model has demonstrated that:
• A representative radar simulation can be built from custom-written blocks following the GRM Framework principles.
• These blocks can be successfully connected and operated using the novel GRM Bus.
The resulting time-stepped simulation provides a detailed model of the radar waveforms throughout the model.
7 Simulation Acceleration
Having proven the feasibility of implementing radar models using the GRM Framework, the next aim was to make the simulations execute as quickly as possible. This chapter summarises the work undertaken to investigate the software and hardware options available for reducing the run time of simulations using the GRM Framework implemented in SPW.
7.1 Introduction
A radar simulation modelled at complex baseband may require several million samples to accurately represent just one second of radar signals in the time domain. The sample rate may need to be further increased to provide detailed modelling of transients, ‘ringing’, non-linear effects, or other factors. Given the current processing power of the (Sun and Hewlett-Packard) workstations commonly available to radar modellers, even simple simulations at such sample rates will inevitably execute significantly slower than real time. Any attempt to model the complex signal processing features employed by a modern military radar, such as an MFR, would further extend the execution time of the simulation. For modelling purposes, the radar designer should not be required to spend time constructing a dedicated processing platform which exceeds the complexity of the radar itself in order to get simulation results within a reasonable time period.
For this work, the GRM Framework was implemented as C source code within Alta Group’s SPW design, simulation and analysis software package. The time required to execute the design stage within SPW depends on the speed with which the user can construct the simulation; this speed is greatly increased by the modular Framework approach. The analysis stage depends upon the functions available to the user: SPW has a dedicated Signal Design Editor (SDE) with many useful signal manipulation and analysis features, and within the Framework the output blocks can be altered to generate whatever output data format the user desires to enhance the presentation of the results.
The speed of the simulation stage, however, is limited only by the processing power available and the structure and efficiency of the software. It was considered that enhancing the efficiency of the GRM Framework would be a valuable exercise; thus a period of work was focused upon software and hardware methods of reducing the execution time for long simulations.
The following factors were considered most important in the evaluation of possible accelerator options:
• Compatibility with the current SPW host platform (Sun workstation).
• Compatibility with the current user interface (SPW).
• Availability of the solution (time-scales).
• Support provided for the solution (documentation, product support efficiency, training, installation, etc.).
• Processing speed (the benchmarking process should also consider the effects of any I/O to files and/or hardware).
• Debugging capabilities.
• Flexibility (between platforms, systems, compilers, software languages, etc.).
• Ease of use (direct/remote operation, user interface).
• Cost (initial capital investment, expansion options, maintenance).
7.2 Evaluation Methods
The representative radar model described in the previous chapter provides a proven system with a known performance without any form of acceleration technology. This radar model utilises detailed time domain modelling at complex baseband - the sampling frequency being at the Nyquist rate or greater. This requires a significant amount of signal processing time relative to ‘real time’. It is therefore a suitable basis for evaluating the potential acceleration methods and is hereafter referred to as the reference model.
Both software- and hardware-oriented acceleration options were considered. The available software options focus primarily on modifying the structure of the modules within the radar model to utilise the features of the particular hardware platform being used. For instance the architectures of digital signal processing (DSP) devices are structured towards vector operations rather than scalar operations, so the GRM blocks could be coded to take advantage of this.
The primary hardware options were in the choice of platform and in the quantity of processors used to run a simulation. Platform choices were restricted to those machines compatible with SPW and readily available at Nortel Networks: either standard workstations from Sun and Hewlett-Packard (HP), or DSP systems from a variety of manufacturers. A prime concern with any platform was ensuring adequate support (from either Alta Group or the DSP manufacturer) for running SPW on the platform. SPW also offered a method of running elements of an SPW simulation concurrently on independent processing units, using the SPW MultiProx function.
As the GRM Framework only defines the method for linking the radar modules, rather than the content of the modules, the speed of any particular block is largely dependent upon the efficiency of the code written by the developer. Since this cannot be defined by the GRM Framework, it is only briefly considered here.
7.3 Software Considerations
A GRM simulation implemented within SPW consists of three components which are independently compiled and linked to produce a single executable. The first component is the GRM library, containing:
• GRM-specific functions, e.g. file I/O and GRM Bus interface routines; these are all compiled into a single library.
• Complex functions, such as Constant False Alarm Rate (CFAR), clutter and antenna beam patterns; these are compiled into separate libraries for each function.
The second component is the CGS library - the low level library functions and routines required by SPW to support its Code Generation System (CGS). The source code for the CGS library is included as part of the SPW product. The third component is the simulation structure, built within SPW using the Block Diagram Editor (BDE). SPW translates the block diagram of the radar model into C source code by amalgamating the code underlying each unit block within the simulation using CGS. It should be noted that the initial SPW models of chapter 6 were developed using the standard BDE block design approach espoused in the SPW manuals. However, the higher execution speed - typically six times faster - of blocks designed for CGS led to the adoption of this design methodology; the example radar model of section 6.2 was rebuilt using CGS and re-verified, and the subsequent models described in this thesis used CGS exclusively.
Information is entered into a simulation in two ways. The first is via a parameter screen within the SPW BDE. In general, every simulation block or nested group of blocks can have its own parameter set. However, this data cannot be adjusted dynamically during a simulation; it is therefore fixed when the program is compiled. Because of this limitation, such parameters are used as infrequently as possible in the design of GRM modules. The second source of input is via data files. Although the file names are generally required to be fixed at compile-time, the content of a file can be altered to allow dynamic (run-time) control of the input to a simulation.
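The difference between a compile-time parameter and a run-time data file can be illustrated with a minimal C sketch. The function name `grm_read_param` and the file format (a single number in a text file) are assumptions made for illustration; they are not taken from the GRM library or from SPW.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of SPW or the GRM library): read one
 * floating-point parameter from a text file. The file NAME is fixed when
 * the program is built, but the file CONTENT can be edited between runs.
 * Returns 0 on success, -1 if the file cannot be opened or does not
 * start with a number; on failure *value is left unchanged. */
int grm_read_param(const char *file_name, double *value)
{
    FILE *fp;
    double tmp;
    int ok;

    assert(file_name != NULL && value != NULL);

    fp = fopen(file_name, "r");
    if (fp == NULL)
        return -1;

    ok = (fscanf(fp, "%lf", &tmp) == 1);
    fclose(fp);

    if (!ok)
        return -1;
    *value = tmp;
    return 0;
}
```

A block written this way would pick up an edited value on the next run without the simulation being recompiled, at the cost of one file access when the block is initialised.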
Information can be output from the simulation in many ways, for example to the host platform’s “standard output” (as defined within C); to a data file (either locally or on another platform); to another process (locally or on another platform); or to an alternative processor (e.g. a dedicated DSP). When considering the various hardware platforms, consideration must be given to the frequency with which I/O is used (given the associated system call overheads), the location of data files (which might impose additional overheads if accessing files across a network) and to ensuring that the simulation has appropriate access privileges for the files.
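One common way of reducing the frequency of output calls is to collect samples in memory and write them in blocks. The sketch below is illustrative only: the type `out_stage`, the function names and the block size of 1024 samples are assumptions, not part of the GRM output blocks. The C standard I/O library already buffers `fwrite` calls internally, so the sketch simply makes the batching explicit.

```c
#include <assert.h>
#include <stdio.h>

#define OUT_BLOCK 1024  /* samples held before each write (assumed size) */

/* Hypothetical output stage: holds samples and writes them in blocks. */
typedef struct {
    FILE  *fp;
    float  buf[OUT_BLOCK];
    size_t count;   /* samples currently held in buf */
    size_t writes;  /* number of fwrite calls issued so far */
} out_stage;

void out_init(out_stage *o, FILE *fp)
{
    assert(o != NULL && fp != NULL);
    o->fp = fp;
    o->count = 0;
    o->writes = 0;
}

/* Write any held samples to the file. Returns 0 on success, -1 on error. */
int out_flush(out_stage *o)
{
    size_t n;

    assert(o != NULL);
    if (o->count == 0)
        return 0;
    n = fwrite(o->buf, sizeof o->buf[0], o->count, o->fp);
    o->writes++;
    if (n != o->count)
        return -1;
    o->count = 0;
    return 0;
}

/* Queue one sample; the file is only touched when the block is full. */
int out_put(out_stage *o, float sample)
{
    assert(o != NULL);
    o->buf[o->count++] = sample;
    if (o->count == OUT_BLOCK)
        return out_flush(o);
    return 0;
}
```

With this arrangement, 2500 samples result in three write calls (two full blocks and one final flush) rather than 2500.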
7.4 Acceleration Options
The two main hardware options available for accelerating the SPW simulation code are the choice of processor (general purpose workstation or a specialised DSP processor) and the number of processors used concurrently. SPW is capable of generating C source code for both situations.
7.4.1 Single Workstation
This is the standard SPW simulation route: the simulation code is compiled and linked with the CGS library file containing the precompiled low level SPW CGS functions and the GRM library file containing the precompiled GRM specific functions.
Options for optimisation are in the choice of the compiler and in the choice of the options presented to the compiler. The Sun workstations have available, for example, a basic C compiler bundled with the operating system, a separate SparcWorks compiler or the public domain Gnu C compiler, each of which can provide varying degrees of optimisation (i.e. optimised for speed or optimised for compactness of code). Reducing the debugging and error-checking to a minimum gives a substantial speed increase over non-optimal compiler options, but the reference model was already compiled using maximum optimisation, so there was no further gain to be had through this approach.
7.4.2 Single DSP processor
The options for using a single DSP processor are slightly more restrictive than for the workstation solution. The DSP is typically located on a vendor’s board, remote from the development workstation, and reliant upon the vendor’s proprietary compiler to best utilise the hardware architecture of both the DSP and the vendor’s board. Such compilers include standard C library functions; however, the source code for the CGS library functions, the GRM library functions and the simulation code will require compiling with this proprietary compiler. In order to maximise the gain from using the DSP, the CGS library source code must be modified to make best use of the architecture of the DSP. This adaptation could be performed for a DSP given a reasonable knowledge of its capabilities (from the DSP documentation), but the simplest solution would be to select a product whose manufacturer provides that support.
7.4.3 Concurrent Processing
Alta has a product called MultiProx which allows an SPW simulation to be partitioned to run concurrently on multiple processors. MultiProx produces a set of programs, one for each processor or platform. The MultiProx library (built into each program) handles communication and synchronisation between the processors.
As for the normal CGS libraries, ideally the MultiProx library should be optimised for the platform or particular DSP according to the inter-processor communication method used (for example some DSPs have dedicated serial links or shared memory for this purpose). Given that this communication may be a critical overhead for a concurrent solution, optimisation of MultiProx is important.
The serial nature of the inter-processor communications highlighted an incompatibility between SPW MultiProx and the initial low level implementation of the GRM Bus. The GRM Libraries were coded in such a way that the GRM Bus used only a single inter-block connection for bi-directional data transfer by sharing what was, as far as SPW was concerned, a unidirectional bus. Duplex transmission of data on the bus relied upon knowledge of the underlying memory allocation techniques used by SPW - effectively part of the unidirectional (forward) buffer was used for reverse transmission while hiding the fact from SPW. As a result, the GRM Libraries had to be re-written such that the GRM Bus was split into two unidirectional buses: a GRM ‘Forward’ Bus and a GRM ‘Backward’ Bus.
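The effect of the split can be pictured with a small C sketch. The actual layout of the GRM Bus is not reproduced here: the type names, the bus width of eight words and the accessor functions are all hypothetical. The point illustrated is only that each direction has its own buffer with exactly one writer, so neither direction depends on memory hidden inside the other.

```c
#include <assert.h>
#include <string.h>

#define BUS_WORDS 8  /* assumed bus width, for illustration only */

/* One unidirectional bus: written by one block, read by its neighbour. */
typedef struct {
    double words[BUS_WORDS];
} grm_bus;

/* Hypothetical link between two adjacent blocks after the split. */
typedef struct {
    grm_bus forward;   /* written by the upstream block only   */
    grm_bus backward;  /* written by the downstream block only */
} grm_link;

/* Upstream block places data on the Forward Bus. */
void upstream_write(grm_link *l, const double *data)
{
    assert(l != NULL && data != NULL);
    memcpy(l->forward.words, data, sizeof l->forward.words);
}

/* Downstream block places data on the Backward Bus. */
void downstream_write(grm_link *l, const double *data)
{
    assert(l != NULL && data != NULL);
    memcpy(l->backward.words, data, sizeof l->backward.words);
}

/* Each side reads only the bus written by the other side. */
const double *downstream_read(const grm_link *l)
{
    assert(l != NULL);
    return l->forward.words;
}

const double *upstream_read(const grm_link *l)
{
    assert(l != NULL);
    return l->backward.words;
}
```

Because the two buffers are separate objects, each can be mapped onto its own one-way inter-processor connection.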
Partitioning of the top-level simulation blocks between processors is decided by the user. However, SPW does not provide an easy method of optimising this partitioning. Manual partitioning can be performed using the profiling option of the C compiler to determine the execution time for each module.
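As an illustration of how profiled block times might guide a manual partition, the sketch below applies a simple longest-first greedy assignment. This is a generic load-balancing heuristic chosen for illustration, not a feature of SPW or MultiProx; the function name and the limit of 16 processors are assumptions, and inter-processor communication cost is ignored.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PROCS 16  /* assumed upper limit for this sketch */

/* qsort comparator: descending order of execution time. */
static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Assign block times (seconds, e.g. from a profiler) to n_procs processors:
 * sort longest-first, then give each block to the least-loaded processor.
 * 'times' is sorted in place. Returns the load of the busiest processor,
 * which sets the run time of the partitioned simulation when
 * communication overheads are ignored. */
double partition_makespan(double *times, int n_blocks, int n_procs)
{
    double load[MAX_PROCS] = {0.0};
    double worst = 0.0;
    int i, p;

    assert(times != NULL && n_blocks > 0);
    assert(n_procs > 0 && n_procs <= MAX_PROCS);

    qsort(times, (size_t)n_blocks, sizeof times[0], cmp_desc);

    for (i = 0; i < n_blocks; i++) {
        int best = 0;
        for (p = 1; p < n_procs; p++)
            if (load[p] < load[best])
                best = p;
        load[best] += times[i];
    }
    for (p = 0; p < n_procs; p++)
        if (load[p] > worst)
            worst = load[p];
    return worst;
}
```

With hypothetical block times of 10, 3, 2 and 1 s on two processors, the busiest processor carries 10 s against a single-processor total of 16 s, giving a speed-up of 1.6 rather than 2, because one block dominates the load.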
7.4.4 Software support
In theory, SPW simulations can be hosted on a variety of platforms (supported either by Alta or the DSP vendor). As the source code is provided for all the SPW CGS and MultiProx functions, these can be re-written to support additional processor types and vendor DSP boards. However, the complexity of such a task meant that it was not deemed possible within the time-scales of this thesis.
Alta support SPW and MultiProx on Suns, HPs and a variety of PC-based DSP boards as shown in Table 7-1. Alta also recommended the Blazer system from Atlantic Aerospace as a dedicated simulation accelerator.
| Processor/DSP Board | Manufacturer | Compiler |
|---|---|---|
| TMS320C30 | Texas Instruments | |
| Banshee Board | Atlanta Signal Processor, Inc. | Floating Point DSP Optimising C compiler, V. 4.5 or later, by Texas Instruments, Inc. |
| Tiger-30 | DSP Research | |
| TMS320C30 Board | LSI/Spectrum | |
| SPIRIT 30 | Sonitech | |
| XDS 1000 | Texas Instruments | |
| TMS320C40 | Texas Instruments | |
| DPC/C40B | LSI/Spectrum | Floating Point DSP Optimising C compiler, V. 4.5 or later, by Texas Instruments, Inc. |
| DSP96002 | Motorola | |
| MM96 Board | Ariel Corporation | Intertools (C96KS/A96KS) v. 1.1 by Intermetrics, Inc. or Intertools (C96KS/A96KS) v. 1.2 with v. 1.1 linker |
| DSP96002 System Board | LSI/Spectrum | |
| DSP32C | AT&T | |
| DSP-32C Board | Ariel Corporation | DSP Optimising C Compiler v 1.3.3 by AT&T |
| DSP-32C Board | LSI/Spectrum | |
| WM DSP32C Development System | AT&T | |
| ZPB34 Board | Burr-Brown | |

Table 7-1: AT Bus DSP boards supported by SPW’s CGS and MultiProx
The processors listed in Table 7-1 have all been on the market for some time. When Alta (UK) were consulted about current support and their intentions for future DSP systems, they indicated that support for the C30 and 96000 processors might well be dropped, and that the ‘Sharc’ from Analog Devices might be supported in the future. Atlantic Aerospace hinted at a device code-named ‘Sharc Killer’ coming from Texas Instruments. Loughborough Sound Images (LSI - who produce many of the PC-based DSP boards supported by Alta for SPW CGS and MultiProx) are expecting a C80 device from Texas Instruments. LSI claim that the C80 comprises a RISC processor plus 1 floating point DSP and 4 fixed point DSPs. Unfortunately, software for these new DSPs would not be available until after completion of the GRM development. Alta will also undertake the generation of libraries for other processors or boards, but they charge a substantial amount for the privilege. Alta provide local support in the UK in addition to an e-mail hot-line to the parent company in the US.
Whilst acknowledging that the newer DSPs will offer higher speed, there is the consideration that the older DSPs and their compilers are a known quantity and there are suitable tools available to make use of them.
7.4.5 Coding algorithms
DSP chips generally utilise a hardware architecture that maximises the processing speed when performing vector operations (e.g. pipelining of data to multipliers). This is highlighted by the application notes accompanying the Blazer accelerator, which compare the relative speeds of a Sparc LXE and a C40 DSP for a ‘Synchronisation and Tracking Loop’ example using each of two different algorithms (vector or scalar) to implement a common function. The results are shown in Table 7-2, showing clearly that the use of vector algorithms can (in this example) substantially improve the processing speed.
| Algorithm | Platform: Sparc LXE | Platform: C40 |
|---|---|---|
| Scalar | 147 s | 303 s |
| Vector | 39 s | 17 s |

Table 7-2: Comparison of vector and scalar processing times
To gain the speed advantages claimed by DSP vendors for vector operation, the developer must write radar modelling code which makes use of the appropriate vector algorithms. The GRM Framework does not define the specific manner in which the user should implement radar algorithms within a block; thus, optimising the code for a vector processor must be done on a block-by-block basis by the user - the precise efficiency increase for any given block cannot be predicted.
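The difference between the two coding styles can be sketched in C with a deliberately trivial operation (applying a gain to a stream of samples). The function names are hypothetical and the example is not taken from the GRM blocks or the Blazer application notes; any speed gain depends on the device and compiler, and none is implied here.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form: one call per sample, as a sample-by-sample block would do. */
float scale_sample(float x, float gain)
{
    return x * gain;
}

/* Vector form: one call per buffer of n samples. The tight loop over
 * contiguous data is the pattern that pipelined DSP multipliers and
 * optimising compilers are designed to exploit. */
void scale_vector(const float *in, float *out, size_t n, float gain)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = in[i] * gain;
}
```

Both forms produce identical results; only the granularity of the call changes, which is why the conversion has to be considered separately for each block.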
Multiple-DSP hardware is primarily intended for concurrent application of the same operation to multiple streams of data. Careful consideration of the partitioning of GRM radar models indicated that although multiple simulation blocks might perform similar operations in parallel, this was at too high a level to take advantage of the DSP vector processing. In addition, transferring a simulation from one processor to N processors would not give a factor of N improvement in speed as the processors would rarely be equally loaded.
7.5 Single-Processor Solutions
A number of hardware platforms were evaluated by using them to host simulations based upon the GRM reference radar model. Measurement of the platform performance was derived so as to minimise the influence of extraneous factors. A primary concern was file I/O, where link performance between the host system and the simulation platform would affect the speed with which the simulation could access files on the host. Representative simulations typically require at least a minimum level of file I/O; thus this factor could not be removed completely. Simulations were run twice on each platform: the first with the minimum possible number of output files (two CFAR files dumped only at the end of a simulation); the second with a large number of output files (an additional ten files dumping I/Q data continuously throughout the simulation). The conclusions are presented in the following subsections.
7.5.1 Sun Workstations
The reference radar model was run for 10,000 samples on several different models of Sun SparcStation: IPC, Sparc 10, Sparc 20 and Sparc Server, all running the SunOS 4.1.3 operating system; and an Ultra II running Solaris 2.5. The GRM library functions were compiled on the Sun using the Gnu C compiler whilst the CGS library functions and the simulation CGS code were compiled with the SunOS bundled C compiler (cc). Table 7-3 shows that, in general, the faster the processor, the greater the relative overhead imposed by large numbers of I/O files.
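The trend can be checked by expressing the extra time taken with twelve files as a fraction of the time taken with two files. The short C sketch below does this using the times from Table 7-3; the definition of relative overhead used here, and the names in the code, are assumptions made for illustration.

```c
#include <assert.h>

/* One row of Table 7-3: run times in seconds. */
typedef struct {
    const char *platform;
    double t12;  /* time with 12 I/O files */
    double t2;   /* time with 2 I/O files  */
} sun_row;

/* Times copied from Table 7-3, slowest platform first. */
static const sun_row table_7_3[] = {
    { "Sun IPC",         295.3, 218.0 },
    { "Sun SparcServer", 155.2, 110.3 },
    { "Sun Sparc 10",     98.8,  67.4 },
    { "Sun Sparc 20",     46.6,  27.2 },
    { "Sun Ultra II",     26.0,  15.0 },
};

/* Relative overhead of the extra output files: (t_many - t_few) / t_few. */
double io_overhead(double t_many_files, double t_few_files)
{
    assert(t_few_files > 0.0);
    return (t_many_files - t_few_files) / t_few_files;
}
```

On this measure the overhead rises from about 35% on the IPC, through about 41% (SparcServer), 47% (Sparc 10) and 71% (Sparc 20), to about 73% on the Ultra II.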
| Platform | 12 I/O files | 2 I/O files |
|---|---|---|
| Sun IPC | 295.3 s | 218.0 s |
| Sun SparcServer | 155.2 s | 110.3 s |
| Sun Sparc 10 | 98.8 s | 67.4 s |
| Sun Sparc 20 | 46.6 s | 27.2 s |
| Sun Ultra II | 26.0 s | 15.0 s |

Table 7-3: Simulation times for Sun workstations

7.5.2 Sky Computers
Sky Computers market a series of products based around the i860 processor. A single