[Figure 1 diagram labels: Application Data Flow Graph; Candidate Architecture(s); Partitioning / Mapping; Scheduling; Simulation; Execution on Real Hardware; Target Hardware; Analysis; Modify (feedback paths); Virtual Prototyping; Autocoding]
GEDAE™ - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
Harris Z. Zebrowitz
Lockheed Martin Advanced Technology Laboratories
1 Federal Street
Camden, NJ 08102
Introduction
GEDAE™ is an advanced graphical development and automatic code generation tool that is revolutionizing the design and development of digital signal processing systems. It enables designers to capture signal-processing applications in a hardware-independent graphical representation. Designers can then partition and map the application to a variety of commercial multiprocessor embedded hardware architectures and generate real-time software using target-specific, vendor-supplied vector libraries. For truly embedded systems, the application can be controlled from an external program independent of the development environment. The GEDAE™ visualization tools display all hardware and software activity on the target embedded system, including processing, interprocessor communications, and buffer activity, enabling a level of optimization equivalent to or surpassing that achievable through hand coding. This paper describes the capabilities and features of GEDAE™.
Design Process
The process for designing embedded signal processors using GEDAE™ is shown in Figure 1.
The design process begins with an architecture-independent data flow graph representing the signal processing algorithms for the intended application. Each function box, or node, in the graph represents a processing function, such as an FFT or FIR filter. The lines in the graph represent data flowing between the nodes.
The graph nodes are mapped to the multiple processors in the architecture, and performance estimates are generated by simulation. This allocation is initially performed using engineering judgement, but it may be modified as virtual prototyping trade-off studies proceed.
Virtual prototyping provides the ability to simultaneously simulate the hardware and software design prior to building the hardware. Virtual prototyping is used to evaluate alternative approaches to partitioning of the processing nodes and mapping of partitions to processors. Once a satisfactory partitioning and mapping scheme is determined, the architecture-independent data flow graph is transformed into an architecture-specific set of software executables by autocoding.
Figure 1. GEDAE™ Design Process
Autocoding is the process of generating software automatically from the partitioned and mapped data flow graph. The functions executed by the nodes in the graph are reusable library elements. The function libraries are completely architecture independent and are converted by autocoding into calls to the optimized, architecture-specific libraries supplied by the hardware vendor. Both processing functions and communication are autocoded from the data flow graph representation. As depicted in Figure 1, the design process is iterative.
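The idea that both processing and communication fall out of the partitioned and mapped graph can be illustrated with a minimal sketch. This is not GEDAE™ output or its internal representation: the graph, the function names, the mapping, and the `generate_programs` helper are all hypothetical, and the nodes are assumed to be listed in execution order.

```python
from collections import defaultdict

# Hypothetical example graph: node name -> generic library function.
NODES = {"src": "read_input", "fir": "fir_filter", "fft": "fft", "sink": "write_output"}
# Directed edges: data flows from the first node to the second.
EDGES = [("src", "fir"), ("fir", "fft"), ("fft", "sink")]
# Hypothetical mapping of nodes to processor numbers.
MAPPING = {"src": 0, "fir": 0, "fft": 1, "sink": 1}


def generate_programs(nodes, edges, mapping):
    """Return {processor: [operation strings]}; nodes must be in execution order."""
    programs = defaultdict(list)
    for name, function in nodes.items():
        proc = mapping[name]
        # Receive every input that is produced on a different processor.
        for src, dst in edges:
            if dst == name and mapping[src] != proc:
                programs[proc].append(f"recv {src}->{dst} from P{mapping[src]}")
        programs[proc].append(f"call {function} [node {name}]")
        # Send every output that is consumed on a different processor.
        for src, dst in edges:
            if src == name and mapping[dst] != proc:
                programs[proc].append(f"send {src}->{dst} to P{mapping[dst]}")
    return dict(programs)


if __name__ == "__main__":
    for proc, ops in sorted(generate_programs(NODES, EDGES, MAPPING).items()):
        print(f"P{proc}:")
        for op in ops:
            print("   ", op)
```

Changing only `MAPPING` changes where the send and receive operations appear, while the graph description itself stays untouched. That is the property the text attributes to an architecture-independent graph.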
Design Environment
GEDAE™ provides a unified graphical environment to develop signal processing systems. Typical process improvements are summarized in Figure 2. GEDAE™ is comprised of a workstation development environment and target-specific run-time kernels for embedded targets. The workstation development environment provides the capability required for developing data flow graphs and validating their functionality. Included is support for mapping the data flow graph to multiple processors, autocoding the application to run on those processors, and visualization of performance.
Cost/Performance               Improvement
Development Time               >5x
Integration and Test Time      >10x
Processor/Memory Efficiency    ~1x
Figure 2. GEDAE™ vs. Conventional Process Improvement
The user environment is common to both workstation and embedded multiprocessor applications, so it is not necessary to switch tools when moving from algorithm development to the generation and optimization of code for embedded systems. The application developer never needs to write any interprocessor communication software for multiprocessor implementations. In fact, this may be the greatest benefit of graph-based programming for multiprocessors, because multiprocessor communication is responsible for most of the debugging problems in large applications.
Algorithm Capture: Algorithms are captured in GEDAE™ by placing processing function boxes extracted from a library on a work area and interconnecting them. Designers can create graphs by selecting from a large library of standard functions. Templates are provided to create new library primitives and new data types. Custom primitives are created using standard C syntax. Graphs can be hierarchical to any depth required by the application. A unique graphical syntax consisting of families and route boxes supports succinct description of parallelism. The algebraic description of graphs via parameterized families and routing enables automatic graph restructuring to support parallelism. The upper left corner of Figure 3 shows an example of a GEDAE™ flow graph.
Data Flow and Functional Validation: Execution of data flow graphs is controlled through the same interface used to construct the graphs. There are several ways to observe the execution of a graph from both a hardware and software perspective. There are dynamic displays that let users see what is occurring while the graph is executing, and static displays that collect detailed information in the background for subsequent analysis. Scopes, such as the one shown in the upper right corner of Figure 3, and monitors can be inserted into a graph to facilitate observations during execution.
Figure 3. Example Application Graph Generated Using GEDAE™
Event timing data can be collected in the background while a graph is executing and the information stored until the Trace Table display is requested by the developer. The Trace Table, shown in the lower left corner of Figure 3, contains detailed time line information for system analysis.
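The pattern of collecting timing events cheaply during execution and formatting them only when a display is requested can be sketched as follows. This is an illustrative stand-in, not the GEDAE™ Trace Table implementation; the `TraceRecorder` class, its methods, and the row layout are assumptions made for the example.

```python
import time
from collections import namedtuple

# One timing record per node firing (hypothetical layout).
TraceEvent = namedtuple("TraceEvent", "processor node start end")


class TraceRecorder:
    """Collects timing events while work runs; formats them only on request."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._events = []

    def run(self, processor, node, func, *args):
        """Execute func(*args) and record when it started and finished."""
        start = self._clock()
        result = func(*args)
        self._events.append(TraceEvent(processor, node, start, self._clock()))
        return result

    def events(self):
        """Return the recorded events ordered by start time."""
        return sorted(self._events, key=lambda e: e.start)

    def table(self):
        """Build the time-line rows; nothing is formatted until this is called."""
        return [
            f"P{e.processor} {e.node:<8} {e.start:10.6f} {e.end:10.6f}"
            for e in self.events()
        ]


if __name__ == "__main__":
    recorder = TraceRecorder()
    recorder.run(0, "fir", sum, range(100_000))
    recorder.run(1, "fft", sorted, [3, 1, 2])
    print("\n".join(recorder.table()))
```

Keeping the per-firing work to two clock reads and one list append is what makes background collection inexpensive in this sketch; the cost of sorting and formatting is paid only when the table is requested.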
Virtual Prototyping: CSIM is a C language based virtual prototyping tool that is currently being integrated with GEDAE™. CSIM provides a natural and powerful description of a parallel processor algorithm mapping on a described architecture. It can describe the function of each device in a system in terms of time delays for computation and I/O and its interaction with the rest of the system. It supports interconnecting the models of each device according to arbitrary topologies and running discrete event simulations of the described system. Finally, using the resulting system model, CSIM can be used to investigate the effects of link bandwidth in conjunction with the network architecture (buses, rings, meshes, etc.) and the performance of algorithm mappings onto the modeled architectures. The completed interface between CSIM and GEDAE™ will permit a user to develop an application, establish correct functionality, graphically define a virtual architecture, map the application to the virtual architecture, predict performance on the virtual architecture, and autocode the partitioned and mapped system for execution on the target hardware.
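To make the notion of a discrete event simulation of a mapped algorithm concrete, here is a minimal sketch. It is not CSIM and does not use any CSIM interface: the `simulate` function, the task graph, the timings, and the single shared bandwidth figure are all hypothetical, and link contention is deliberately not modeled.

```python
import heapq

# Hypothetical workload: compute time in seconds per task.
TASKS = {"src": 1.0, "a": 4.0, "b": 4.0, "sink": 1.0}
# Hypothetical data volumes in bytes on each edge.
EDGES = {("src", "a"): 1000, ("src", "b"): 1000, ("a", "sink"): 1000, ("b", "sink"): 1000}


def simulate(tasks, edges, mapping, bandwidth):
    """Return {task: finish_time} for one pass through the graph.

    mapping: {task: processor}; bandwidth: bytes/second for any
    inter-processor transfer (same-processor transfers take no time).
    """
    pending = {t: sum(1 for (_, dst) in edges if dst == t) for t in tasks}
    ready_at = {t: 0.0 for t in tasks}   # arrival time of the latest input
    proc_free = {}                        # processor -> time it becomes idle
    finish = {}
    events = []                           # heap of (time, sequence, task)
    seq = 0
    for task, missing in pending.items():
        if missing == 0:
            heapq.heappush(events, (0.0, seq, task))
            seq += 1
    while events:
        now, _, task = heapq.heappop(events)       # all inputs present at `now`
        proc = mapping[task]
        start = max(now, proc_free.get(proc, 0.0))  # wait for the processor
        end = start + tasks[task]
        proc_free[proc] = end
        finish[task] = end
        for (src, dst), nbytes in edges.items():
            if src != task:
                continue
            delay = 0.0 if mapping[dst] == proc else nbytes / bandwidth
            ready_at[dst] = max(ready_at[dst], end + delay)
            pending[dst] -= 1
            if pending[dst] == 0:
                heapq.heappush(events, (ready_at[dst], seq, dst))
                seq += 1
    return finish


if __name__ == "__main__":
    one_proc = {t: 0 for t in TASKS}
    two_proc = {"src": 0, "a": 0, "b": 1, "sink": 0}
    for label, mapping, bw in [("1 processor", one_proc, 1000.0),
                               ("2 processors, fast link", two_proc, 1000.0),
                               ("2 processors, slow link", two_proc, 250.0)]:
        print(label, simulate(TASKS, EDGES, mapping, bw)["sink"])
```

Under these made-up numbers the two-processor mapping finishes sooner than the single-processor one when the link is fast, but later when the link is slow. That is the kind of mapping versus bandwidth trade-off a virtual prototype is meant to expose before hardware is built.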
Embedded Code Generation: Once the data flow and functionality have been verified and a partitioning and mapping scheme has been determined, GEDAE™ generates the execution schedules for each of the embedded processors. As shown in the lower right corner of Figure 3, the mapping table is used to specify partition assignments. The schedule generation process maximizes the use of static scheduling to minimize overhead, but it preserves dynamic behavior where required. A schedule may be divided into multiple sub-schedules, which may all operate at different firing granularities to optimize performance. The code is then automatically compiled, linked, loaded, and executed on the embedded hardware. The library functions used to construct the graph are linked to the optimized math library provided by the hardware vendor to achieve optimum performance. A Run-Time Kernel residing on each of the embedded processors supports the execution of the autocoded application.
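A static schedule can be pictured as an execution order fixed once, before run time, and then replayed without further decisions. The sketch below is only an illustration under simplifying assumptions, not the GEDAE™ scheduler: it orders nodes with a topological sort from Python's standard `graphlib` module (Python 3.9 or later), and it models firing granularity simply as the number of consecutive firings of each node per pass. The `static_schedule` and `run` helpers and the example chain are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical chain of (producer, consumer) edges.
CHAIN = [("src", "fir"), ("fir", "fft"), ("fft", "sink")]


def static_schedule(edges, granularity=1):
    """Fix the execution order once, ahead of time.

    Returns a list of (node, firings) pairs in a valid data-flow order.
    """
    sorter = TopologicalSorter()
    for producer, consumer in edges:
        sorter.add(consumer, producer)   # consumer depends on producer
    return [(node, granularity) for node in sorter.static_order()]


def run(schedule, functions, passes):
    """Replay the precomputed schedule; no ordering decisions at run time."""
    for _ in range(passes):
        for node, firings in schedule:
            for _ in range(firings):
                functions[node]()


if __name__ == "__main__":
    log = []
    functions = {n: (lambda n=n: log.append(n)) for n in ("src", "fir", "fft", "sink")}
    run(static_schedule(CHAIN, granularity=2), functions, passes=1)
    print(log)
```

With a larger granularity, the same number of firings is covered in fewer passes through the schedule, which is one way per-pass overhead can be amortized; the right value depends on memory and latency limits that this sketch does not model.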
Schedules can be viewed using the Schedule Display. Schedules are presented with the graph functions listed down the left side of the table in their order of execution. For each entry in the Schedule Display, memory information and execution time are presented.
When executing on multiple processors, the Trace Table reflects the presence of additional processors and the fact that communication occurs between them. Computation time, data flow activity (queues filling and emptying), and communication (sending, receiving, and local memory copies) are all detailed in the Trace Table.
Optimization: The types of optimization that are supported for embedded execution include interactive partitioning and mapping, memory usage, selection of communication mechanisms for inter-partition links, schedule firing granularity, queue capacities, and scheduling options. The group control dialog is the interface to all of these mechanisms; it gives designers control over the optimization and execution characteristics of applications and assists them in attaining optimized performance.
Stand-Alone Operation: Embedded applications must be capable of execution independent of a workstation and display. GEDAE™ enables autocoded applications to be targeted for stand-alone operation. To support this mode of operation, GEDAE™ provides a software API that facilitates controlling graphs from other software, such as higher-level control software. The API provides a set of functions that may be called to start and stop graphs, set parameters, read and write data to the graph, and connect graphs to other graphs. These capabilities make it possible to develop applications using the analysis facilities of the development environment and then divorce the application from that environment and control it from external software.
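The paper does not list the API's function names or signatures, so the following is only a sketch of the style of control described: an external program starting and stopping a graph, setting a parameter, and writing and reading data. The `GraphHandle` class and every method on it are hypothetical stand-ins and do not correspond to actual GEDAE™ calls; connecting graphs to one another is not covered.

```python
from collections import deque


class GraphHandle:
    """Hypothetical stand-in for an autocoded graph driven by external software."""

    def __init__(self, process):
        self._process = process      # function(sample, params) -> output sample
        self._params = {}
        self._running = False
        self._outputs = deque()

    def set_parameter(self, name, value):
        """Set a named graph parameter."""
        self._params[name] = value

    def start(self):
        self._running = True

    def stop(self):
        self._running = False

    def write(self, samples):
        """Feed input data to the graph; only allowed while it is running."""
        if not self._running:
            raise RuntimeError("graph is not running")
        self._outputs.extend(self._process(s, self._params) for s in samples)

    def read(self):
        """Return and clear all output produced so far."""
        out = list(self._outputs)
        self._outputs.clear()
        return out


if __name__ == "__main__":
    # Control software with no dependence on a development environment.
    graph = GraphHandle(lambda sample, params: sample * params["gain"])
    graph.set_parameter("gain", 2)
    graph.start()
    graph.write([1, 2, 3])
    print(graph.read())
    graph.stop()
```

The point of the sketch is the separation of roles: the control program only holds a handle and issues a small set of commands, while the processing itself stays inside the graph.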
Currently, GEDAE™ provides a set of functions used to instantiate, control, and configure the application graph. Future improvements will extend support of control software development to include some fine-grain control in GEDAE™ autocoding. A prototype tool known as the Application Interface Builder (AIB), which autocodes control software, has been developed. Near-term efforts on control software autocoding will focus on the refinement of the AIB tool with the intent of incorporating the tool into the development environment. Longer-term efforts include the development of graphical methods for specifying the control software and providing co-simulation with data flow graphs.
Demonstration and Benefits
GEDAE™ has been shown to provide many benefits, including increased productivity and easier application retargeting, which provides the ability for designers to leverage the hardware technology curve.
Rapid Prototyping/Portability: A synthetic aperture radar (SAR) application was originally hand-generated for Mercury Computer Systems’ RaceWay architecture and then was re-implemented using GEDAE™. The resulting autocoded application achieved the same execution and memory efficiency as the hand-coded version with about a 10X reduction in implementation time. The same GEDAE™ application was correctly remapped to several different commercial signal processing architectures, including Mercury PowerPC, Sharc and I860, Ixthos Sharc, and Alex Sharc, by simply repartitioning and remapping the application to the new architecture. These remappings were accomplished in hours.
Re-Use of Legacy Software: A fifty-thousand-line sonar algorithm, developed by the Navy, was converted into GEDAE™ data flow graphs in less than twelve weeks. Once converted, the application was distributed for real-time operation on a Mercury PowerPC architecture. Test, integration, and optimization on the target architecture took four weeks.
Optimization of Large Systems: The Semi-Automated IMINT Processing (SAIP) application utilized 4 Alex Computer Systems Sharc boards with 18 Sharcs per board to meet real-time performance requirements. As depicted in Figure 4, the GEDAE™ virtual prototyping and autocoding process enabled efficient implementation of this 72-processor system. Detailed virtual prototyping verified HW/SW mapping and network communication bandwidth performance, and it established the final executable timing and memory specification. In the final design, both Sharc memory utilization and processor loading exceeded 90%. Applying virtual prototyping and autocoding to the SAIP benchmark delivered a 100x improvement in throughput density and reduced the hardware cost enough to offset development costs for the first system.
Summary
A hardware/software codesign methodology utilizing virtual prototyping and autocoding tools reduces system costs. Productivity improvements of 5x in software development and 10x in integration and test have been demonstrated. Such improvements lead to lower system cost and faster time to market. Improved application portability and retargetability significantly reduce the cost of migrating applications from one hardware platform to another and provide the ability to easily leverage the hardware technology development curve. Because communication software is automatically generated, retargeting applications to new hardware and reoptimizing can be achieved in weeks or even days.
Figure 4. Virtual Prototype System
[Figure 4 panel labels: System Architecture Model; Alex SHARC Board Model; Alex SharcPac Model; HighClass Data Flow Graph Software Model; Final System Hardware Configuration]