Implementation Description - DSP Implementation


Chapter 5 DSP Implementation

5.1 Implementation Description

A minimal version of the Multi-channel Gradient Model has been implemented using current image processing hardware. The Matrox “Genesis” general purpose image processor was selected as the best system currently available for real-time image capture and fast processing. The processing architecture is built around a Texas Instruments C80 parallel master processor, supplemented by a “Neighborhood Operation Accelerator” (NOA) chip that accelerates operations such as convolution. This yields an overall processing power of 2.2 BOPS (billion operations per second). The Genesis also has an image capture module containing dedicated hardware that performs image scaling and subsampling without using any of the master processor’s computing time. Since we cannot run even the minimal algorithm on full-sized images (768x576), we will need to use the sub-sampling functionality that this board offers. The Genesis has 64 Mbytes of 100 MHz processing memory and 8 Mbytes of display memory. An on-board display module enables the results to be viewed without the host processor incurring any overhead in transferring data to a separate video display device. The available processing memory is significantly more than required, since the images used in this implementation are no larger than a quarter of the standard PAL format (¼ of 768x576), of which nearly 1300 16-bit images can be stored in the available 64 Mbytes. This leads us to state the first of our implementation criteria:

• Execution speed should take priority over memory usage.

Instructions are issued to the Genesis board from the host PC (a 333 MHz Pentium II with 128 MB of memory) and are then executed asynchronously. Although the master processor contains a relatively fast 100 MHz floating-point arithmetic unit (FPU), it is highly optimised for processing 8-bit, 16-bit or 32-bit integer values. Additionally, the data bandwidth from the on-board memory to the processor is 400 Mbytes/second which, although fast, is not always able to keep up with the processor and therefore limits execution speed. Since it is possible to transfer twice as many 8-bit pixels per second as 16-bit pixels, and twice as many 16-bit pixels per second as 32-bit pixels, we can set another implementation criterion:

• The image data should be reduced to the smallest possible integer size to enhance throughput.
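The effect of pixel depth on throughput can be illustrated with a short calculation. This is only a sketch: it assumes that 400 Mbytes/second means 400 × 10^6 bytes per second and that the memory bus is the sole bottleneck, and the function names are illustrative rather than part of any Genesis library.

```python
# Memory-to-processor bandwidth quoted in the text.
# Assumption: 1 Mbyte is taken as 10**6 bytes.
BANDWIDTH_BYTES_PER_S = 400e6
FRAME_RATE_HZ = 25  # frame rate used elsewhere in this section


def pixels_per_second(bits_per_pixel: int) -> float:
    """Upper bound on pixels transferred per second at a given pixel depth."""
    bytes_per_pixel = bits_per_pixel // 8
    return BANDWIDTH_BYTES_PER_S / bytes_per_pixel


def pixels_per_frame(bits_per_pixel: int) -> float:
    """Upper bound on pixels transferred within one frame period."""
    return pixels_per_second(bits_per_pixel) / FRAME_RATE_HZ


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(f"{bits:2d}-bit: "
              f"{pixels_per_second(bits) / 1e6:.0f} Mpixels/s, "
              f"{pixels_per_frame(bits) / 1e6:.0f} Mpixels per frame")
```

Under these assumptions, halving the pixel depth doubles the number of pixels that can reach the processor in each frame period, which is the motivation for the criterion above.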

Perhaps the most critical implementation issue is that an overhead of 0.5 ms is incurred on every image processing function call made to the Genesis. This overhead is due to the time required to re-configure the master processor and/or the NOA chip, which is a typical problem encountered with re-programmable hardware. It effectively limits the hardware to a maximum of 2000 image processing operations per second, even if the operations could hypothetically take zero time to execute. This equates to only 80 image processing operations per frame at 25 frames per second. The overhead becomes more important as image size decreases, so although the Genesis is very efficient for large images, its efficiency is degraded when using small ones. One of the most important efforts will therefore be to structure the image data such that image processing operations can be performed on large groups of smaller images in one function call. Thus we form two further implementation criteria:

• The number of image processing operations should be kept to an absolute minimum.

• Images that require the same processing operation should be grouped into larger parent images.
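The call budget and the benefit of grouping can be sketched as follows. The 0.5 ms overhead and the 25 frames per second are taken from the text. Everything else is a hypothetical stand-in: `FakeBoard`, `halve`, `pack` and `unpack` only mimic the idea of a fixed per-call cost and are not Genesis functions.

```python
CALL_OVERHEAD_MS = 0.5   # per-call overhead quoted in the text
FRAME_RATE_HZ = 25


def max_calls_per_frame() -> float:
    """Call budget per frame if every call took zero processing time."""
    calls_per_second = 1000.0 / CALL_OVERHEAD_MS
    return calls_per_second / FRAME_RATE_HZ


class FakeBoard:
    """Hypothetical stand-in for the board: counts function calls."""

    def __init__(self):
        self.calls = 0

    def halve(self, image):
        """One point operation on one image (list of pixel rows)."""
        self.calls += 1
        return [[p // 2 for p in row] for row in image]

    def overhead_ms(self) -> float:
        return self.calls * CALL_OVERHEAD_MS


def pack(images):
    """Stack equally sized images vertically into one parent image."""
    return [row for img in images for row in img]


def unpack(parent, rows_per_image):
    """Split a parent image back into its child images."""
    return [parent[i:i + rows_per_image]
            for i in range(0, len(parent), rows_per_image)]


if __name__ == "__main__":
    children = [[[k, k + 2], [k + 4, k + 6]] for k in range(24)]

    separate = FakeBoard()
    out_a = [separate.halve(img) for img in children]   # 24 calls

    grouped = FakeBoard()
    out_b = unpack(grouped.halve(pack(children)), 2)    # 1 call

    print(max_calls_per_frame(), separate.overhead_ms(),
          grouped.overhead_ms(), out_a == out_b)
```

Vertical stacking is safe here because the operation is pointwise. For neighbourhood operations such as convolution, pixels near the border between two child images would be mixed, so a real layout would need padding or some other border handling.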

In this implementation, two parallel threads of execution are created on the host PC, with a further two command threads on the Genesis board. The first host thread services the user interface, which in turn can alter variables in the model. The second thread, which has priority, is responsible for both issuing image processing commands to the Genesis board and processing data on the host. Commands can be issued by the host to either of the two threads on the Genesis: the first is used for asynchronous camera frame grabs, while the other is used to execute the image processing instructions (figure 5.1).
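The thread arrangement described above can be imitated on an ordinary PC with a minimal sketch. It assumes queues as the command channel and uses Python threads in place of the two Genesis command threads. All names are hypothetical, thread priority is not modelled, and the user interface thread is represented only by a shared `gain` variable that it could modify.

```python
import queue
import threading


def grab_worker(num_frames, frames):
    """Stands in for the Genesis grab thread: delivers frames as they arrive."""
    for n in range(num_frames):
        frames.put(f"frame{n}")
    frames.put(None)  # sentinel: no more frames


def process_worker(commands, results):
    """Stands in for the Genesis process thread: runs queued commands in order."""
    while True:
        cmd = commands.get()
        if cmd is None:
            break
        op, frame = cmd
        results.append(f"{op}({frame})")


def host_processing_thread(frames, commands, model):
    """Host thread: queues commands and returns without waiting for them."""
    while True:
        frame = frames.get()
        if frame is None:
            commands.put(None)  # tell the process thread to stop
            break
        commands.put((f"scale_by_{model['gain']}", frame))


def run(num_frames=3):
    frames, commands = queue.Queue(), queue.Queue()
    results = []
    model = {"gain": 2}  # a model variable the interface thread could alter
    threads = [
        threading.Thread(target=grab_worker, args=(num_frames, frames)),
        threading.Thread(target=process_worker, args=(commands, results)),
        threading.Thread(target=host_processing_thread,
                         args=(frames, commands, model)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run())
```

Because each queue has a single producer and a single consumer, commands are executed in the order they were issued even though grabbing and processing overlap in time.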

The processing has been broken into separate blocks of image processing functions, called sequentially from a master processing loop running on the host. The individual subroutines in the processing box in figure 5.1 are shown in figure 5.2, and this chapter explains how each subroutine is implemented according to the implementation criteria set out above.

[Diagram labels: Host Application (Interface Thread: User Interface; Processing Thread: The Model); Genesis (Genesis Grab Thread: Image Grab; Genesis Process Thread: Processing)]

Figure 5.1 Parallel processing threads on the host PC and the Genesis image processing board. Image acquisition and processing occur simultaneously on the Genesis. The Processing box is expanded in figure 5.2.

[Diagram labels: Create Basis Set, Temporal Filter, Steering Weights, Construct Taylor, Taylor Products, Quotients, Velocity, Display]

Figure 5.2 Processing subroutines executed on the Genesis in the processing thread. The final subroutines (Quotients and Velocity) can be executed on the Host if required. These routines can execute independently to the image

In document A Real-Time Implementation of a Neuromorphic Optic Flow Algorithm (Page 109-113)