PHONETIC DESCRIPTION - A novel architecture for a high performance low complexity neural device


Richard Palmer Phd.Thesis

new untaught speech, and to provide a more robust performance.

This work concluded that, with a limited set of taught data (digits 0-9), a good internal representation had been produced. The resynthesised speech was intelligible, if a little simplified, and the network was also able to generalise to new untaught data.

3.5.1.2 Fundamental Period Extraction Network

A Fundamental Period Extraction Network had been developed, and this was taken up by the EPI Group to provide a basis for the next generation of hearing devices, aimed at superseding SiVo and Microstim. The network was designed to produce an estimate of the voiced fundamental period (Tx), and to work under non-ideal conditions with good background noise rejection[i5].

This has been carried out using the multilayer perceptron algorithm, with a network made up of four layers: an input layer, two hidden layers and an output layer. Diagram 3.7 shows this topology, and the input window arrangement used.

This input window is used to form the temporal context required for accurate Tx extraction. It is 20.5ms wide, and is constructed from 41 slices, each 0.5ms wide. Each slice is made up of six energy values, corresponding to six different frequency bands, resulting in a total of 246 input units.
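The window arithmetic can be checked with a short sketch. The slice width, slice count and band count come from the text; the function names and the oldest-first ordering of the flattened vector are assumptions made only for illustration.

```python
SLICE_MS = 0.5   # width of one time slice in ms (from the text)
N_SLICES = 41    # slices per input window (from the text)
N_BANDS = 6      # energy values per slice (from the text)


def window_width_ms():
    """Total temporal context covered by one input window."""
    return N_SLICES * SLICE_MS


def build_input_vector(slices, end):
    """Flatten the N_SLICES slices ending at index `end` (inclusive).

    `slices` is a list of per-slice band energies. The oldest-first
    ordering is an assumption; the text does not specify it.
    """
    start = end - N_SLICES + 1
    if start < 0:
        raise ValueError("not enough history for a full window")
    vector = []
    for band_energies in slices[start:end + 1]:
        if len(band_energies) != N_BANDS:
            raise ValueError("each slice needs one energy per band")
        vector.extend(band_energies)
    return vector
```

With 41 slices of 6 values each, the flattened vector has 246 entries, matching the number of input units quoted above.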

The two hidden layers both consist of six units. This was decided upon by simulation: ten units had first been used, but it was found that six units showed a very similar performance. This method of deciding upon the number of units to use in a network shows up the rather ad hoc methods still employed in this area of neural networks.

Only one unit is required in the output layer - this provides the Tx information. This information is then further processed to generate the required sinusoidal output for the device.
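A minimal sketch of a forward pass through this 246-6-6-1 topology is given below. The layer sizes are taken from the text; the sigmoid activation, the bias terms and the random placeholder weights are assumptions, since this section does not state them.

```python
import math
import random

LAYER_SIZES = [246, 6, 6, 1]  # input, two hidden layers, output (from the text)


def sigmoid(x):
    # Assumed activation function; not specified in the text.
    return 1.0 / (1.0 + math.exp(-x))


def init_network(sizes, seed=0):
    """Create placeholder weights; a trained network would load real ones."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one input vector through every fully connected layer."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations
```

Calling `forward(init_network(LAYER_SIZES), vector)` on a 246-element vector returns a single value, standing in for the one output unit that carries the Tx information.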

[Diagram 3.7: fully connected network. Input window: 41 frames wide by 6 frequency bands (legible band labels: 600-900 Hz, 900-1200 Hz, 1200-2000 Hz, 2000-3000 Hz). Two hidden layers of 6 hidden units each. Output layer: 1 output unit (fundamental period).]

Diagram 3.7 Tx Extraction Network

period being recorded by use of a Laryngograph. A Laryngograph is a pick-up that is attached to the throat, which enables a recording to be made of the vocal cord activity. This recording of the vocal cord activity was presented to the output layer during learning.

The initial results obtained showed good recognition, and great robustness when external noise was introduced to the input. It was discovered that the best noise rejection occurs when noise is added to learning data, since this enables a better generalisation of a noisy signal to be constructed.
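The idea of adding noise to the learning data can be illustrated as follows. This is only a sketch: the text does not say what kind of noise was used, at what level, or at which stage it was added, so the additive Gaussian noise on the input values, the `noise_std` value and the function names are all assumptions.

```python
import random


def add_noise(inputs, noise_std, rng):
    """Return a noisy copy of one input vector (assumed Gaussian noise)."""
    return [x + rng.gauss(0.0, noise_std) for x in inputs]


def augment(training_set, noise_std=0.05, copies=1, seed=0):
    """Extend (inputs, target) pairs with noisy copies.

    The target, derived from the Laryngograph recording, is left
    unchanged, so the network learns the same Tx for clean and noisy input.
    """
    rng = random.Random(seed)
    augmented = list(training_set)
    for inputs, target in training_set:
        for _ in range(copies):
            augmented.append((add_noise(inputs, noise_std, rng), target))
    return augmented
```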

The network proved to be so robust that it was even able to improve on the Laryngograph recording of the initial speech. When tested, using the same data it was taught on, the network could pick out extra vocal cord closures that were originally missed on the Laryngograph recording.


3.5.2 Real-Time Implementation for Tx Extraction Network

Work has been carried out to provide a real-time implementation of this multilayer perceptron network[i6]. The system developed consists of a TMS320C25 Digital Signal Processor (with on-chip RAM), EPROM, A/D converter and various support components. The device fits on a printed circuit board measuring 7cm by 6cm. The board executes at 32 MHz and has a power consumption of 400mW. This gives about 12 hours of life with lithium batteries.
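A rough budget for the real-time workload can be worked out from the figures given. The layer sizes, slice width, power consumption and battery life are from the text; the assumption that the network is evaluated once per 0.5ms slice is made here for illustration and is not stated in the source. Bias additions and the activation function are not counted.

```python
LAYER_SIZES = [246, 6, 6, 1]  # network topology (from the text)
SLICE_MS = 0.5                # one new slice every 0.5 ms (from the text)
POWER_W = 0.4                 # 400 mW board consumption (from the text)
BATTERY_LIFE_H = 12           # approximate battery life (from the text)


def macs_per_evaluation(sizes):
    """Multiply-accumulates for one forward pass, one per weight."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


def macs_per_second(sizes, slice_ms):
    """Assumes one full network evaluation per incoming slice."""
    evaluations_per_second = int(round(1000.0 / slice_ms))
    return macs_per_evaluation(sizes) * evaluations_per_second


def energy_wh(power_w, hours):
    """Energy drawn from the batteries over the stated lifetime."""
    return power_w * hours
```

Under these assumptions one evaluation needs 246x6 + 6x6 + 6x1 = 1518 multiply-accumulates, or about 3.0 million per second at 2000 evaluations per second, and 400mW over 12 hours corresponds to roughly 4.8Wh of battery energy.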

The key points in the choice of the processor are listed below:

•Single cycle Multiply-Accumulate and Data-Move Instruction

•Integer processing (To cut down on power consumption)

•Must be CMOS for low power consumption

•As much on-chip memory as possible

These points need no further explanation, with the exception of the Single cycle Multiply-Accumulate and Data-Move instruction. This instruction is used to implement the input window that is required in the Tx Extraction Network.
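The effect of a combined multiply-accumulate and data-move instruction can be emulated in software as below. This is a sketch of the general technique only: the memory layout, the spare slot at the end of the delay line and the function names are assumptions, and the loop stands in for what the processor performs as one instruction per tap.

```python
def mac_with_data_move(weights, delay_line):
    """Accumulate weights[i] * delay_line[i] and shift the window.

    `delay_line` has len(weights) + 1 slots, with slot 0 holding the
    newest sample. Working from the oldest tap down to the newest, each
    step multiplies, accumulates, and copies the sample one slot up, so
    the window has already aged by one step when the sum is complete.
    """
    acc = 0.0
    for i in range(len(weights) - 1, -1, -1):
        acc += weights[i] * delay_line[i]
        delay_line[i + 1] = delay_line[i]  # the "data move" part
    return acc


def push_sample(delay_line, sample):
    """Write the newest sample into the slot freed by the shift."""
    delay_line[0] = sample
```

In this scheme the window is maintained as a side effect of the arithmetic, so no separate pass over memory is needed to age the samples.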

Most digital signal processing chips use a variety of different addressing modes to simplify the implementation of signal processing models, and to ensure the high throughput of data through the processing unit. Since digital signal processors now incorporate multiply and accumulate instructions that can occur in a single clock cycle, they require this processing power to be matched with the speed of the operand fetches. This puts a very high demand on the address generation that is required for many signal processing models that are commonly used.

For the implementation of input windows, two methods are widely used:

•Modulo Address Generation

•Fetch - Operate - Data Move Instructions


thus ensuring the maximum throughput of data through the processing units. These two methods can be seen on the two most common families of digital signal processors:

Motorola DSP56000

In the Motorola DSP56000[i7] a separate hardware block is used for address generation; this is termed the address generation unit (AGU). This unit operates in parallel with the multiplier, and can be used to implement a variety of address modes. The address mode that would be most suitable for the implementation of the Tx Extraction Input Window would be the Modulo Address Generation.

•Modulo Address Generation

This address mode allows circular buffers to be created inside a memory block. These buffers must fall on a binary base address, and be of a binary divisible length. This mode allows FIFO buffers, sample windows and delays to be easily programmed.
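A software analogue of modulo address generation is sketched below. The class and method names are hypothetical, and the sketch uses the `%` operator on an index rather than modelling the hardware address unit, so it does not enforce the base-address and length constraints described above.

```python
class CircularWindow:
    """Fixed-length sample window held in a circular buffer."""

    def __init__(self, length):
        self.length = length
        self.data = [0.0] * length
        self.head = 0  # slot holding the oldest sample, overwritten next

    def push(self, sample):
        """Overwrite the oldest sample and advance the wrapped index."""
        self.data[self.head] = sample
        self.head = (self.head + 1) % self.length  # modulo address update

    def oldest_to_newest(self):
        """Read the window in time order without moving any data."""
        return [self.data[(self.head + k) % self.length]
                for k in range(self.length)]
```

Unlike a data-move scheme, the samples stay where they were written and only the index wraps around, which is the property that makes FIFO buffers, sample windows and delays straightforward to program.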

Diagram 3.8.a shows the implementation of an input window using
