

5.8 Software Design

The first stage in designing the system was to develop a preliminary prototype in MATLAB. This prototype consisted of M-file implementations of the processing blocks described in Chapter 3. After the preliminary design and testing in MATLAB, the system was designed and developed from scratch in C++ in a Linux environment (Fedora Core 8).

5.8.1 MATLAB Design

This section lists the functions used to develop the system design in MATLAB. These functions cover the signal processing and vector quantization (VQ) steps only.

asr.m

This is the main speech analysis script. It performs the following:

1- It uses analoginput() to get input from the user via the microphone.

2- It uses wavread to read from the recorded files and build the codebook.

3- It contains the general program, which calls the functions for speech detection, pre-emphasis, frame blocking, quantization and codebook generation.

4- It performs the windowing manually, without calling any external function.

myVAD.m

It performs speech endpoint detection on an input signal using the endpoint detection algorithm described in chapter 3, with a flowchart given in figure 5.2.

preemp.m

It contains a function that performs pre-emphasis on an input signal with a given pre-emphasis parameter (a).
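As an illustration, pre-emphasis can be sketched in C++ as follows, assuming the usual first-order form y[n] = x[n] - a·x[n-1]. The function name preEmphasize and its vector-based signature are hypothetical and are not taken from the project code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First-order pre-emphasis: y[n] = x[n] - a * x[n-1], with y[0] = x[0].
// Assumption: this is the standard filter form; the name and signature
// are illustrative only, not the project's actual interface.
std::vector<double> preEmphasize(const std::vector<double>& x, double a) {
    assert(a >= 0.0 && a <= 1.0);
    std::vector<double> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) {
        y[n] = (n == 0) ? x[0] : x[n] - a * x[n - 1];
    }
    return y;
}
```

Values of a close to 1 are the typical choice for speech, since they attenuate the low-frequency content most strongly.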

frameblock.m

It performs frame blocking on the pre-emphasized signal, with a given overlap. It returns a matrix in which each row is one frame and the number of columns equals the number of samples per frame.
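A minimal C++ sketch of frame blocking is given here for illustration. The name frameBlock, the Frames type alias, and the choice to drop trailing samples that do not fill a whole frame are all assumptions; the original function may pad the last frame instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Frames = std::vector<std::vector<double>>;  // one row per frame

// Split x into frames of frameLen samples; consecutive frames share
// `overlap` samples. Assumption: an incomplete final frame is discarded.
Frames frameBlock(const std::vector<double>& x,
                  std::size_t frameLen, std::size_t overlap) {
    assert(frameLen > 0 && overlap < frameLen);
    const std::size_t hop = frameLen - overlap;  // shift between frame starts
    Frames frames;
    for (std::size_t start = 0; start + frameLen <= x.size(); start += hop) {
        frames.emplace_back(x.begin() + start, x.begin() + start + frameLen);
    }
    return frames;
}
```

With 10 samples, a frame length of 4 and an overlap of 2, this sketch yields 4 frames starting at samples 0, 2, 4 and 6.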

mylpc.m

It takes an input signal, and the order of LPC analysis. It calculates the LPC coefficients using the Levinson-Durbin algorithm.
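The Levinson-Durbin recursion can be sketched in C++ as follows. This is a textbook formulation under the predictor convention x[n] ≈ Σ a[k]·x[n-k]; the function names autocorr and levinsonDurbin are hypothetical, and the sign convention used in mylpc.m may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Autocorrelation r[0..order] of one frame.
std::vector<double> autocorr(const std::vector<double>& x, std::size_t order) {
    std::vector<double> r(order + 1, 0.0);
    for (std::size_t lag = 0; lag <= order; ++lag)
        for (std::size_t n = lag; n < x.size(); ++n)
            r[lag] += x[n] * x[n - lag];
    return r;
}

// Levinson-Durbin recursion on r[0..p]. Returns a[0..p], where a[1..p]
// are the predictor coefficients and a[0] is unused (left at 0).
// Assumption: r[0] > 0 and no reflection coefficient has magnitude 1.
std::vector<double> levinsonDurbin(const std::vector<double>& r) {
    assert(r.size() >= 2 && r[0] > 0.0);
    const std::size_t p = r.size() - 1;
    std::vector<double> a(p + 1, 0.0), prev(p + 1, 0.0);
    double err = r[0];  // prediction error energy
    for (std::size_t i = 1; i <= p; ++i) {
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j) acc -= prev[j] * r[i - j];
        const double k = acc / err;  // reflection coefficient
        a[i] = k;
        for (std::size_t j = 1; j < i; ++j) a[j] = prev[j] - k * prev[i - j];
        err *= (1.0 - k * k);
        prev = a;
    }
    return a;
}
```

For one frame, the coefficients would be obtained as levinsonDurbin(autocorr(frame, order)).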

cepsco.m

It uses the output LPC coefficients to compute the cepstral coefficients, on which the distortion measures are computed.
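The conversion from LPC to cepstral coefficients is commonly done with the recursion c[n] = a[n] + Σ_{k=1}^{n-1} (k/n)·c[k]·a[n-k], where a[n] is taken as zero for n greater than the LPC order. The C++ sketch below assumes that recursion and the predictor sign convention; the name lpcToCepstrum and the indexing (element 0 unused) are illustrative choices, not the project's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert predictor coefficients a[1..p] (a[0] unused) into cepstral
// coefficients c[1..numCeps] (c[0] unused, left at 0).
// Assumption: all-pole model 1 / (1 - sum_k a[k] z^-k); the gain term
// is ignored.
std::vector<double> lpcToCepstrum(const std::vector<double>& a,
                                  std::size_t numCeps) {
    assert(!a.empty());
    const std::size_t p = a.size() - 1;
    std::vector<double> c(numCeps + 1, 0.0);
    for (std::size_t n = 1; n <= numCeps; ++n) {
        double sum = (n <= p) ? a[n] : 0.0;
        for (std::size_t k = 1; k < n; ++k) {
            if (n - k <= p) {
                sum += (static_cast<double>(k) / static_cast<double>(n))
                       * c[k] * a[n - k];
            }
        }
        c[n] = sum;
    }
    return c;
}
```

For a first-order model with a[1] = 0.5, the recursion reproduces the series 0.5^n / n, which gives a quick sanity check.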

buildcb.m

It uses the K-means Lloyd algorithm (described in chapter 3 and in figure 3.8) to build a codebook of reference vectors from a training set of vectors.

distanc.m

It computes the weighted cepstral distance between two given vectors using linear weighting. [Note: In the final project, the weighted cepstral coefficients were used, as described in chapter 3.]

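One plausible form of such a distance is sketched below in C++, assuming the linear weight w(k) = k applied to the k-th cepstral difference and summed as squares. The exact weighting and normalization used in distanc.m are not given here, so both the formula and the name weightedCepstralDistance should be read as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted cepstral distance d = sum_k (k * (c1[k] - c2[k]))^2.
// Element i of each vector is assumed to hold cepstral coefficient k = i + 1.
double weightedCepstralDistance(const std::vector<double>& c1,
                                const std::vector<double>& c2) {
    assert(c1.size() == c2.size());
    double d = 0.0;
    for (std::size_t i = 0; i < c1.size(); ++i) {
        const double w = static_cast<double>(i + 1);  // linear weight w(k) = k
        const double diff = w * (c1[i] - c2[i]);
        d += diff * diff;
    }
    return d;
}
```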

quantize.m

It takes a set of reference vectors and an input vector, and returns the row index (index of reference vector) that is closest to the input vector.
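The nearest-codeword search can be sketched in C++ as follows. For brevity the sketch uses the squared Euclidean distance, which is an assumption; the original presumably uses the cepstral distance described above. The name quantize matches the M-file, but the signature is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the row index of the codebook vector closest to v.
// Assumption: squared Euclidean distance stands in for the project's
// distortion measure.
std::size_t quantize(const std::vector<std::vector<double>>& codebook,
                     const std::vector<double>& v) {
    assert(!codebook.empty());
    std::size_t best = 0;
    double bestDist = 0.0;
    for (std::size_t r = 0; r < codebook.size(); ++r) {
        assert(codebook[r].size() == v.size());
        double d = 0.0;
        for (std::size_t i = 0; i < v.size(); ++i) {
            const double diff = codebook[r][i] - v[i];
            d += diff * diff;
        }
        if (r == 0 || d < bestDist) {
            bestDist = d;
            best = r;
        }
    }
    return best;
}
```

Applying this function to every frame's feature vector turns an utterance into the sequence of codebook indices that a discrete HMM consumes as observations.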

5.8.2 C++ Implementation

The system was designed using an Object Oriented approach, and was implemented in C++ using GCC (GNU Compiler Collection) under Linux. The code was developed using the Kdevelop IDE. The system design consisted of the following classes:

1) HMM Class

The HMM class is the main class used in the system implementation. It provides a complete implementation of discrete HMMs with discrete outputs. It consists of the following components:

a) Simulation Functions: Used in generating outputs from a Hidden Markov Model, these include: init(), transit(), generate(), getCurrent()

b) Computation Functions: Used to calculate the alpha and beta values and compute the output probability of a given observation sequence.

c) Re-estimation Functions: Used to train the model. These functions use the scaled alphas and betas to re-estimate the model parameters, and they call other internal functions that are not exposed through the external interface. This group also contains functions for multiple observation sequence training and for parameter estimation (with 4 different types of model initialization).

d) Viterbi Functions: Used to decode the most likely state sequence for a given observation sequence. There are two versions; the second is implemented using logarithms to avoid overflow/underflow.
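To illustrate the logarithmic variant, a minimal C++ sketch of Viterbi decoding for a discrete HMM is given below. It is not the HMM class's actual code: the names viterbiLog, safeLog and Mat, and the representation of the model as plain vectors (initial probabilities pi, transition matrix A, output matrix B), are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// log(p), with log(0) mapped to -infinity.
double safeLog(double p) {
    return p > 0.0 ? std::log(p) : -std::numeric_limits<double>::infinity();
}

// Most likely state sequence for the observation symbols in obs.
// Sums of log-probabilities replace products, which avoids underflow.
std::vector<std::size_t> viterbiLog(const std::vector<double>& pi,
                                    const Mat& A, const Mat& B,
                                    const std::vector<std::size_t>& obs) {
    const std::size_t N = pi.size(), T = obs.size();
    assert(N > 0 && T > 0 && A.size() == N && B.size() == N);
    Mat delta(T, std::vector<double>(N));
    std::vector<std::vector<std::size_t>> psi(T, std::vector<std::size_t>(N, 0));
    for (std::size_t i = 0; i < N; ++i)
        delta[0][i] = safeLog(pi[i]) + safeLog(B[i][obs[0]]);
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t j = 0; j < N; ++j) {
            std::size_t best = 0;
            double bestScore = delta[t - 1][0] + safeLog(A[0][j]);
            for (std::size_t i = 1; i < N; ++i) {
                const double s = delta[t - 1][i] + safeLog(A[i][j]);
                if (s > bestScore) { bestScore = s; best = i; }
            }
            delta[t][j] = bestScore + safeLog(B[j][obs[t]]);
            psi[t][j] = best;  // best predecessor of state j at time t
        }
    }
    // Pick the best final state, then backtrack through psi.
    std::vector<std::size_t> path(T);
    std::size_t last = 0;
    for (std::size_t i = 1; i < N; ++i)
        if (delta[T - 1][i] > delta[T - 1][last]) last = i;
    path[T - 1] = last;
    for (std::size_t t = T - 1; t > 0; --t) path[t - 1] = psi[t][path[t]];
    return path;
}
```

Working in the log domain means that long observation sequences, whose path probabilities would otherwise shrink toward zero, can still be compared reliably.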

2) Matrix Class

It provides the following interface:

a) Utility Functions

i) Provides functions for loading a matrix from a file, and saving it to a file.

ii) Provides functions for accessing elements in a matrix, and modifying existing elements.

iii) Provides functions for loading a wave file into a matrix using a wavread() function in coordination with the WaveIn class.

b) Mathematical Functions

i) Provides functions for addition, subtraction, division, and normalization.

c) Modeling Function:

i) K-means Lloyd codebook generation

ii) Quantization of one matrix against the rows of another matrix

3) WaveIn Class

It reads a WAVE file in standard WAVE file encoding, with 8-bit or 16-bit samples and sampling frequencies in the range 8 kHz to 22.5 kHz. It reads a file by decoding the header, then skipping past it to read the appropriate payload data.
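Header decoding of this kind can be sketched in C++ as shown below. The sketch assumes the canonical 44-byte PCM layout, in which the "data" chunk immediately follows a 16-byte "fmt " chunk; real files may contain additional chunks that a full reader would have to skip. The names WavInfo, readLE and parseWavHeader are hypothetical and do not come from the WaveIn class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WavInfo {
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
    std::uint32_t dataBytes;  // size of the sample payload after the header
};

// Read an n-byte little-endian unsigned integer starting at offset off.
std::uint32_t readLE(const std::vector<std::uint8_t>& b,
                     std::size_t off, std::size_t n) {
    assert(n <= 4 && off + n <= b.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v |= static_cast<std::uint32_t>(b[off + i]) << (8 * i);
    return v;
}

// Decode a canonical 44-byte PCM header; the payload starts at byte 44.
// Returns false if the buffer does not match the assumed layout.
bool parseWavHeader(const std::vector<std::uint8_t>& b, WavInfo& info) {
    if (b.size() < 44) return false;
    if (std::memcmp(&b[0], "RIFF", 4) != 0) return false;
    if (std::memcmp(&b[8], "WAVE", 4) != 0) return false;
    if (std::memcmp(&b[12], "fmt ", 4) != 0) return false;
    if (std::memcmp(&b[36], "data", 4) != 0) return false;
    if (readLE(b, 20, 2) != 1) return false;  // format tag 1 = PCM
    info.channels = static_cast<std::uint16_t>(readLE(b, 22, 2));
    info.sampleRate = readLE(b, 24, 4);
    info.bitsPerSample = static_cast<std::uint16_t>(readLE(b, 34, 2));
    info.dataBytes = readLE(b, 40, 4);
    return info.bitsPerSample == 8 || info.bitsPerSample == 16;
}
```

In the WAVE format, 8-bit samples are stored unsigned and 16-bit samples as signed little-endian values, so a reader has to convert the payload according to bitsPerSample.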

4) WaveOut Class

It reads an input array and creates the appropriate WAVE file header, and saves the result into a WAVE file with a prescribed sampling frequency, and sample size.

5) List Class

It contains a template implementation of a multiple-linked list data structure, so that it may be used with any data type. It is used with VoiceBase, and in creating a training set, because such data is too large to be contained in a single array. [Note: It can be used in future work to replace Matrix's array implementation for large vocabulary speech recognition.]

6) Node Class

It is a template class that is used in coordination with the List class to implement the linked list.

7) VoiceBase Class

It manages the set of available words as a database, and interfaces with the main program to create HMMs, recognize words, and train models.

8) VoiceField Class

It contains the data necessary to recognize a single word. It contains information such as the number of training elements, the model size, and the appropriate codebook. It also contains information to be used in future implementation (such as the number and type of phonemes), for automatic model generation.

The system also included functions to perform various signal processing tasks; some of the most important tasks are shown below:

1) Speech detection - trim()

2) Pre-emphasis - pre-emphasize()

3) Frame blocking - frameblock()

4) Windowing - window()

5) LPC coefficients - lpc() and glpc() (Levinson-Durbin and Gaussian {Redundancy})

6) Cepstral coefficients - cepsco()

7) Cepstral weighting - wcepsco()

The remainder of the system (main()) was a simple implementation that used the above classes to test the system and evaluate its performance under various conditions. A sample of the program output is shown in appendix A-1.
