Model for Optimization of Speech Recognition and Performance Analysis


Dr. Kadam Vipulsangram K. (1), Zine Jyoti P. (2)

(1, 2) Dept. of Electronics Engg., P.E.S. College of Engg., Nagsenvan, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, MS, India

Abstract

Our research work explores the possibility of optimizing a speech recognition tool. By introducing a band-pass (BP) digital filter at the input of the LPC section, we obtain a slight enhancement in the Mainlobe Width (-3 dB) from 1.4025 to 1.5016, with a Sidelobe Attenuation of 1.5 dB. This implies a rise and optimization of 7.06 % in the power of the desired speech signals. We then tried LPC tuning (varying the parameters of the emphasis and de-emphasis digital filters). For numerator coefficients of 1, 0.9999 for the emphasis filter and 1.9000 for the de-emphasis filter, we observe a further increase in the Mainlobe Width, to 1.7269. This implies a rise and optimization of 23.13 % in the power of the desired speech signals. Finally, we varied the order of the filter and designed a 40th-order filter, which gives a Mainlobe Width of 2, the largest enhancement of the mainlobe. This implies a rise and optimization of 42.60 % in the power of the desired speech signals.
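The percentages in the abstract match the relative increase of each mainlobe width over the LPC baseline of 1.4025. The Python sketch below makes that reading explicit; the formula is our interpretation of the figures, not one stated in the text.

```python
def relative_increase_percent(baseline: float, value: float) -> float:
    """Percentage increase of `value` over `baseline`."""
    return (value - baseline) / baseline * 100.0


# Mainlobe widths (-3 dB) quoted in the abstract; 1.4025 is the plain-LPC baseline.
LPC_BASELINE = 1.4025
STAGES = [
    ("BP filter at the LPC input", 1.5016),
    ("LPC tuning (emphasis/de-emphasis)", 1.7269),
    ("40th-order filter", 2.0),
]

for label, width in STAGES:
    print(f"{label}: {relative_increase_percent(LPC_BASELINE, width):.2f} %")
```

This prints 7.07 %, 23.13 % and 42.60 %; the abstract's 7.06 % is the first value truncated rather than rounded.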

Keywords

LPC, Mainlobe, MFCC, OTSR

1.1. Methodology

The work focuses on designing an Optimization Tool for Speech Recognition (OTSR) by modifying the basic LPC model and adding DSP filters, and then implementing it on DSP hardware, to give better performance in noisy environments with high efficiency for digital transmission.

1.2. Contribution of this Model

1.2.1. Simulink Model

Figure 1.1 shows the block diagram of the methodology used to design the OTSR, and Fig. 1.2 shows the system as implemented in software.

[Fig. 1.1: OTSR methodology. Blocks: Imported speech samples from this PC, Digital Filter, Pre-Emphasis, Overlap Analysis Window, Autocorrelation, Levinson-Durbin, Reflection Coefficients, Time-Varying Analysis Filter, Time-Varying Synthesis Filter, De-Emphasis, Digital Filter, LPC out (Re-Synthesized Signal).]

[Fig. 1.2: Simulink model. Blocks: From Multimedia File (F10.mp3; 8000 Hz, 16 bit, mono audio), Digital Filter Design (FDATool), Digital Filter, Pre-Emphasis, Overlap Analysis Windows, Pad, Hamming Window, Autocorrelation (ACF), Levinson-Durbin, Time-Varying Analysis Filter, Time-Varying Synthesis Filter, De-Emphasis, Welch Periodogram, FFT, Waterfall Scopes (Residual, Reflection Coeffs, LPC Spectrum), Display, Re-synthesized Signal (y_lpc).]

Fig. 1.2: Block diagram of the system implemented in software

Here we implement a speech compression technique known as Linear Predictive Coding (LPC) using the DSP System Toolbox™ functionality available at the MATLAB® command line.
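The analysis path of Fig. 1.2 (Hamming window, autocorrelation, Levinson-Durbin, reflection coefficients) can be sketched for a single frame in plain Python. This is a hedged illustration of the standard autocorrelation method, not the authors' Simulink model; the 240-sample frame (30 ms at the 8 kHz rate of the input file), the predictor order and the test tone are arbitrary choices, and the reflection coefficients follow the convention in which k_i equals the last predictor coefficient at stage i.

```python
import math
from typing import List, Tuple


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, k, err): prediction-error filter A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p,
    reflection coefficients k[0..p-1] and the final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    k: List[float] = []
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki          # error power shrinks at every stage
        k.append(ki)
    return a, k, err


def lpc_frame(frame: List[float], order: int) -> Tuple[List[float], List[float], float]:
    """Window one frame, autocorrelate it and run Levinson-Durbin."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    return levinson_durbin(autocorrelation(windowed, order), order)


# Example: one 240-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(240)]
a, k, err = lpc_frame(frame, order=2)
print("predictor:", [round(c, 4) for c in a])
print("reflection coefficients:", [round(c, 4) for c in k])
```

As a check, feeding the AR(1) autocorrelation r = [1, 0.5, 0.25, 0.125] to `levinson_durbin` with order 3 gives A(z) = 1 - 0.5z^-1, reflection coefficients of about [-0.5, 0, 0] and a prediction-error power of 0.75.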

1.3. Comparison Results & Performance Analysis

The three main contributions of our system and their results are as follows:

1. Introduction of BP Digital Filter at the Input Section

Here we introduced a band-pass (BP) digital filter of order 10, with a sampling frequency of 20 kHz, a lower cutoff of 60 Hz and an upper cutoff of 8 kHz. Table 1.5, which gives the average mainlobe width over 80 speech samples from male and female students, shows a slight enhancement of the Mainlobe Width (-3 dB) from 1.2723 to 1.3674, with a Sidelobe Attenuation of 1.5 dB. As far as the mainlobe is concerned, this implies a rise and optimization of 7 % in the power of the desired speech signals. Table 1.6, which gives the average signal-to-noise ratio over the same 80 speech samples, shows an enhancement in signal-to-noise ratio from 0.0024 dB to 2.1123 dB. As far as the signal-to-noise ratio is concerned, this implies a rise and optimization of 91 % in the power of the desired speech signals.
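The text does not say how each signal-to-noise figure was measured. A common definition, assumed here, compares a clean reference with the processed output and treats their difference as noise; the sketch below implements that definition and checks it on toy signals.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], processed: Sequence[float]) -> float:
    """SNR in dB, treating (processed - reference) as the noise component."""
    if len(reference) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    noise_energy = sum((p - s) ** 2 for s, p in zip(reference, processed))
    if noise_energy == 0.0:
        return math.inf                       # identical signals: no noise at all
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy check: a 300 Hz tone plus a tone 20 dB weaker, both sampled at 8 kHz.
FS = 8000
clean = [math.sin(2 * math.pi * 300 * n / FS) for n in range(800)]
noisy = [s + 0.1 * math.sin(2 * math.pi * 3100 * n / FS) for n, s in enumerate(clean)]
print(f"{snr_db(clean, noisy):.2f} dB")
```

Both toy tones contain whole numbers of periods in 800 samples, so the energy ratio is exactly 100 and the sketch prints 20.00 dB.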

2. LPC Tuning (Varying the Parameters of Emphasis & De-Emphasis Digital Filter)

For numerator coefficients of 1, 0.9999 for the emphasis filter and 1.9000 for the de-emphasis filter, Table 1.5 (average mainlobe width over 80 speech samples from male and female students) shows a slight enhancement of the Mainlobe Width (-3 dB) from 1.3674 to 1.5926. As far as the mainlobe is concerned, this implies a rise and optimization of 15 % in the power of the desired speech signals. Table 1.6 (average signal-to-noise ratio over the same 80 speech samples) shows an enhancement in signal-to-noise ratio from 2.1123 dB to 2.3017 dB. As far as the signal-to-noise ratio is concerned, this implies a rise and optimization of 95 % in the power of the desired speech signals.
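In the usual LPC arrangement the pre-emphasis stage is a first-order FIR high-pass and the de-emphasis stage is its all-pole inverse. The sketch below assumes that form with a single hypothetical coefficient `alpha`; the tuned value 0.9999 can be passed as `alpha` for the emphasis filter, but how the de-emphasis setting of 1.9000 maps onto this parametrization is not stated, so the sketch leaves it as a free parameter.

```python
from typing import List, Sequence


def pre_emphasis(x: Sequence[float], alpha: float) -> List[float]:
    """FIR pre-emphasis y[n] = x[n] - alpha * x[n-1], i.e. numerator [1, -alpha]."""
    y: List[float] = []
    prev = 0.0
    for s in x:
        y.append(s - alpha * prev)
        prev = s
    return y


def de_emphasis(x: Sequence[float], alpha: float) -> List[float]:
    """All-pole de-emphasis y[n] = x[n] + alpha * y[n-1], i.e. denominator [1, -alpha]."""
    y: List[float] = []
    prev = 0.0
    for s in x:
        prev = s + alpha * prev
        y.append(prev)
    return y


speech = [0.0, 0.4, 0.9, 0.3, -0.5, -0.8, -0.1]      # toy samples
boosted = pre_emphasis(speech, alpha=0.9999)
restored = de_emphasis(boosted, alpha=0.9999)        # matched alpha undoes the boost
print(max(abs(p - q) for p, q in zip(speech, restored)))
```

With matched coefficients the round trip restores the input up to rounding; with mismatched coefficients the combined response is no longer flat, which is one way to see why tuning the pair changes the output spectrum.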


3. Varying the Order of the Filter

Here we designed a 40th-order filter. Table 1.5 (average mainlobe width over 80 speech samples from male and female students) shows a slight enhancement of the Mainlobe Width (-3 dB) from 1.5926 to 1.6641. As far as the mainlobe is concerned, this implies a rise and optimization of 35 % in the power of the desired speech signals. Table 1.6 (average signal-to-noise ratio over the same 80 speech samples) shows an enhancement in signal-to-noise ratio from 2.3017 dB to 3.4898 dB. As far as the signal-to-noise ratio is concerned, this implies a rise and optimization of 98 % in the power of the desired speech signals.
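The text does not say whether the band-pass filter is FIR or IIR. Assuming, purely for illustration, a Hamming-windowed-sinc FIR with the stated settings (fs = 20 kHz, 60 Hz to 8 kHz) for the 10th-order filter and the same band edges at order 40, the sketch below compares the magnitude responses of the two orders.

```python
import cmath
import math
from typing import List, Sequence


def bandpass_fir(order: int, f_low: float, f_high: float, fs: float) -> List[float]:
    """Hamming-windowed-sinc band-pass FIR with order + 1 taps."""
    centre = order / 2.0

    def ideal_lowpass(fc: float, n: int) -> float:
        # Impulse response of an ideal low-pass with cutoff fc, delayed by `centre`.
        wn = 2.0 * fc / fs                       # cutoff normalised to Nyquist
        t = n - centre
        return wn if t == 0 else math.sin(math.pi * wn * t) / (math.pi * t)

    taps = []
    for n in range(order + 1):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        # Band-pass = low-pass(f_high) - low-pass(f_low), then windowed.
        taps.append(window * (ideal_lowpass(f_high, n) - ideal_lowpass(f_low, n)))
    return taps


def gain(taps: Sequence[float], f: float, fs: float) -> float:
    """Magnitude of the frequency response at f Hz."""
    w = 2 * math.pi * f / fs
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


FS = 20000.0
h10 = bandpass_fir(10, 60.0, 8000.0, FS)   # order used in Contribution 1
h40 = bandpass_fir(40, 60.0, 8000.0, FS)   # order used in Contribution 3
for f in (60.0, 1000.0, 4000.0, 9000.0):
    print(f"{f:7.0f} Hz  order 10: {gain(h10, f, FS):.3f}  order 40: {gain(h40, f, FS):.3f}")
```

With this windowed-sinc design the transition width is roughly 3.3·fs/(order + 1), about 6 kHz at order 10 and 1.6 kHz at order 40, so the higher order mainly sharpens the 8 kHz edge; a sharp 60 Hz edge at this sampling rate would need a much higher FIR order or an IIR design.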

The following tables compare the results achieved with those of the available models in terms of mainlobe width (-3 dB) and signal-to-noise ratio in dB.

Table 1.1: Comparison of Mainlobe width (-3dB) for male samples

Mainlobe width (-3dB)
Male  MFCC    LPC     Contribution-1  Contribution-2  Contribution-3
M1    0.4155  1.4025  1.5016          1.8462          1.9999
M2    0.5231  1.6531  1.6543          1.7257          1.6924
M3    0.7121  1.9229  1.8485          1.9999          1.6717
M4    0.6521  1.7935  1.7949          1.7908          1.6730
M5    0.4156  1.4027  1.5748          1.8948          1.7256
M6    0.0001  0.0103  0.0103          1.7859          1.9999
M7    0.3310  1.2395  1.5155          1.6546          1.7635
M8    0.0006  0.0005  0.0005          1.7598          1.7628
M9    0.4181  1.6531  1.7903          1.7906          1.7114

Table 1.2: Comparison of Mainlobe Width (-3dB) for Female Samples

Mainlobe width (-3dB)
Female  MFCC    LPC     Contribution-1  Contribution-2  Contribution-3
F1      0.0002  0.0006  0.0006          1.7584          1.9999
F2      0.3110  1.5154  1.6532          1.6534          1.6602
F3      0.0002  0.0005  0.0005          1.4634          1.9999
F4      0.4010  1.6778  1.7563          1.6561          1.7330
F5      0.4110  1.6529  1.8908          1.7905          1.7133
F6      0.0002  0.0006  0.0006          1.6047          1.7324
F7      0.0002  0.0006  0.0006          1.8298          1.7267
F8      0.0110  0.5327  1.1214          1.5942          1.6232
F9      0.4189  1.6533  1.7350          1.6572          1.6685

Table 1.3: Comparison of Signal to Noise Ratio in dB for Male Samples

Signal-to-noise ratio in dB

Table 1.4: Comparison of Signal to Noise Ratio in dB for Female Samples

Signal-to-noise ratio in dB
Female  MFCC     LPC     Contribution-1  Contribution-2  Contribution-3
F1      -0.4488  0.0000  1.2983          1.3092          2.8732
F2      -0.5502  0.0000  2.8631          2.9834          3.9821
F3      -0.4628  0.0010  2.1020          2.1923          3.9823
F4      -3.8711  0.0001  3.4297          3.5423          3.6681
F5      -0.4493  0.0001  3.9012          3.9834          4.9237
F6      -1.6171  0.0010  2.7634          2.8734          3.8721
F7      -1.7907  0.0001  3.9619          3.9926          4.7325
F8      -0.5012  0.0011  1.2398          1.2399          2.9831
F9      -3.0016  0.0010  1.1674          5.6869          6.5421

Table 1.5: Average of Mainlobe Width (-3dB) of Above Samples

Average of Mainlobe width(-3dB) of above samples

MFCC LPC Contribution-1 Contribution-2 Contribution-3

0.3987 1.2723 1.3674 1.5926 1.6641
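Tables 1.1, 1.2 and 1.5 report a "Mainlobe width (-3dB)" without describing the measurement. As an illustration of the quantity itself (not a reproduction of the authors' measurement), the sketch below estimates the -3 dB width of the main lobe of a window's spectrum by scanning its DTFT on a fine frequency grid. For long windows the expected values are about 0.89 DFT bins for a rectangular window and about 1.3 bins for a Hamming window.

```python
import cmath
import math
from typing import List, Sequence


def spectrum_magnitude(w: Sequence[float], f: float) -> float:
    """|W(f)| of a finite sequence, with f in cycles/sample."""
    return abs(sum(x * cmath.exp(-2j * math.pi * f * n) for n, x in enumerate(w)))


def mainlobe_width_3db(w: Sequence[float], steps: int = 20000) -> float:
    """-3 dB main-lobe width in cycles/sample.

    Assumes the peak is at f = 0 (true for non-negative windows): scan upwards
    until |W(f)| drops below peak / sqrt(2), then double that frequency.
    """
    peak = spectrum_magnitude(w, 0.0)
    for i in range(1, steps + 1):
        f = 0.5 * i / steps
        if spectrum_magnitude(w, f) < peak / math.sqrt(2.0):
            return 2.0 * f
    raise ValueError("response never fell 3 dB below its peak")


N = 64
rect: List[float] = [1.0] * N
hamm = [0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1)) for i in range(N)]
for name, win in (("rectangular", rect), ("Hamming", hamm)):
    # Multiplying by N expresses the width in DFT bins.
    print(name, round(mainlobe_width_3db(win) * N, 3), "bins")
```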

Table 1.6: Average of Signal-to-Noise Ratio in dB of Above Samples

Average signal-to-noise ratio in dB

MFCC LPC Contribution-1 Contribution-2 Contribution-3

-2.0574 0.0024 2.1123 2.3017 3.4898

The above tables compare the results achieved with those of the available models.

The following figure shows the performance analysis of the available MFCC system in software.


The following figure shows the performance analysis of the available LPC system in software.

Fig. 1.4: Result of Available Model Using LPC

The following figures show the performance of our contributions.

Fig. 1.5: Contribution-1: Result of Model with introduction of Digital filter at input section.


Fig. 1.7: Contribution-3: Result of Model with varying the order of Digital Filter.

2.1. Performance Analysis of the Optimization Tool for Speech Recognition Using LPC in the MATLAB and Simulink Environment

In this section we analyze the LPC technique in the MATLAB and Simulink environment. The results obtained are shown in the following system block diagrams and output waveforms.
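For one frame with fixed coefficients, the LPC analysis filter A(z) turns the speech into the residual and the all-pole synthesis filter 1/A(z) rebuilds the signal from it. The Simulink model uses time-varying lattice filters driven by reflection coefficients; the direct-form sketch below is a simplification that assumes predictor coefficients `a` with `a[0] == 1` (for example from a Levinson-Durbin step) held constant over the frame. The coefficient values are hypothetical.

```python
from typing import List, Sequence


def analysis_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """FIR A(z): residual e[n] = sum_{j=0}^{p} a[j] * x[n-j], with a[0] == 1."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


def synthesis_filter(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole 1/A(z): y[n] = e[n] - sum_{j=1}^{p} a[j] * y[n-j]."""
    y: List[float] = []
    for n, en in enumerate(e):
        y.append(en - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


a = [1.0, -0.5]                          # hypothetical first-order predictor
x = [0.5 ** n for n in range(6)]         # a signal this predictor matches exactly
residual = analysis_filter(x, a)         # only the first sample survives
rebuilt = synthesis_filter(residual, a)  # 1/A(z) undoes A(z)
print(residual)
print(max(abs(p - q) for p, q in zip(x, rebuilt)))
```

When the predictor matches the signal, the residual is almost empty, which is why a residual plot such as Fig. 2.3 is much smaller than the speech itself.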

Fig. 2.1: LPC Spectrum using Matlab & Simulink

Fig. 2.3: Residual using Matlab & Simulink

Fig. 2.4: Signal & LPC Spectrum using Matlab & Simulink

2.2. Reduction of Noise

The following figures show the spectrum of the output without and with the digital filter.

Fig. 2.5: Result without Digital Filter

Fig. 2.6: Result with Digital Filter

3. Concluding Remarks

We have implemented the system in software, using the DSP System Toolbox functionality available at the MATLAB command line, and compared the results with those of the existing MFCC and LPC models over 80 samples of male and female speakers. From the comparison tables, we achieved a rise and optimization in the power of the desired speech signals of 7 % with respect to the mainlobe and 91 % with respect to the signal-to-noise ratio in Contribution 1, of 15 % (mainlobe) and 95 % (signal-to-noise ratio) in Contribution 2, and of 35 % (mainlobe) and 98 % (signal-to-noise ratio) in Contribution 3.

4. Conclusion

We developed our OTSR system to operate in noisy environments. Human beings are able to recognize speech amazingly well in high levels of background noise, whereas the performance of Automatic Speech Recognition (ASR) systems degrades dramatically as noise increases. Part of the reason for this difference is that the auditory system incorporates several features that make it more robust to noise. We conclude with the following points:

1. We have implemented both a software and a hardware model of our OTSR and achieved optimization of the speech recognition system.

2. The software model is implemented using MATLAB and Simulink, whereas the hardware model is implemented using MATLAB, Simulink, Code Composer Studio and the Texas Instruments TMS320C6711/6713 hardware. Both implementations give the same results, which reflects the stability of our system.

3. Introduction of a BP digital filter at the input section: We introduced a band-pass digital filter of order 10 with a sampling frequency of 20 kHz, a lower cutoff of 60 Hz and an upper cutoff of 8 kHz. Table 1.5 (average mainlobe width over 80 speech samples from male and female students) shows a slight enhancement of the Mainlobe Width (-3 dB) from 1.2723 to 1.3674, with a Sidelobe Attenuation of 1.5 dB, i.e. a rise and optimization of 7 % in the power of the desired speech signals as far as the mainlobe is concerned. Table 1.6 (average signal-to-noise ratio over the same 80 samples) shows an enhancement in signal-to-noise ratio from 0.0024 dB to 2.1123 dB, i.e. a rise and optimization of 91 % as far as the signal-to-noise ratio is concerned.

4. LPC tuning (varying the parameters of the emphasis and de-emphasis digital filters): For numerator coefficients of 1, 0.9999 for the emphasis filter and 1.9000 for the de-emphasis filter, Table 1.5 shows a slight enhancement of the Mainlobe Width (-3 dB) from 1.3674 to 1.5926, i.e. a rise and optimization of 15 % in the power of the desired speech signals as far as the mainlobe is concerned. Table 1.6 shows an enhancement in signal-to-noise ratio from 2.1123 dB to 2.3017 dB, i.e. a rise and optimization of 95 % as far as the signal-to-noise ratio is concerned.

5. Varying the order of the filter: We designed a 40th-order filter. Table 1.5 shows a slight enhancement of the Mainlobe Width (-3 dB) from 1.5926 to 1.6641, i.e. a rise and optimization of 35 % in the power of the desired speech signals as far as the mainlobe is concerned. Table 1.6 shows an enhancement in signal-to-noise ratio from 2.3017 dB to 3.4898 dB, i.e. a rise and optimization of 98 % as far as the signal-to-noise ratio is concerned.


5. Directions for Future Work

Based on the work presented in this thesis, directions for future research could be:

The current system can be improved to accept both audio and video input signals. This can be achieved by using more advanced processors from Texas Instruments, also taking the LPC section into account.

Work can be carried out on real-time images, video signals and audio signals obtained from satellites.
