
IJEEE, Vol. 1, Spl. Issue 1 (March 2014) e-ISSN: 1694-2310 | p-ISSN: 1694-2426

Automatic Control of Instruments Using Efficient Speech Recognition Algorithm

Abhishek Thakur1, Rajesh Kumar2, Amandeep Bath3, Jitender Sharma4

1,2,3,4 Electronics & Communication Engineering Department, Indo Global College of Engineering, Punjab, India

1 [email protected], 2 [email protected], 3 [email protected], 4 [email protected]

Abstract- MATLAB's straightforward programming interface makes it an ideal tool for Hindi keyword recognition. For feature extraction, a Hindi keyword database was designed using MATLAB 7.5. The database consists of eight keywords. Each keyword was recorded by ten speakers (eight male and two female), giving a total of 80 samples for the eight commands. Features of the speech signal are extracted in the form of Mel Frequency Cepstral Coefficients (MFCC), and Dynamic Time Warping (DTW) is used as the feature matching technique. This work presents a technique that detects the utterance using end point detection, extracts features using MFCC and compares test patterns using DTW. The recognition results are tested for clean and noisy test data. The system can be said to be robust, as average accuracy is 97.50 % for clean data and 91.25 % for noisy data. Accuracy at this level is acceptable, since most people would not mind repeating a command to the system one time in ten or less. The system can be implemented using one of the common microcontrollers with a small amount of dedicated memory and an analog to digital converter to accept the input speech. Such a system would be fast, small and cost efficient enough to be incorporated into a wide variety of consumer electronics. The aim of this work is therefore to develop a speaker dependent, isolated word, limited vocabulary speech recognition system that is small enough to fit in a small household appliance and that can be operated in real time.

Index Terms- Automatic Speech Recognition (ASR), Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW)

I. INTRODUCTION

Although many systems exist for speech recognition, none of them addresses the needs of consumer level applications. For a system to be incorporated into the everyday life of a consumer, it must be speaker independent, fast, low cost, require no training and be small enough to fit inside a consumer appliance. Such a system will move speech recognition from the domain of academic or industrial applications to that of the common home user. It can be implemented using current technology once a certain number of compromises are made. For example, suppose a speech recognition system is to be incorporated into a home microwave oven. One can immediately see that there is no need for a 60,000 word vocabulary in such a system; a dozen words, including the digits, are sufficient for its operation. The system can be further simplified if the user is not allowed to change the number of words in the vocabulary. The second aspect of the system is that it does not have to accept continuous speech. For example, a common command may be "Move.... Forward.... Fast.... Start".

The proposed home automation system and MATLAB based Hindi keyword speech recognition system are intended for disabled persons, who may be unable to move from one place to another or to locate switches. This paper attempts to provide them a solution: while sitting in a wheelchair or on a bed, they can switch home appliances on and off and also control internal parameters such as wheelchair direction, fan speed and heater temperature. Physically challenged persons find it difficult to switch home loads such as fans, lights and air conditioners on and off, and they require an attendant to do these things. In the absence of the attendant, their world becomes more difficult. This design helps persons with physical disabilities and the elderly to navigate easily within their home in a wheelchair by giving voice commands. The systems in [3-5] were designed for the navigation of robots and forklifts by voice commands. Some voice based designs use a voice recognition chip with an integrated or interfaced memory chip, which has the drawback of supporting only a limited number of voice commands. The reported design, Speech Recognition Based Wireless Automation of Home Appliances for Disabled Persons, automates home loads through voice commands in a wireless environment.

II. SYSTEM OVERVIEW


www.ijeee-apm.com International Journal of Electrical & Electronics Engineering 17

The microcontroller controls the various applications upon receiving the input from the software. Relays connected to the ports of the microcontroller are switched to activate the appliance connected to the corresponding port.

Fig. 1: Microcontroller Interfacing.

The port connections between the automatic speech recognition and home automation system and the external peripherals are shown in Table 1. Each peripheral is connected to the corresponding port pin of the microcontroller (89C52) as given in Table 1. These peripherals operate under program control, as discussed in the software design section. When the user speaks a command word into the microphone, it is recognized by the proposed algorithm and an ASCII code is generated. This ASCII code is sent to the 89C52 microcontroller; if the received code matches a known code, the appliance performs the operation associated with that keyword.

TABLE 1: MICROCONTROLLER PORT CONNECTION

As Table 1 shows, if the AAGE keyword is recognized then Port 1.7 goes to logic one and Port 1.6 goes to logic zero, which means that the robot moves in the forward direction. The logic one and logic zero states of the ports for each keyword are shown in Table 1.
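The keyword-to-port dispatch described above can be sketched as a simple lookup. The sketch is in Python rather than 89C52 firmware code, the pin groupings and actions are copied from the rows of Table 1, and the one-character command codes are hypothetical, since the paper does not list the ASCII values it transmits.

```python
# Keyword -> (port pins, controlled function), copied from the rows of Table 1.
PORT_MAP = {
    "BAND": (("P1.0", "P1.1", "P1.2"), "ADC"),
    "TIESH": (("P1.4",), "Temperature 30 deg."),
    "PACHAS": (("P1.5",), "Temperature 50 deg."),
    "AAGE": (("P1.6",), "Go"),
    "PICHE": (("P1.7",), "Reverse"),
    "RUKO": (("P1.6", "P1.7"), "Break"),
    "DHEERE": (("P2.2",), "Fan low set"),
}

# Hypothetical one-character codes ('A', 'B', ...); the real values are not given.
COMMAND_CODES = {word: chr(ord("A") + i) for i, word in enumerate(PORT_MAP)}


def dispatch(code):
    """Return (keyword, pins, action) for a received code, or None if unknown."""
    for word, known_code in COMMAND_CODES.items():
        if known_code == code:
            pins, action = PORT_MAP[word]
            return word, pins, action
    return None
```

An unknown code returns `None`, which mirrors the behaviour described in the text: the appliance acts only when the received code matches.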

III. HARDWARE DESIGN

a) Voice processor:

The next stage is the voice processor stage, which consists of the .m voice processor file. After comparison in the voice processor, data is sent to the microcontroller for the control or driving action; RS232 is used as the communication protocol. The whole process works as follows: if we say the “AAGE” keyword, the action related to “AAGE” has to be performed, and if we say the “PICHE” keyword, the action related to “PICHE” has to be performed. As shown in figure 2, when we say any keyword, the microphone converts the acoustic signal into an electrical signal, which is then attenuated by the attenuator. The attenuated signal is transferred to the voice processor, the .m files are executed, and an ASCII code is then transferred to the microcontroller through the RS232 standard communication protocol. In this manner the voice controls the action of the machine or the electric appliance.

Fig. 2: Speaker recognition process.

b) Temperature sensor circuit:

A wide range of supply voltages can be used: a single supply of 3 V to 30 V (3 V to 26 V for the LM2902 and LM2902Q), or dual supplies. The common mode input voltage range includes ground, which allows direct sensing near ground. The low supply current drain, 0.8 mA typical, is independent of the supply voltage. The low input bias and offset parameters include an input offset voltage of 3 mV typical, an input offset current of 2 nA typical and an input bias current of 20 nA typical. The differential input voltage range is equal to the maximum rated supply voltage of 32 V, and the open loop differential voltage amplification is 100 V/mV typical.

Fig. 3: LM 35 Interface.

c) Analog to digital converter:

The device used in this stage is a monolithic integrated high voltage, high current four channel driver designed to accept standard DTL or TTL logic levels and drive inductive loads (such as relays, solenoids, DC and stepping motors) and switching power transistors. To simplify use as two bridges, each pair of channels is equipped with an enable input. A separate supply input is provided for the logic, allowing operation at a lower voltage, and internal clamp diodes are included. This device is suitable for use in switching applications at frequencies up to 5 kHz. The L293D is assembled in a 16 lead plastic package which has 4 center pins connected together and used for heat sinking. The L293DD is assembled in a 20 lead surface mount package which has 8 center pins connected together and used for heat sinking. Its features are 600 mA output current capability per channel, 1.2 A peak output current per channel, an enable facility, over temperature protection, logical "0" input voltage up to 1.5 V and internal clamp diodes.

TABLE 1 (entries):

S.N.   Ports of 89C52 µC    Hardware Devices Control   Hindi Key Word
1      P1.0, P1.1, P1.2     ADC                        BAND
2      P1.4                 Temperature 30 deg.        TIESH
3      P1.5                 Temperature 50 deg.        PACHAS
4      P1.6                 Go                         AAGE
5      P1.7                 Reverse                    PICHE
6      P1.6, P1.7           Break                      RUKO
7      P2.2                 Fan low set                DHEERE

Fig. 4: Functional block diagram of A to D converter.

d) Building a wireless remote control:

The question now arises of how to get rid of the long wired tail dangling out of a remote control robot. Transforming a wired remote control into a wireless one isn't as difficult as it may seem. The easiest solution would be to take the electronics out of a cheap wireless toy car and use them in the robot. For more flexibility, however, a custom remote control system can be built. The idea is to use off the shelf RF Tx/Rx modules. These modules, once a rare commodity, are now widely and cheaply available. In this discussion, we use an ASK (Amplitude Shift Keying) based Tx/Rx pair operating at 433 MHz.

Fig. 5: ASK Transmitter and Receiver.

The transmitter module accepts serial data at a maximum of XX baud rate. The modules can be directly interfaced to a microcontroller or can be used in remote control applications with the help of encoder/decoder ICs. The encoder IC takes in parallel data at the TX side, packages it into serial format and then transmits it with the help of an RF transmitter module. At the RX end, the decoder IC receives the signal via the RF receiver module, decodes the serial data and reproduces the original data in parallel format. To control one motor we require 2 bits of information, while we need 4 bits of information to control 2 motors. The HT12E and HT12D are 4 channel encoder/decoder ICs directly compatible with the specified RF module.
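The bit budget in this paragraph (2 bits per motor, 4 bits for two motors) can be illustrated with a short Python sketch that packs two motor commands into the four parallel data bits handed to the encoder and unpacks them again at the decoder side. The bit patterns for stop, forward and reverse and the names `pack_nibble` and `unpack_nibble` are assumptions made for illustration; the paper does not specify them.

```python
# Hypothetical 2-bit motor commands; the paper does not give the actual patterns.
STOP, FORWARD, REVERSE = 0b00, 0b10, 0b01


def pack_nibble(motor_a, motor_b):
    """Pack two 2-bit motor commands into one 4-bit value (motor_a in the high bits)."""
    if not (0 <= motor_a <= 3 and 0 <= motor_b <= 3):
        raise ValueError("each motor command must fit in 2 bits")
    return (motor_a << 2) | motor_b


def unpack_nibble(nibble):
    """Recover the two 2-bit motor commands from a 4-bit value."""
    return (nibble >> 2) & 0b11, nibble & 0b11
```

For example, `pack_nibble(FORWARD, REVERSE)` yields the nibble `0b1001`, and `unpack_nibble` restores the original pair on the receiving side.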

e) Wheel chair control:

The receiver receives the data in serial form, decodes it and converts it back into parallel form before passing it to the receiver side CPU. At the receiver side, the HT12D IC is used as the decoder: it receives the codes in serial form and converts them into parallel form. These decoded signals are then given as input to the CPU. At the receiver side the IC MN4519 is used as the buffer.

[Schematic: DC motor control card with DIR 1 and DIR 2 direction inputs, built from LM339 comparators, 7400 NAND and 7404 NOT gates, and TIP-122 / TIP-127 transistors driving the DC motor.]

Fig. 6: Robot Control.


IV. SOFTWARE DESIGN

The keyword recognition algorithm is designed according to the block diagram shown in Fig. 7.

Fig. 7: Block diagram of Mel Frequency Cepstral Coefficient

The speech recognition algorithm is written in MATLAB 7.0 and the results are tested on clean and noisy test data. The main program is explained step by step below, together with its results:

Step1. Declare variables:

clear all; % clear all variables
close all; % close all figures
clc; % clear screen
ncoeff = 13; % Required number of MFCC coefficients
N = 8; % Number of words in vocabulary
k = 4; % Number of nearest neighbors to choose
fs = 16000; % Sampling rate
duration1 = 0.15; % Initial silence duration in seconds
duration2 = 2; % Recording duration in seconds
G = 2; % Vary this factor to compensate for amplitude variations
NSpeakers = 5; % Number of training speakers

Step2. Input Keyword and perform EPD:

Fig. 8: End Point Detection for Hindi Key word “AAGE”.

for i = 1:8 % Check 8 keywords in real time
fprintf('Press any key to start %g seconds of speech recording...', duration2);
pause; % Wait for a key press
silence = wavrecord(duration1*fs, fs); % Record 0.15 s of initial silence
fprintf('Recording speech...');
speechIn = wavrecord(duration2*fs, fs); % duration2*fs is the total number of sample points

Fig. 9: After End Point Detection for Hindi Key word “AAGE”.
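The listing records the signal but does not show how the end points in Fig. 8 and Fig. 9 are found. A common approach, sketched here in Python as an assumption rather than the authors' method, thresholds the short-time energy of each frame against the energy of the leading silence. The names `frame_energies` and `detect_endpoints` and the threshold factor of 4 are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Short-time energy of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len, noise_frames, factor=4.0):
    """Return (start, end) sample indices of the utterance, or None if none is found.

    The threshold is `factor` times the mean energy of the first `noise_frames`
    frames, which are assumed to contain only background noise.
    """
    energies = frame_energies(samples, frame_len)
    if noise_frames < 1 or len(energies) <= noise_frames:
        return None
    noise = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise + 1e-12  # small floor for an all-zero recording
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

With the paper's settings, the 0.15 s of initial silence at 16 kHz would supply the noise estimate; the frame length is a free parameter in this sketch.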

Step3. Addition of silence:

p = length(speechIn) - length(silence);
silence = [silence; zeros(max(p, 0), 1)]; % Zero-pad the silence segment to the length of speechIn
fprintf('Finished recording.\n');
fprintf('System is trying to recognize what you have spoken...\n');
speechIn1 = [silence; speechIn]; % Prepends the padded silence segment
speechIn2 = speechIn1.*G; % Applies the amplitude compensation factor

Fig. 10: Addition of silence 0.15 seconds in Hindi key word “AAGE”.

Step4. Noise Reduction:

speechIn3 = speechIn2 - mean(speechIn2); %DC offset elimination

speechIn = nreduce(speechIn3,fs); %Applies spectral subtraction

Fig. 11: After noise reduction for Hindi key word “AAGE”.

Step5. Windowing, DFT and Mel filter bank:

rMatrix1 = mfccf(13,speechIn,fs); %Compute test feature vector

Fig. 12: Shows the time signal of the Hindi key word AAGE and Mel filter bank of the word computed via FFT.
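The internals of `mfccf` are not shown, so the following Python sketch only illustrates how mel filter bank center frequencies are usually placed. It assumes the widely used mapping mel = 2595 log10(1 + f/700) and filters equally spaced on the mel axis; the function names and the choice of 20 filters in the usage note are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel using the common 2595/700 formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centers(n_filters, f_low, f_high):
    """Center frequencies (Hz) of n_filters triangular filters equally spaced in mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

With the 16 kHz sampling rate used in Step 1, `filter_centers(20, 0.0, 8000.0)` spans the band up to the Nyquist frequency, with centers packed densely at low frequencies and sparsely at high frequencies.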

Step6. Inverse DFT:

rMatrix = CMN(rMatrix1); % Removes convolutional noise
Sco = DTWScores(rMatrix,N); % Computes all DTW scores
[SortedScores,EIndex] = sort(Sco); % Sorts scores in increasing order
K_Vector = EIndex(1:k); % Gets the k lowest scores


Fig. 13: DCT and Spectrogram for “AAGE” Key Word.
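The final MFCC stage and the `CMN` call can be sketched in Python under two assumptions that the paper does not confirm: that the cepstral coefficients are obtained with a DCT-II of the log filter bank energies, and that `CMN` stands for cepstral mean normalization, which subtracts the per-coefficient mean over all frames. The names `dct_cepstra` and `cmn` are hypothetical.

```python
import math


def dct_cepstra(log_energies, ncoeff):
    """DCT-II of one frame of log filter bank energies, keeping ncoeff coefficients."""
    m = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / m)
            for j, e in enumerate(log_energies))
        for k in range(ncoeff)
    ]


def cmn(frames):
    """Cepstral mean normalization: remove each coefficient's mean across frames."""
    n = len(frames)
    means = [sum(column) / n for column in zip(*frames)]
    return [[c - mu for c, mu in zip(frame, means)] for frame in frames]
```

A constant channel effect adds the same offset to every frame in the cepstral domain, so subtracting the mean removes it; this is consistent with the listing's comment that CMN removes convolutional noise.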

% Code below uses the index of the returned k lowest scores to determine their classes

for t = 1:k
u = K_Vector(t);
for r = 1:NSpeakers-1
if u <= (N)
break
else
u = u - (N);
end
end
Neighbors(t) = u;
end

Fig. 14: Result for keyword recognition “AAGE” Key Word.
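The loop above reduces a template index to a word class by subtracting N until the value falls in 1..N. The Python sketch below reproduces that loop and compares it with the equivalent modular expression. It assumes, as the loop implies, that templates are stored speaker by speaker with N words each; both function names are hypothetical.

```python
def template_class(index, n_words, n_speakers):
    """Map a 1-based template index to a 1-based word class by repeated subtraction."""
    u = index
    for _ in range(n_speakers - 1):
        if u <= n_words:
            break
        u -= n_words
    return u


def template_class_mod(index, n_words):
    """Closed form of the same mapping, valid for any positive template index."""
    return (index - 1) % n_words + 1
```

With N = 8 and NSpeakers = 5, template 19 belongs to the third speaker and maps to word class 3 in both versions.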

%Apply k-Nearest Neighbor rule
Nbr = Neighbors;
[Modal,Freq] = mode(Nbr); % Most frequent value and its count
Word = strvcat('Forward-AAGE', 'Reverse-PICHE', 'Break-RUKO', 'Thirty-TEESH', 'Fifty-PACHAS', 'low-DHERE', 'Medium-TEJ', 'Stop-BAND');
if mean(abs(speechIn)) < 0.01
fprintf('No microphone connected or you have not said anything.\n');
elseif ((k/(Freq)) > 2) % If no majority
fprintf('The word you have said could not be properly recognised.\n');
else
fprintf('You have just said %s.\n',Word(Modal,:)); % Prints recognized word
end
end % End of the keyword loop (for i = 1:8)
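The decision rule in this listing can be restated as a small Python function: take the most frequent class among the k nearest templates and reject the result when k divided by its frequency exceeds 2. Ties are resolved toward the smallest class index, which matches the documented behaviour of MATLAB's `mode`; the name `knn_decide` is hypothetical.

```python
from collections import Counter


def knn_decide(neighbors, k):
    """Return the majority class among the k neighbors, or None if there is no majority."""
    counts = Counter(neighbors)
    freq = max(counts.values())
    # Smallest class among those with the highest count, as MATLAB's mode does.
    modal = min(c for c, n in counts.items() if n == freq)
    if k / freq > 2:
        return None
    return modal
```

With k = 4 as in Step 1, a class is accepted only if at least two of the four nearest templates agree on it.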

V. RESULT DISCUSSION

We performed two experiments, in noisy and in clean environments: one using the traditional method (Md. Rashidul Hasan et al. 2004) and the other using the developed technique. The templates were used as input to the same recognition system using DTW in order to measure the performance of each method. The first experiment uses the

Fig. 15: Shows results in chart for noise environment with or without EPD.


Fig. 16: Shows results in chart for clean environment with or without EPD.

The chart shows approximately 97.50 % accuracy with end point detection when user 1 says the keywords in a 10 × 12 room with a clean environment (fan off, TV off, no cooking in the kitchen); without end point detection the average accuracy is 87.50 %. Fig. 16 shows the chart for Hindi keyword recognition in a clean environment with and without EPD. After the MFCC features are calculated, DTW finds the nearest distance between the spoken word and the recorded samples of the 10 speakers. If the spoken word matches five or more of the recorded samples, the system shows the output and performs the operation related to that keyword; if it matches fewer than five samples, the system plays the recording “word not recognized, please try again”.
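The DTW comparison mentioned here can be sketched in Python with the textbook dynamic programming recurrence. This is a minimal version under stated assumptions: Euclidean distance between frames, the three standard step directions and no path constraints. The authors' `DTWScores` function is not shown in the paper and may differ; the name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors (lists of floats)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because the warping path may repeat frames, an utterance spoken more slowly than its template can still score a distance of zero, which is why DTW suits isolated words of varying duration.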

VI. CONCLUSION AND FUTURE WORK

This paper presents a simple technique for word detection using end point detection, feature extraction using Mel frequency cepstral coefficients and feature matching using dynamic time warping. The implemented algorithm and control system control fan speed, heater temperature and robot direction using voice keywords. The experimental results demonstrate that the proposed algorithm is functional, reliable and easy to develop further, and that it can be used in voice keyword recognition for home automation systems and industrial robots. The percentage of correctly recognized keywords is high. The recognition results are tested for clean and noisy test data. The system can be said to be robust, as average accuracy is 97.50 % for clean data and 91.25 % for noisy data.

The main contribution of this study is that it presents the idea of Hindi keyword recognition combined with a home automation system. The experiments also show that the approach works well for Hindi keyword recognition. The proposed ASR and control system was completely implemented; future effort will be directed toward developing a more appropriate and convenient method.

REFERENCES

[1] A. Rathinavelu, G. Anupriya, A. S. Muthanantha Murugavel, “Speech Recognition Model for Tamil Stops”, Proceedings of the World Congress on Engineering, ISBN: 978-988-98671-5-7, Vol. I, pp. 543–547, July 2-4, 2007.

[2] Adriana Tapus and Brian Scassellati, “The grand challenges in helping humans through social robotics”, IEEE Robotics & Automation Magazine, Vol. 14, Issue 1, pp. 35–42, 2007.

[3] Anjli Bala, Abhijeet Kumar and Nidhika Birla, “Voice Command Recognition System Based on MFCC and DTW”, International Journal of Engineering Science and Technology, ISSN: 0975-5462, Vol. 2, No. 12, pp. 7335-7342, Dec. 2010.

[4] Atanas Ouzounov, “Cepstral Features and Text Dependent Speaker Identification - A Comparative Study”, Cybernetics and Information Technologies, Vol. 10, No. 1, pp. 1-12, 2010.

[5] B. H. Juang and Lawrence R. Rabiner, “Automatic Speech Recognition – A Brief History of the Technology”, Vol. 10, No. 3, August 2004.

[6] Bengt J. Borgstrom, “HMM-Based Reconstruction of Unreliable Spectrographic Data for Noise Robust Speech Recognition”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 6, pp. 1612-1623, August 2010.

[7] Bharti W. Gawali, Santosh Gaikwad, Pravin Yannawar, Suresh C. Mehrotra, “Marathi Isolated Word Recognition System using MFCC and DTW features”, ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011.

[8] Cini Kurian and Kannan Balakrishnan, “Automated Transcription System for Malayalam Language”, International Journal of Computer Application, Vol. 19, No. 5, April 2011.

[9] F. K. Soong, A. E. Rosenberg, L. R. Rabiner and B. H. Juang, “A Vector Quantization Approach to Speaker Recognition”, Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85, Vol. 10, No. 3, pp. 387-390, 1985.

[10] Fausto “Tito” Poz and Durand R. Begault, “Voice Identification and Elimination Using Aural Spectrographic Protocol”, AES 26th International Conference, Denver, Colorado, USA, 7–9 July 2005.

[11] Josef Rajnoha et al., “ASR systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness”, Radioengineering, Vol. 20, No. 1, April 2011.

[12] K. H. Davis, R. Biddulph and S. Balashek, “Automatic Recognition of Spoken Digits”, The Journal of the Acoustical Society of America, Vol. 24, No. 6, November 1952.

[13] K. M. Ravikumar, R. Rajagopal and H. C. Nagaraj, “An Approach for Objective Assessment of Stuttered Speech Using MFCC Features”, DSP Journal, Volume 9, Issue 1, June 2009.

[14] Khalid Saeed, “Sound and Voice Verification and Identification: A Brief Review of Töeplitz Approach”, Znalosti 2008, pp. 22-27, 2008.

[15] Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, “Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques”, Journal of Computing, ISSN 2151-9617, Volume 2, Issue 3, March 2010.

[16] M. A. Anusuya and S. K. Katti, “Speech Recognition by Machine: A Review”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.

[17] Maayan Geffet, Yair Wiseman and Dror Feitelson, “Automatic Alphabet Recognition”, Springer Science, Vol. 8, pp. 25–40, 2005.

[18] Mark D. Skowronski and John G. Harris, “Improving the Filter [...] Intl Symposium on Circuits and Systems, Bangkok, Thailand, Vol. 4, pp. 281-284, May 25-28, 2003.

AUTHORS

First Author – Abhishek Thakur: M. Tech. in Electronics and Communication Engineering from Punjab Technical University, MBA in Information Technology from Symbiosis Pune, M.H., Bachelor in Engineering (B.E. - Electronics) from Shivaji University Kolhapur, M.H. Five years of work experience in teaching and one year of work experience in industry. Area of interest: Digital Image and Speech Processing, Antenna Design and Wireless Communication. International Publication: 7, National Conferences and Publication: 6, Book Published: 4 (Microprocessor and Assembly Language Programming, Microprocessor and Microcontroller, Digital Communication and Wireless Communication). Working with Indo Global College of Engineering Abhipur, Mohali, P.B. since 2011.

Email: [email protected]

Second Author – Rajesh Kumar is working as Associate Professor at Indo Global College of Engineering, Mohali, Punjab. He is pursuing a Ph.D. from NIT, Hamirpur, H.P. and completed his M.Tech from GNE, Ludhiana, India. He completed his B.Tech from HCTM, Kaithal, India. He has 11 years of academic experience. He has authored many research papers in reputed international journals and in international and national conferences. His areas of interest are VLSI, Microelectronics and Image & Speech Processing.

Third Author – Amandeep Batth: M. Tech. in Electronics and Communication Engineering from Punjab Technical University, MBA in Human Resource Management from Punjab Technical University, Bachelor in Technology (B-Tech.) from Punjab Technical University. Six years of work experience in teaching. Area of interest: Antenna Design and Wireless Communication. International Publication: 1, National Conferences and Publication: 4. Working with Indo Global College of Engineering Abhipur, Mohali, P.B. since 2008. Email: [email protected]

Fourth Author – Jitender Sharma: M. Tech. in Electronics and Communication Engineering from Mullana University, Ambala, Bachelor in Technology (B-Tech.) from Punjab Technical University. Five years of work experience in teaching. Area of interest: Antenna Design and Wireless Communication. International Publication: 1, National Conferences and Publication: 6. Working with Indo Global College since 2008.