Automatic speech recognition

Speech Analysis for Automatic Speech Recognition

This Master's thesis was developed at the Department of Electronics and Telecommunications (Faculty of Information Technology, Mathematics and Electrical Engineering) at NTNU (Trondheim, Norway) from February 2009 to July 2009, under the title Speech Analysis for Automatic Speech Recognition. The thesis is connected to the SIRKUS research project, which investigates structures and strategies for automatic speech recognition: what type of linguistic unit to use as the basic unit (today, perceptually defined phonemes are used), which acoustic properties to look for in the speech waveform, and which classifier to use (Hidden Markov Models (HMM) are predominant today).

Automatic Speech Recognition: A Review

phonetics, lexical access, syntax, semantics and pragmatics. PHASES OF ASR: An automatic speech recognition system involves two phases: a training phase and a recognition phase. A rigorous training procedure maps basic speech units such as phones or syllables to acoustic observations. In the training phase, known speech is recorded, pre-processed, and passed to the first stage, feature extraction; the next three stages are HMM creation, HMM training, and HMM storage. The recognition phase starts with acoustic analysis of an unknown speech signal: the captured signal is converted to a series of acoustic feature vectors, the input observations are processed with an appropriate algorithm, the speech is compared against the HMM networks, and the pronounced word is displayed. An ASR system can only recognize what it has learned during training, but it can still recognize words absent from the training corpus, provided the sub-word units of the new word are known to the system and the word exists in the system dictionary.
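The two-phase structure described above can be sketched in miniature. The class and data below are illustrative stand-ins: real systems train HMMs over phone-level units, whereas this toy stores a single averaged feature vector per word.

```python
import numpy as np

# Toy sketch of the two ASR phases: training stores an acoustic model
# per word (here, just a mean feature vector); recognition compares an
# unknown observation against every stored model.

class ToyASR:
    def __init__(self):
        self.models = {}          # word -> prototype feature vector

    def train(self, word, feature_vectors):
        # Training phase: map the speech unit to an acoustic observation
        # by averaging its example feature vectors.
        self.models[word] = np.mean(feature_vectors, axis=0)

    def recognize(self, features):
        # Recognition phase: return the word whose stored model is
        # closest (Euclidean distance) to the observed features.
        return min(self.models,
                   key=lambda w: np.linalg.norm(self.models[w] - features))

asr = ToyASR()
asr.train("yes", np.array([[1.0, 0.1], [0.9, 0.0]]))
asr.train("no",  np.array([[0.0, 1.0], [0.1, 0.9]]))
print(asr.recognize(np.array([0.8, 0.2])))   # closest to the "yes" model
```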

A Review on Automatic Speech Recognition

With the advancement of speech recognition technologies, there is an increase in the adoption of voice interfaces on mobile platforms. While developing a general-purpose Automatic Speech Recognition (ASR) engine that can understand voice commands is important, the contexts in which people interact with their mobile devices change very rapidly. Due to the high processing complexity of the ASR engine, much of the processing of trending data is carried out on cloud platforms. Changing content regarding news, music, movies, and TV series shifts the focus of interaction with voice-based interfaces, so ASR engines trained on a static vocabulary may not be able to adapt to the changing contexts. The focus of this paper is first to describe the problems faced in incorporating dynamically changing vocabulary and contexts into an ASR engine. We then propose a novel solution which shows a relative improvement of 38 percent in utterance accuracy on newly added content without compromising the overall accuracy and stability of the system.

Automatic Speech Recognition: A Review

Automatic Speech Recognition (ASR) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program (Sanjivani S. Bhabad and Gajanan K. Kharate, 2013). Driven by technological curiosity to build machines that mimic humans and by the desire to automate work, research in speech recognition, as a first step toward natural human-machine communication, has attracted much enthusiasm over the past five decades. Several research efforts have therefore been oriented to this area, with computer scientists researching ways to make computers able to record, interpret, and understand human speech. An ASR system includes two phases: a training phase and a recognition phase. In the training phase, known speech is recorded and its features (a parametric representation of the speech) are extracted and stored in the speech database. In the recognition phase, the features of the input speech signal are extracted and compared with the reference templates stored in the speech database to recognize the utterance.
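The template-matching recognition phase described above is classically implemented with dynamic time warping (DTW), which tolerates utterances spoken at different speeds. The tiny 1-D "feature sequences" below are invented for illustration, not taken from the paper.

```python
import numpy as np

# Dynamic time warping: accumulated cost of the best monotone alignment
# between an input feature sequence and a stored reference template.

def dtw_distance(a, b):
    """DTW cost between two 1-D feature sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: deletion, insertion, match/substitution
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

templates = {
    "up":   np.array([0.0, 0.5, 1.0]),
    "down": np.array([1.0, 0.5, 0.0]),
}
utterance = np.array([0.1, 0.4, 0.6, 1.1])   # a stretched rising contour
best = min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
print(best)   # the rising template wins despite the length mismatch
```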

Evaluating Automatic Speech Recognition in Translation

The MITRE Corporation, 7525 Colshire Dr, McLean, VA 22102. Abstract: We address and evaluate the challenges of utilizing Automatic Speech Recognition (ASR) to support the human translator. Audio transcription and translation are known to be far more time-consuming than text translation, taking at least 2 to 3 times longer. Furthermore, the time to translate or transcribe audio depends heavily on audio quality, which can be impaired by background noise, overlapping voices, and other acoustic conditions. The purpose of this paper is to explore the integration of ASR into the translation workflow and to evaluate the challenges of utilizing ASR to support the human translator. We present several case studies in different settings in order to evaluate the benefits of ASR; time is the primary factor in this evaluation.

Literature Review on Automatic Speech Recognition

Automatic speech recognition, once considered a concept of science fiction and long hampered by performance-degrading factors, is now an important part of information and communication technology. Improvements in the fundamental approaches and the development of new approaches have led ASRs to advance from systems that responded to a small set of sounds to sophisticated systems that respond to fluently spoken natural language. Using artificial neural networks (ANNs), mathematical models of the low-level circuits in the human brain, to improve speech-recognition performance through a model known as the ANN-Hidden Markov Model (ANN-HMM) has shown promise for large-vocabulary speech recognition systems. Achieving higher recognition accuracy and low word error rate, developing speech corpora suited to the nature of the language, and addressing sources of variability through approaches like Missing Data Techniques and Convolutive Non-Negative Matrix Factorization are the major considerations in developing an efficient ASR. In this paper, an effort has been made to highlight the progress made so far in ASR for different languages and the technological perspective of automatic speech recognition in countries such as China, Russia, Portugal, Spain, Saudi Arabia, Vietnam, Japan, the UK, Sri Lanka, the Philippines, Algeria and India.

Quality Estimation for Automatic Speech Recognition

{negri,turchi,desouza,falavi}@fbk.eu Abstract: We address the problem of estimating the quality of Automatic Speech Recognition (ASR) output at the utterance level, without recourse to manual reference transcriptions and when information about the system's confidence is not accessible. Given a source signal and its automatic transcription, we approach this problem as a regression task in which the word error rate of the transcribed utterance has to be predicted. To this aim, we explore the contribution of different feature sets and the potential of different algorithms in testing conditions of increasing complexity. Results show that our automatic quality estimates closely approximate the word error rate scores calculated over reference transcripts, outperforming a strong baseline in all testing conditions.
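The regression formulation can be sketched as follows. The features, data, and ordinary-least-squares model below are synthetic placeholders, not the feature sets or learning algorithms evaluated in the paper.

```python
import numpy as np

# Quality estimation as regression: predict an utterance's WER from
# features of the signal/transcription pair, with no reference transcript.
# Feature names and the linear relationship are invented for illustration.

rng = np.random.default_rng(0)

# Each row: [utterance length, mean word-confidence proxy, LM-score proxy]
X = rng.uniform(0, 1, size=(200, 3))
true_w = np.array([0.1, -0.5, -0.2])                 # hidden relationship
y = X @ true_w + 0.4 + rng.normal(0, 0.01, 200)      # "observed" WER values

# Fit ordinary least squares with an intercept term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_wer(features):
    return float(np.append(features, 1.0) @ coef)

print(predict_wer([0.5, 0.5, 0.5]))   # ≈ 0.4 + 0.5*(0.1 - 0.5 - 0.2) = 0.1
```

In the paper's setting the targets would be true WER scores from a held-out set and the features would come from the signal and decoder output; any regressor could replace the least-squares fit.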

The WaveSurfer Automatic Speech Recognition Plugin

Keywords: Automatic Speech Recognition, Free Software, WaveSurfer. 1. Introduction: Automatic Speech Recognition (ASR) is becoming an important part of our lives, both as a viable alternative for human-computer interaction and as a tool for linguistics and speech research. In many cases, however, it is troublesome, even in the language and speech communities, to get easy access to ASR resources. On the one hand, commercial systems are often too expensive and not flexible enough for researchers. On the other hand, free ASR software often lacks high-quality resources such as acoustic and language models for specific languages, and requires expertise that linguists and speech researchers cannot afford.

Morphosyntactic Resources for Automatic Speech Recognition

{shuet, ggravier, sebillot}@irisa.fr Abstract: Texts generated by automatic speech recognition (ASR) systems have some specificities, related to the idiosyncrasies of oral productions or to the principles of ASR systems, that make them more difficult to exploit than conventional written natural-language texts. This paper studies the value of morphosyntactic information as a resource for ASR. We show the ability of automatic methods to tag the outputs of ASR systems, obtaining a tag accuracy on automatic transcriptions similar to the 95-98% usually reported for written texts such as newspapers. We also demonstrate experimentally that tagging is useful to improve the quality of transcriptions by using morphosyntactic information in a post-processing stage of speech decoding. Indeed, we obtain a significant decrease in the word error rate in experiments on French broadcast news from the ESTER corpus; we also observe an improvement in the sentence error rate and note that a significant number of agreement errors are corrected.
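One simple way to picture such a post-processing stage is N-best rescoring with a morphosyntactic score. The hypotheses, log-scores, and interpolation weight below are invented for illustration only; they are not the paper's actual method or numbers.

```python
# Rescore N-best ASR hypotheses by interpolating the acoustic score with a
# POS-sequence score, so a morphosyntactically well-formed hypothesis can
# overtake an acoustically preferred one with an agreement error.

hypotheses = [
    # (transcript, acoustic log-score, POS-sequence log-score)
    ("the cats sleeps", -10.0, -8.0),   # agreement error -> poor POS score
    ("the cats sleep",  -10.5, -2.0),   # well-formed POS sequence
]

pos_weight = 0.5   # interpolation weight; tuned on held-out data in practice
best = max(hypotheses, key=lambda h: h[1] + pos_weight * h[2])
print(best[0])     # the agreement error is corrected by the POS score
```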

A survey on automatic speech recognition system

Speech is a primary mode of communication among human beings, so it is natural for people to expect to carry out spoken dialogue with computers. In this paper we discuss the fundamental approaches and developments of the last several years of research in Automatic Speech Recognition (ASR). The design of a speech recognition system requires careful attention to the following issues: the type of speech class, feature extraction, the acoustic model, the pronunciation dictionary, and the language model. We present the various techniques proposed to address these problems in ASR. This paper is helpful for reviewing the open problems in ASR research across various speech recognition models.

Danish Stød and Automatic Speech Recognition

There are two scenarios in medical dictation where ASR can remove or alleviate the problems mentioned above: real-time ASR, and ASR plus post-editing. Real-time automatic speech recognition: speaking is faster than typing (Basapur et al., 2007). If the physician uses digital dictation augmented with real-time ASR, the secretary is no longer part of the documentation workflow and that resource is freed for other purposes. As a side effect, the physician is the last pair of eyes on the transcription and can approve or correct it immediately, while the consultation is still fresh in memory. If integrated with an electronic medical records system, the physician can even dictate directly into the patient record, and the clinical documentation will always be up to date with the most recent information.

A Review On Different Feature Recognition Techniques For Speech Process In Automatic Speech Recognition.

KEYWORDS: Automatic Speech Recognition (ASR), Feature Extraction, MFCCs, LPC, RASTA, PLDA and PLP. I. INTRODUCTION: Speech is an important means of communication, and speech processing is one of the most active research areas in signal processing. Since the signals are generally processed in the digital domain, speech processing can also be described as digital signal processing applied to the speech signal. Automatic Speech Recognition (ASR) is a computer speech recognition system: a process of converting a speech signal into a series of words and other linguistic units with the help of algorithms implemented as computer programs. The predominant objective of ASR is to develop techniques and systems that enable computers to identify speech signals fed as input. Speech recognition and its applications have evolved over the past few decades. In any speech recognition system the speech signal is converted into text; this text is the output of the ASR and should be nearly equivalent to the speech fed as input. ASR finds application in voice search, voice dialling, robotics, etc.
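The first feature type in the keywords, MFCCs, can be sketched end to end: frame the signal, take the power spectrum, apply a mel filterbank, log-compress, and decorrelate with a DCT. This is a simplified illustration with common default parameter values; production extractors add pre-emphasis, liftering, and energy terms.

```python
import numpy as np
from scipy.fftpack import dct

# Minimal MFCC extraction: framing -> power spectrum -> mel filterbank
# -> log -> DCT. Parameter values are common defaults, not the only choice.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    n_fft = 512
    # 25 ms Hamming-windowed frames with a 10 ms hop.
    frames = [signal[i:i + frame_len] * np.hamming(frame_len)
              for i in range(0, len(signal) - frame_len + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank between 0 Hz and Nyquist.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge

    log_energy = np.log(power @ fbank.T + 1e-10)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

t = np.arange(16000) / 16000.0
feats = mfcc(np.sin(2 * np.pi * 440 * t))   # one second of a 440 Hz tone
print(feats.shape)                          # (frames, cepstral coefficients)
```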

Automatic Speech Recognition: A Shifted Role in Early Speech Intervention?

{fhamidi,mb}@cse.yorku.ca Abstract: Although automatic speech recognition (ASR) has been used in several systems that support speech training for children, this design domain poses ongoing challenges: an input domain of non-standard speech, and a user population for which meaningful, consistent, and well-designed automatically derived feedback is imperative. In this design analysis, we focus on the differences between the tasks of speech recognition and speech assessment, and identify the latter as a central issue for work in the speech-training domain. Our analysis is based on empirical results from fieldwork with Speech-Language Pathologists concerning the design requirements for tangible toys intended for speech intervention with primary-school-aged children. This analysis leads us to advocate the use of only rudimentary ASR feedback.

Automatic speech recognition with deep neural networks for impaired speech

2 Universität des Saarlandes, Saarbrücken, Germany. cristinae@cs.upc.edu, jose.fonollosa@upc.edu Abstract: Automatic Speech Recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech is a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models in terms of word error rate. A DNN improves the word error rate by 13% for subjects with dysarthria with respect to the best classical architecture, a larger improvement than that given by other deep neural networks such as CNNs, TDNNs and LSTMs. All the experiments have been done with the Kaldi speech recognition toolkit, for which we have adapted several recipes to deal with dysarthric speech and to work on the TORGO database. These recipes are publicly available.
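The word error rate used to compare these architectures is the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal implementation:

```python
# Word error rate: (substitutions + insertions + deletions) / reference words,
# computed with the standard Levenshtein dynamic program over word tokens.

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j
    # hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deleting all reference words
    for j in range(len(h) + 1):
        d[0][j] = j                      # inserting all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("please call the nurse", "please call a nurse"))  # 1 sub / 4 words = 0.25
```

A "13% relative" improvement means the new WER is the old WER multiplied by 0.87, not 13 points lower.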

Speech analysis for alphabets in Bangla language: automatic speech recognition

Bangla (also termed Bengali) is spoken widely around the world, yet very little research has been performed on it, while extensive literature on automatic speech recognition (ASR) systems is available for almost all the other major spoken languages. Although the number of Bangla speakers is about 250 million today, making Bangla the seventh most spoken language (Banglapedia, 2013), a systematic and scientific effort toward the computerization of this language has not yet been started. The Bengali alphabet is a syllabic alphabet in which all consonants carry an inherent vowel with two different pronunciations, the choice of which is not always easy to determine and which is sometimes not pronounced at all. Some efforts have been made to develop a Bangla speech corpus to build a Bangla text-to-speech system (Hossain et al., 2007). However, this effort is part of developing speech databases for Indian languages, of which Bangla, spoken in the eastern area of India (West Bengal), is one part. But most native speakers of Bangla (more than two thirds) reside in Bangladesh, where it is the official language. Although the written characters of standard Bangla in both countries are the same, there are some sounds

Novel speech processing techniques for robust automatic speech recognition

The goal of this thesis is to develop and design new feature representations that can improve automatic speech recognition (ASR) performance in clean as well as noisy conditions. One of the main shortcomings of fixed-scale envelope-based features such as MFCCs (typically computed over 20-30 ms analysis windows) is their poor handling of the non-stationarity of the underlying signal. In this thesis, a novel stationarity-synchronous speech spectral analysis technique is proposed that sequentially detects the largest quasi-stationary segments in the speech signal (typically of variable length, 20-60 ms), followed by their spectral analysis. In contrast to a fixed-scale analysis technique, the proposed technique provides better time and frequency resolution, leading to improved ASR performance. Moving a step forward, the thesis then outlines the development of theoretically consistent amplitude-modulation and frequency-modulation (AM-FM) techniques for a broadband signal such as speech. AM-FM signals are well defined and studied in the context of communication systems. Borrowing upon these ideas, several researchers have applied AM-FM modeling to speech signals, with mixed results; these techniques have varied in their definitions and consequently in the demodulation methods used. In this thesis, we carefully define AM and FM signals in the context of ASR. We show that for a theoretically meaningful estimation of the AM signal, it is important to constrain the companion FM signal to be narrow-band. Due to the Hilbert relationships, the AM signal induces a component in the FM signal that is fully determined by the AM signal and hence forms redundant information. We present a novel homomorphic filtering technique to extract the leftover FM signal after suppressing its redundant part. The estimated AM message signals are then down-sampled, and their lower DCT coefficients are retained as speech features.
We show that this representation is, in fact, the exact dual of the real cepstrum and hence refer to it as the fepstrum. While the fepstrum provides the amplitude modulations (AM) occurring within a single 100 ms frame, the MFCC feature provides the static energy in the mel bands of each frame and its variation across several frames (the deltas). Together these two features complement each other, and ASR experiments (based on hidden Markov models and Gaussian mixture models, HMM-GMM) indicate that the fepstrum feature in conjunction with the MFCC feature achieves significant ASR improvement when evaluated over several speech databases.
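The Hilbert-transform step underlying AM estimation can be illustrated on a synthetic narrow-band AM signal: the magnitude of the analytic signal recovers the amplitude envelope. This sketch shows only the envelope-recovery idea, not the thesis's FM constraints, homomorphic filtering, or fepstrum pipeline.

```python
import numpy as np
from scipy.signal import hilbert

# For a narrow-band AM signal x(t) = m(t) * cos(w*t) with a slowly varying
# message m(t), |analytic signal| = |x + j*Hilbert(x)| recovers m(t).

sr = 8000
t = np.arange(sr) / sr
envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)   # slow AM message (3 Hz)
carrier = np.sin(2 * np.pi * 1000 * t)             # narrow-band carrier (1 kHz)
x = envelope * carrier

estimated = np.abs(hilbert(x))                     # analytic-signal magnitude
err = np.max(np.abs(estimated[200:-200] - envelope[200:-200]))  # ignore edges
print(err < 0.01)   # envelope recovered almost exactly in the interior
```

The recovery is near-exact here because the 3 Hz message is far below the 1 kHz carrier, the narrow-band condition the thesis argues is essential for meaningful AM estimation.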

UCSY-SC1: A Myanmar speech corpus for automatic speech recognition

This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. ASR research is conducted by researchers around the world to improve their language technologies; speech corpora are important in developing ASR, and creating them is especially necessary for low-resourced languages. The Myanmar language can be regarded as low-resourced because of the lack of pre-created resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus 1) is created for Myanmar ASR research. The corpus covers two domains, news and daily conversations, with a total size of over 42 hours: 25 hours of web news and 17 hours of recorded conversational data, collected from 177 females and 84 males for the news data and 42 females and 4 males for the conversational domain. This corpus was used as training data for developing Myanmar ASR. Three types of acoustic models, Gaussian Mixture Model - Hidden Markov Model (GMM-HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models, were built and their results compared. Experiments were conducted on different data sizes, and evaluation was done with two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). The performance of Myanmar ASR using this corpus gave satisfactory results on both test sets, with word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.

An enhanced automatic speech recognition system for Arabic

Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesely. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, December. IEEE Catalog No.: CFP11SRW-USB.

Contextual Error Correction in Automatic Speech Recognition

ABSTRACT: This disclosure describes techniques that leverage the context of a conversation between a user and a virtual assistant to correct errors in automatic speech recognition (ASR). Once confirmed by the user, the correction event is used to augment the training data for ASR.
