Acronyms - Sub-word based language modeling of morphologically rich languages for LVCSR

B.2 Acronyms

AENN Auto-Encoder Neural Network

ASR Automatic Speech Recognition

BAMA Buckwalter Arabic Morphological Analyzer

BC Broadcast Conversation

BIC Bayesian Information Criterion

bMMI boosted Maximum Mutual Information

BN Broadcast News

CD Contrastive Divergence

CER Character Error Rate

CLM Class-based Language Model

CMLLR Constrained Maximum Likelihood Linear Regression

CN Confusion Network

CRP Chinese Restaurant Process

DARPA Defense Advanced Research Projects Agency

DAT Dialog Act Tagging

DMC Discriminative Model Combination

DNN Deep Neural Network

DNNLM Deep Neural Network Language Model

DP Dynamic Programming

DT Discriminative Training

EA Evolutionary Algorithms

ECA Egyptian Colloquial Arabic

EM Expectation Maximization

EPPS European Parliament Plenary Sessions

FFT Fast Fourier Transform

FLM Factored Language Model

fMLLR feature space Maximum Likelihood Linear Regression

G2P Grapheme-to-Phoneme Conversion

GA Genetic Algorithm

GD Graph Density

GER Graph Error Rate

GMLM Gaussian Mixture Language Model

GMM Gaussian Mixture Model

GPB Generalized Parallel Backoff

GT Gammatone filter

HCRP Hierarchical Chinese Restaurant Process

HLDA Heteroscedastic Linear Discriminant Analysis

HMM Hidden Markov Model

HPY Hierarchical Pitman-Yor

HPYCLM Hierarchical Pitman-Yor Class-based Language Model

HPYLM Hierarchical Pitman-Yor Language Model

IBM International Business Machines Corporation

Appendix B Symbols and Acronyms

IKN Interpolated Kneser-Ney Smoothing

KN Kneser-Ney Smoothing

LDA Linear Discriminant Analysis

LM Language Model

LSA Latent Semantic Analysis

LSTM Long Short-Term Memory Neural Network

LSTMLM Long Short-Term Memory Language Model

LVCSR Large Vocabulary Continuous Speech Recognition

MADA Morphological Analyzer and Disambiguator tool for Arabic

MAP Maximum A-Posteriori

MDL Minimum Description Length

MFCC Mel-Frequency Cepstral Coefficients

MKN Modified Kneser-Ney

ML Maximum Likelihood

MLLR Maximum Likelihood Linear Regression

MLP Multilayer Perceptron Neural Network

MPE Minimum Phone Error

MSA Modern Standard Arabic

MSE Mean Square Error

MT Machine Translation

NER N-best Error Rate

NNLM Neural Network Language Model

OOV Out-Of-Vocabulary

PER Phoneme Error Rate

PLP Perceptual Linear Predictive features

POI Probability Of Improvement

POS Part-Of-Speech

PPL Perplexity

PY Pitman-Yor

RBM Restricted Boltzmann Machines

RNN Recurrent Neural Network

RNNLM Recurrent Neural Network Language Model

RWTH Rheinisch-Westfälische Technische Hochschule

SAT Speaker Adaptive Training

SNN Shallow Neural Network

SNNLM Shallow Neural Network Language Model

SRILM SRI Language Modeling Toolkit

STC Semi-Tied Covariance

SVD Singular Value Decomposition

TC Telephone Conversations

TDP Time Distortion Penalty

TMLM Tied-Mixture Language Model

TMLM-CO Tied-Mixture Language Model with bigram CO-occurrence based features

TMLM-NN Tied-Mixture Language Model with Neural Network based features

VTLN Vocal Tract Length Normalization

WER Word Error Rate

WFST Weighted Finite State Transducer

WSJ Wall Street Journal

List of Figures

1.1 Basic architecture of a statistical automatic speech recognition system according to [Ney 1990].

1.2 6-state hidden Markov model in Bakis topology for the triphone sehv in the word “seven” and the resulting trellis for a time alignment. The HMM segments are denoted by <1>, <2>, and <3>.

1.3 An example of a word lattice (taken from [Schwenk 2007]). The lattice is produced using a trigram LM, where each word has a unique bigram context. For simplicity, acoustic and language model scores are not shown on arcs ([fw]: filler word; [breath]: breath noise).

1.4 An example of a confusion network (CN) derived from a lattice. The figure shows: the original lattice, a derived CN, and an intermediate lattice in which all paths have the same length. The positions for the insertions of the ε-arcs are derived from the CN according to the algorithm described in [Hoffmeister 2011]. The number that appears on each arc corresponds to the CN slot to which the arc is assigned.

3.1 Optimization of the number of full-words retained in the sub-word based vocabularies.

3.2 Optimization of the overall vocabulary sizes for full-word and sub-word based experiments.

3.3 The best sub-word based experiments compared to the best full-word based experiments on Arabic, German, and Polish corpora.

4.1 (a) An example of a general backoff graph showing all possible backoff paths from top to bottom. (b) An example of a backoff graph where only a subset of the possible backoff paths are allowed.

4.2 Topologies of the Arabic FLMs using the format specifications of the SRILM-FLM extensions (W: word; M: morph; L: lexeme; P: pattern).

4.3 Backoff graphs for AR-FLM$_{1:5}$; detailed topologies are given in Figure 4.2 (W: word; M: morph; L: lexeme; P: pattern).

4.4 Topologies of the German FLMs using the format specifications of the SRILM-FLM extensions (W: word; L: lexeme; I: class-index; P: POS-tag).

4.5 Backoff graphs for GR-FLM$_{1:7}$; detailed topologies are given in Figure 4.4 (W: word; L: lexeme; I: class-index; P: POS-tag).

4.6 Comparison of recognition WERs [%] on Arabic and German corpora using different LMs.

4.7 Interpolation weights of individual Arabic morpheme-based LMs; models with negligible weights are not shown in the figure.

4.8 Interpolation weights of individual German morpheme-based LMs; models with negligible weights are not shown in the figure.

5.1 Architecture of a shallow NNLM (SNNLM) that estimates the model $p(w_n \mid w_{n-m+1}^{n-1})$.

5.2 Architecture of a deep NNLM (DNNLM) that estimates the model $p(w_n \mid w_{n-m+1}^{n-1})$.

5.3 Architecture of a deep NNLM (DNNLM) with input classes. The input encoding uses separate vectors for words and their classes for every history position. The network estimates the model $p(w_n \mid w_{n-1} c_{n-1} w_{n-2} c_{n-2})$.

5.4 Architecture of a deep NNLM (DNNLM) with input classes. The input encoding uses one combined vector for each word and its class for every history position. The network estimates the model $p(w_n \mid w_{n-1} c_{n-1} w_{n-2} c_{n-2})$.

5.5 General steps of a greedy layer-wise unsupervised pre-training algorithm.

5.6 Optimization of the number of decomposable full-words retained in the morpheme-based vocabulary performed over eca-dev corpus using overall vocabulary size of 250k (best WER = 56.8% with 5k full-words). Baseline WER on eca-dev using 350k full-words vocabulary = 56.9%.

5.7 Comparison of recognition WERs [%] on Egyptian Arabic eca-eval corpus using different LMs.

5.8 Interpolation weights of individual morpheme-based LMs.

List of Tables

1.1 Different Arabic words derived from the same root “ktb”.

3.1 Arabic solar and lunar consonants (bw: using Buckwalter transliteration; ar: using Arabic script).

3.2 An example of the alignment process for the word-pronunciation pair (phase,feIz).

3.3 Recognition experiments on Arabic corpora using morpheme-based LMs with 70k vocabularies.

3.4 Recognition experiments on Arabic corpora using full-words, morphemes, and diacritized morphemes for LMs with very large vocabularies.

3.5 Word- and character-level perplexities for full-word and sub-word based LMs on Arabic corpora (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

3.6 Recognition experiments on German corpora using morpheme-based LMs with 100k vocabularies.

3.7 Recognition experiments on German corpora using 100k full-words as a baseline vocabulary and adding different fragment-based and morpheme-based graphones.

3.8 Recognition experiments on German corpora using full-words, morphemes, and morphemic graphones for LMs with very large vocabularies.

3.9 Word- and character-level perplexities for full-word and sub-word based LMs on German corpora (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

3.10 Recognition experiments on Polish corpora using morpheme- and syllable-based LMs with 300k vocabularies.

3.11 Recognition experiments on Polish corpora using full-words, syllables, and syllabic graphones for LMs with very large vocabularies.

3.12 Word- and character-level perplexities for full-word and sub-word based LMs on Polish corpora (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

3.13 Analysis of improvements in the best sub-word based system compared to the best full-word based system for Arabic, German, and Polish corpora. Amount of reduction in WER is divided into (ins: reduction in insertion rate; OOV del/sub: reduction in deletion/substitution rate of OOV words; INV del/sub: reduction in deletion/substitution rate of INV words). Note: a negative reduction means an increase.

3.14 Examples of words for which recognition is improved using the best sub-word based systems.

3.15 List of participants in different evaluation campaigns.

3.16 Quaero German ASR evaluation 2010.

3.17 Quaero German ASR evaluation 2011.

3.18 Quaero German ASR evaluation 2012.

3.19 Quaero Polish ASR evaluation 2012.

3.20 Quaero German ASR evaluation 2013.

3.21 IWSLT German ASR evaluation 2013.

3.22 OpenHaRT Arabic handwriting recognition evaluation 2013.

4.1 Recognition experiments on Arabic ar-tune07 corpus using different factored LMs (vocabulary: 70k full-words, OOV rate = 3.6%, N-best size = 1000, N-best error rate (NER) = 7.3%; W: word; M: morph; L: lexeme; P: pattern).

4.2 Perplexities for the German FLMs GR-FLM$_{1:7}$ measured on the German gr-dev09 corpus. Exact FLM topologies are given in Figures 4.4 and 4.5 (word-based: 100k full-words vocab; morpheme-based: 100k morpheme-based vocab with 5k full-words + 95k morphemes; W: word; L: lexeme; I: class-index; P: POS-tag).

4.3 Recognition WERs [%] on German corpora using different factored LMs (N-best size = 1000; word-based: 100k full-words, OOV rate = [gr-dev09: 5.0%, gr-eval09: 4.8%], N-best error rate (NER) = [gr-dev09: 23.6%, gr-eval09: 21.4%]; morpheme-based: 5k full-words + 95k morphemes, OOV rate = [gr-dev09: 1.5%, gr-eval09: 1.4%], N-best error rate (NER) = [gr-dev09: 20.0%, gr-eval09: 18.8%]).

4.4 Recognition WERs [%] on Arabic ar-dev07 corpus using stream- and class-based LMs built over words and morphemes (N-best size = 1000; word-based: 70k full-words, OOV rate = 3.7%, N-best error rate (NER) = 9.5%; morpheme-based: 20k full-words + 50k morphemes, OOV rate = 1.4%, N-best error rate (NER) = 8.2%).

4.5 Recognition experiments on Arabic corpora using class-based LMs, factored LM (AR-FLM$_4$), and hierarchical Pitman-Yor LMs built over words (vocabulary: 750k full-words; OOV rate = [ar-dev07: 0.5%, ar-eval07: 0.7%]; N-best size = 1000; N-best error rate (NER) = [ar-dev07: 7.6%, ar-eval07: 9.1%]).

4.6 Word- and character-level perplexities on Arabic corpora for LMs that utilize word-level classes (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

4.7 Recognition experiments on Arabic corpora using class-based LMs, factored LM (AR-FLM$_4$), and hierarchical Pitman-Yor LMs built over morphemes (vocabulary: 20k full-words + 236k morphemes; OOV rate = [ar-dev07: 0.5%, ar-eval07: 0.7%]; N-best size = 1000; N-best error rate (NER) = [ar-dev07: 7.6%, ar-eval07: 8.8%]).

4.8 Morpheme- and character-level perplexities on Arabic corpora for LMs that utilize morpheme-level classes (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

4.9 Number of instances of every class for Arabic vocabularies.

4.10 Recognition WERs [%] on German corpora using stream- and class-based LMs built over words and morphemes (N-best size = 1000; word-based: 100k full-words, OOV rate = [gr-dev09: 5.0%, gr-eval09: 4.8%], N-best error rate (NER) = [gr-dev09: 23.6%, gr-eval09: 21.4%]; morpheme-based: 5k full-words + 95k morphemes, OOV rate = [gr-dev09: 1.5%, gr-eval09: 1.4%], N-best error rate (NER) = [gr-dev09: 20.0%, gr-eval09: 18.8%]).

4.11 Recognition experiments on German corpora using class-based LMs, factored LM (GR-FLM$_5$), and hierarchical Pitman-Yor LMs built over words (vocabulary: 750k full-words; OOV rate = [gr-dev09: 2.3%, gr-eval09: 2.1%]; N-best size = 1000; N-best error rate (NER) = [gr-dev09: 20.6%, gr-eval09: 18.9%]).

4.12 Word- and character-level perplexities on German corpora for LMs that utilize word-level classes (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

4.13 Recognition experiments on German corpora using class-based LMs, factored LM (GR-FLM$_5$), and hierarchical Pitman-Yor LMs built over morphemes (vocabulary: 5k full-words + 495k morphemes; OOV rate = [gr-dev09: 0.9%, gr-eval09: 0.7%]; N-best size = 1000; N-best error rate (NER) = [gr-dev09: 19.1%, gr-eval09: 17.3%]).

4.14 Morpheme- and character-level perplexities on German corpora for LMs that utilize morpheme-level classes (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol).

4.15 Number of instances of every class for German vocabularies.

5.1 Recognition experiments on CallHome Egyptian colloquial Arabic (ECA) evaluation corpus eca-eval using word-based neural network LMs (NNLMs) for lattice rescoring. Vocabulary: 350k full-words, OOV rate = 1.4%, graph (lattice) error rate (GER) = 37.2%.

5.2 Recognition experiments on CallHome Egyptian colloquial Arabic (ECA) evaluation corpus eca-eval using morpheme-based neural network LMs (NNLMs) for lattice rescoring. Vocabulary: 250k (5k words + 245k morphemes), OOV rate = 0.9%, graph (lattice) error rate (GER) = 32.3%.

5.3 Word-/morpheme-level and character-level perplexities on CallHome Egyptian colloquial Arabic (ECA) evaluation corpus eca-eval for different LMs (inv: perplexity for in-vocabulary text excluding the unk symbol; all: perplexity for the whole text including the unk symbol; units: words or morphemes).

A.1 Experimental corpora for: modern standard Arabic, German, Polish, and Egyptian colloquial Arabic. BN: broadcast news; BC: broadcast conversation; EPPS: European parliament plenary sessions; PC: Podcast; TC: Telephone Conversations.

Bibliography

M. Adda-Decker. A corpus-based decompounding algorithm for German lexical modeling in LVCSR. In Proc. European Conf. on Speech Communication and Technology, pages 257 – 260, Geneva, Switzerland, September 2003.

M. Adda-Decker and G. Adda. Morphological decomposition for ASR in German. In Workshop on Phonetics and Phonology in ASR, pages 129 – 143, Saarbrücken, Germany, March 2000.

A. M. H. J. Aertsen, P. I. M. Johannesma, and D. J. Hermes. Spectro-temporal receptive fields of auditory neurons in the grassfrog. Biological Cybernetics, 38:235 – 248, November 1980.

M. Afify, L. Nguyen, B. Xiang, S. Abdou, and J. Makhoul. Recent progress in Arabic broadcast news transcription at BBN. In Proc. European Conf. on Speech Communication and Technology, volume 1, pages 1637 – 1640, Lisbon, Portugal, September 2005.

M. Afify, R. Sarikaya, H.-K. J. Kuo, L. Besacier, and Y. Gao. On the use of morphological analysis for dialectal Arabic speech recognition. In Interspeech, volume 1, pages 277 – 280, Pittsburgh, PA, USA, September 2006.

M. Afify, O. Siohan, and R. Sarikaya. Gaussian mixture language models for speech recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume 4, pages IV–29 – IV–32, Honolulu, HI, USA, April 2007.

A. Alexandrescu and K. Kirchhoff. Factored neural language models. In Proc. Human Language Technology Conf. of the North American Chapter of the ACL, NAACL-Short ’06, pages 1 – 4, Stroudsburg, PA, USA, 2006. Association for Computational Linguistics.

C. Allauzen, M. Mohri, B. Roark, and M. Riley. A generalized construction of integrated speech recognition transducers. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 2004.

P. Alleva, X. D. Huang, and M. Y. Hwang. Improvements on the pronunciation prefix tree search organization. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume 1, pages 133 – 136, Atlanta, GA, USA, May 1996.

E. Arisoy, T. Sainath, B. Kingsbury, and B. Ramabhadran. Deep neural network language models. In NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, pages 20 – 28, Montreal, Canada, June 2012.

P. Auer, M. Herbster, and M. K. Warmuth. Exponentially many local minima for single neurons. In David S. Touretzky, Michael Mozer, and Michael E. Hasselmo, editors, Proc. Neural Information Processing Systems (NIPS) Foundation, pages 316 – 322. MIT Press, 1996.

L. R. Bahl, F. Jelinek, and R. L. Mercer. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5:179 – 190, March 1983.

L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pages 49 – 52, Tokyo, Japan, May 1986.

L. R. Bahl, M. Padmanabhan, D. Nahamoo, and P. S. Gopalakrishnan. Discriminative training of Gaussian mixture models for large vocabulary speech recognition systems. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pages 613 – 616, Atlanta, GA, USA, May 1996.

Appendix B Bibliography

J. K. Baker. Stochastic modeling for automatic speech understanding. In D. R. Reddy, editor, Speech Recognition, pages 512 – 542. Academic Press, New York, NY, USA, 1975.

R. Bakis. Continuous speech word recognition via centisecond acoustic states. In ASA Meeting, Washington, DC, USA, April 1976.

M. C. Bateson. Arabic language handbook. Georgetown Classics in Arabic Language and Linguistics Series. Georgetown University Press, Portland, OR, USA, 2003.

L. E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In O. Shisha, editor, Inequalities, volume 3, pages 1 – 8. Academic Press, New York, NY, USA, 1972.

T. Bayes. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370 – 418, 1763. Reprinted in Biometrika, vol. 45, no. 3/4, pp. 293–315, December 1958.

I. Bazzi and J. R. Glass. Modeling out-of-vocabulary words for robust speech recognition. In Proc. Int. Conf. on Spoken Language Processing, Beijing, China, October 2000.

T. C. Bell, J. G. Cleary, and I. H. Witten. Text compression. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1990. ISBN 0-13-911991-4.

J. Bellegarda. Large vocabulary speech recognition with multispan language models. IEEE Transactions on Speech and Audio Processing, 8(1):76 – 84, 2000.

R. E. Bellman. Dynamic programming. Princeton University Press, Princeton, NJ, USA, 1957.

Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1 – 127, January 2009.

Y. Bengio and R. Ducharme. A neural probabilistic language model. In Advances in Neural Information Processing Systems, volume 13, pages 932 – 938, 2001.

Y. Bengio and J.-S. Sénécal. Quick training of probabilistic neural nets by importance sampling. In Conference on Artificial Intelligence and Statistics (AISTATS), 2003.

Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157 – 166, 1994.

Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137 – 1155, March 2003.

A. Berton, P. Fetter, and P. Regal-Brietzmann. Compound words in large-vocabulary German speech recognition systems. In Proc. Int. Conf. on Spoken Language Processing, volume 2, pages 1165 – 1168, Philadelphia, PA, USA, October 1996.

K. Beulen, S. Ortmanns, and C. Elting. Dynamic programming search techniques for across-word modeling in speech recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pages 609 – 612, Phoenix, AZ, USA, March 1999.

P. Beyerlein. Discriminative model combination. In Proc. IEEE Automatic Speech Recognition and Understanding Workshop, pages 238 – 245, Santa Barbara, CA, USA, December 1997.

P. Beyerlein. Discriminative model combination. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pages 481 – 484, Seattle, WA, USA, May 1998.

J. Bilmes and K. Kirchhoff. Factored language models and generalized parallel backoff. In Proc. Human Language Technology Conf. of the North American Chapter of the ACL, volume 2, pages 4 – 6, Edmonton, Canada, May 2003.

J. Bilmes, K. Asanovic, C. Chee-Whye, and J. Demmel. Using PHiPAC to speed error back-propagation learning. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume 5, pages 4153 – 4156, Munich, Germany, 1997.

M. Bisani and H. Ney. Multigram-based grapheme-to-phoneme conversion for LVCSR. In Interspeech, pages 933 – 936, Geneva, Switzerland, September 2003.

M. Bisani and H. Ney. Bootstrap estimates for confidence intervals in ASR performance evaluation. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume 1, pages 409 – 412, Montreal, Canada, May 2004.

M. Bisani and H. Ney. Open vocabulary speech recognition with flat hybrid models. In Interspeech, pages 725 – 728, Lisbon, Portugal, September 2005.

M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 50(5):434 – 451, May 2008.

C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, USA, 1st edition, January 1996. ISBN 9780198538646.

D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993 – 1022, 2003.

M. Brand. Structure learning in conditional probability models via an Entropic prior and parameter extinction. Neural Computation, 11(5):1155 – 1182, 1999.

P. Brown, P. deSouza, R. Mercer, V. Della Pietra, and J. Lai. Class-based n-gram models of natural language. Computational Linguistics, 18:467 – 479, 1992.

T. Buckwalter. Buckwalter Arabic Morphological Analyzer Version 2.0. Number LDC2004L02. Linguistic Data Consortium (LDC) catalogue, 2004. ISBN 1-58563-324-0.

W. Byrne, J. Hajič, P. Ircing, P. Krbec, and J. Psutka. Morpheme based language models for speech recognition of Czech. In Text, Speech and Dialogue, volume 1902 of Lecture Notes in Computer Science, pages 139 – 162, 2000.

M. Castro and F. Prat. New directions in connectionist language modeling. 2686:598 – 605.

M. J. Castro-Bleda, V. Polvoreda, and F. Prat. Connectionist n-gram models by using MLPs. In Proceedings of the Second Workshop on Natural Language Processing and Neural Networks, pages 16 – 22, Tokyo, Japan, November 2001.

K. Çarki, P. Geutner, and T. Schultz. Turkish LVCSR: towards better speech recognition for agglutinative languages. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, volume 1, pages 3688 – 3691, Istanbul, Turkey, June 2000.

B. Chen, Q. Zhu, and N. Morgan. Learning long-term temporal features in LVCSR using neural networks. In Interspeech, Jeju Island, Korea, October 2004.

S. F. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. In Proc. Annual Meeting of the Association for Computational Linguistics, ACL ’96, pages 310 – 318, Stroudsburg, PA, USA, 1996. Association for Computational Linguistics.

S. F. Chen and R. Rosenfeld. A survey of smoothing techniques for ME models. IEEE Transactions on Speech and Audio Processing, 8(1):37 – 50, January 2000.

S. S. Chen and P. S. Gopalakrishnan. Speaker, environment and channel change detection and clustering via the Bayesian information criterion. In DARPA Broadcast News Transcription and Understanding Workshop, pages 127 – 132, February 1998.

G. Choueiter, D. Povey, S. F. Chen, and G. Zweig. Morpheme-based language modeling for Arabic
