Acronyms - Efficient setup of acoustic models for large vocabulary continuous speech recognition

ARISE Automatic Railway Information Systems for Europe

ASR Automatic Speech Recognition

AT Automatic Transcription

CART Classification And Regression Tree

CMLLR Constrained Maximum Likelihood Linear Regression

DAG Directed Acyclic Graph

DARPA Defense Advanced Research Projects Agency

EBW Extended Baum-Welch

ELRA European Language Resources Association

EM Expectation Maximization

EPPS European Parliament Plenary Session

FTE Final Text Edition

G2P Grapheme-To-Phoneme

GMM Gaussian Mixture Model

HMM Hidden Markov Model

IRIB Islamic Republic of Iran Broadcasting

IRINN Islamic Republic of Iran News Network

LDA Linear Discriminant Analysis

LIP Linear Integer Program

LM Language Model

LVCSR Large Vocabulary Continuous Speech Recognition

MAP Maximum-a-Posteriori

MFCC Mel Frequency Cepstral Coefficients

MLLR Maximum Likelihood Linear Regression

ML Maximum Likelihood

MMI Maximum Mutual Information

MPE Minimum Phoneme Error

MT Manual Transcription

NIST National Institute of Standards and Technology

OOV Out-Of-Vocabulary

PLP Perceptual Linear Prediction

PP Perplexity

ROVER Recognizer Output Voting Error Reduction

RTE Rainbow Text Edition

RWTH Rheinisch-Westfälische Technische Hochschule

SAMPA Speech Assessment Methods Phonetic Alphabet

SAT Speaker Adaptive Training

SC-HMM Semi-Continuous Hidden Markov Model

SI Speaker Independent

SPEX Speech Processing Expertise Center

TC-STAR Technology and Corpora for Speech to Speech Translation

UPC Universitat Politècnica de Catalunya

UT Unsupervised Training

VTLN Vocal Tract Length Normalization

WER Word Error Rate

Bibliography

[Abdoua & Hamid⁺ 06] S. M. Abdoua, S. E. Hamid, M. Rashwan, A. Samir, O. Abd-Elhamid, M. Shahin, W. Nazih. Computer Aided Pronunciation Learning System Using Speech Recognition Techniques. In Proc. Int. Conf. on Spoken Language Processing, pp. 849 – 852, Pittsburgh, PA, USA, Sept. 2006.

[Anastasakos & Balakrishnan 98] T. Anastasakos, S. Balakrishnan. The Use of Confidence Measures in Unsupervised Adaptation of Speech Recognizers. In Proc. Int. Conf. on Spoken Language Processing, pp. 2303–2306, Sydney, Australia, Dec. 1998.

[Bahl & Baker⁺ 76] L. R. Bahl, J. Baker, P. Cohen, N. Dixon, F. Jelinek, R. Mercer, H. Silverman. Preliminary results on the performance of a system for the automatic recognition of continuous speech. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 425–429, Philadelphia, PA, USA, April 1976.

[Bahl & Brown⁺ 86] L. R. Bahl, P. Brown, P. V. de Souza, R. Mercer. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 49–52, Tokyo, Japan, May 1986.

[Bahl & de Souza⁺ 91] L. R. Bahl, P. V. de Souza, P. S. Gopalakrishnan, D. Nahamoo, M. A. Picheny. Context Dependent Modeling of Phones in Continuous Speech Using Decision Trees. In Proc. DARPA Speech and Natural Language Processing Workshop, pp. 264–270, Pacific Grove, CA, USA, Feb. 1991. Morgan Kaufmann.

[Baker & Deng⁺ 07] J. Baker, L. Deng, S. Khudanpur, C.-H. Lee, J. Glass, N. Morgan. Historical Development and Future Directions in Speech Recognition and Understanding. MINDS Report of the Speech Understanding Working Group, 2007.

[Baker 75] J. Baker. The DRAGON system – An overview. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 23, No. 1, pp. 24 – 29, Feb. 1975.

[Bakis 76] R. Bakis. Continuous speech word recognition via centisecond acoustic states. In Proc. ASA Meeting, Washington, DC, USA, April 1976.

[Barras & Geoffrois⁺ 98] C. Barras, E. Geoffrois, Z. Wu, M. Liberman. Transcriber: a Free Tool for Segmenting, Labeling and Transcribing Speech. In First Int. Conf. on Language Resources and Evaluation (LREC), pp. 1373–1376, 1998.

[Barras & Geoffrois⁺ 01] C. Barras, E. Geoffrois, Z. Wu, M. Liberman. Transcriber: development and use of a tool for assisting speech corpora production. Speech Communication, Vol. 33, No. 1–2, 2001.

[Basharin & Langville⁺ 03] G. P. Basharin, A. N. Langville, V. A. Naumov. The Life and Work of A. A. Markov. In Proc. Int. Conf. on the Numerical Solution of Markov Chains, pp. 1–22, Urbana, IL, USA, Sept. 2003.

[Baum & Petrie 66] L. E. Baum, T. Petrie. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics, Vol. 37, No. 6, pp. 1554–1563, 1966.

[Baum 72] L. E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities, Vol. 3, pp. 1–8, 1972.

[Bayes 63] T. Bayes. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, Vol. 53, pp. 370–418, 1763. Reprinted in Biometrika, Vol. 45, No. 3/4, pp. 293–315, December 1958.

[Bellman 57] R. E. Bellman. Dynamic programming. Princeton University Press, Princeton, NJ, USA, 1957.

[Beulen & Ortmanns⁺ 99] K. Beulen, S. Ortmanns, C. Elting. Dynamic programming search techniques for across-word modeling in speech recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 609–612, Phoenix, AZ, USA, March 1999.

[Beyerlein & Aubert⁺ 02] P. Beyerlein, X. Aubert, R. Haeb-Umbach, M. Harris, D. Klakow, A. Wendemuth, S. Molau, H. Ney, M. Pitz, A. Sixtus. Large Vocabulary Continuous Speech Recognition of Broadcast News – The Philips/RWTH Approach. Speech Communication, Vol. 37, No. 1/2, pp. 109–131, May 2002.

[Bijankhan & Sheikhzadegan⁺ 94] M. Bijankhan, J. Sheikhzadegan, M. Roohani, Y. Samareh, K. Lucass, M. Tabiani. FARSDAT-The Speech Database of Farsi Spoken Language. In Fifth Australian Int. Conf. on Speech Science and Technology (SST-94), pp. 826–831, Perth, Australia, Dec. 1994.

[Bisani & Ney 02] M. Bisani, H. Ney. Investigations on Joint-Multigram Models for Grapheme-to-Phoneme Conversion. In Proc. Int. Conf. on Spoken Language Processing, pp. 105–108, Denver, CO, USA, Sept. 2002.

[Bisani & Ney 03] M. Bisani, H. Ney. Multigram-based Grapheme-to-Phoneme Conversion for LVCSR. In Proc. European Conf. on Speech Communication and Technology, Vol. 2, pp. 933 – 936, Geneva, Switzerland, Sept. 2003.

[Bisani & Ney 08] M. Bisani, H. Ney. Joint-Sequence Models for Grapheme-to-Phoneme Conversion. Speech Communication, Vol. 50, No. 5, pp. 434 – 451, 2008.

[Breiman & Friedman⁺ 84] L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, CA, USA, 1984.

[Brown 87] P. F. Brown. The Acoustic-Modelling Problem in Automatic Speech Recognition. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 1987.

[Chelba & Xu⁺ 12] C. Chelba, P. Xu, F. Pereira, T. Richardson. Distributed Acoustic Modeling with Back-Off N-grams. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Kyoto, Japan, March 2012.

[Chen & Goodman 99] S. F. Chen, J. Goodman. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, Vol. 13, No. 4, pp. 359 – 394, Oct. 1999.

[Chen & Lamel⁺ 04] L. Chen, L. Lamel, J.-L. Gauvain. Lightly Supervised Acoustic Model Training Using Consensus Networks. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 189 – 192, Montreal, Canada, May 2004.

[Cohen & Mercer 75] P. S. Cohen, R. L. Mercer. The Phonological Component of an Automatic Speech Recognition System. pp. 275–320, New York, NY, USA, 1975. Academic Press.

[Davis & Mermelstein 80] S. Davis, P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 28, No. 4, pp. 357–366, Aug. 1980.

[Demenko & Grocholewski⁺ 08] G. Demenko, S. Grocholewski, K. Klessa, J. Ogorkiewicz, A. Wagner, M. Lange, D. Sledzinski, N. Cylwik. JURISDIC - Polish speech database for taking dictation of legal texts. In Sixth Int. Conf. on Language Resources and Evaluation (LREC), pp. 1280–1287, 2008.

[Dempster & Laird⁺ 77] A. Dempster, N. Laird, D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. In Journal of the Royal Statistical Society, Vol. 39:(1) of Series B, pp. 1–38, 1977.

[den Os & Boves⁺ 99] E. den Os, L. Boves, L. Lamel, P. Baggia. Overview of the ARISE Project. In Proc. Int. Conf. on Spoken Language Processing, pp. 1527 – 1530, Budapest, Hungary, Sept. 1999.

[Digalakis & Rtischev⁺ 95] V. Digalakis, D. Rtischev, L. Neumeyer. Speaker Adaptation Using Constrained Reestimation of Gaussian Mixtures. IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 5, pp. 357–366, 1995.

[Duchateau & Demuynck⁺ 98] J. Duchateau, K. Demuynck, D. V. Compernolle. Fast and accurate acoustic modelling with semi-continuous HMMs. Speech Communication, Vol. 24, No. 1, pp. 5–17, April 1998.

[Duda & Hart⁺ 01] R. O. Duda, P. E. Hart, D. G. Stork. Pattern Classification. John Wiley & Sons, New York, NY, USA, 2001.

[Esposito & Aversano 04] A. Esposito, G. Aversano. Text Independent Methods for Speech Segmentation. In Summer School on Neural Networks, pp. 261–290, 2004.

[Evermann & Chan⁺ 05] G. Evermann, H. Y. Chan, M. J. F. Gales, B. Jia, D. Mrva, P. C. Woodland, K. Yu. Training LVCSR Systems on Thousands of Hours of Data. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 209 – 212, Philadelphia, PA, USA, March 2005.

[Evermann & Woodland 00] G. Evermann, P. C. Woodland. Large Vocabulary Decoding and Confidence Estimation using Word Posterior Probabilities. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1655 – 1658, Istanbul, Turkey, June 2000.

[Fiscus 97] J. G. Fiscus. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER). In Proc. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 347–354, Santa Barbara, CA, USA, Dec. 1997.

[Fisher 36] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, Vol. 7, pp. 179–188, 1936.

[Gales & Young 07] M. J. F. Gales, S. J. Young. The application of Hidden Markov Models in speech Recognition, Vol. 1:(3). Foundations and Trends in Signal Processing, 2007.

[Gales 98] M. Gales. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech and Language, Vol. 12, No. 2, pp. 75–98, 1998.

[Gauvain & Lamel⁺ 99] J.-L. Gauvain, L. Lamel, G. Adda, M. Jardino. Recent advances in transcribing television and radio broadcasts. In Proc. European Conf. on Speech Communication and Technology, Vol. 2, pp. 655–658, Budapest, Hungary, Sept. 1999.

[Gauvain & Lee 94] J. Gauvain, C. Lee. Maximum a-posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 2, pp. 291–298, 1994.

[Giuliani & Brugnara 06] D. Giuliani, F. Brugnara. Acoustic model adaptation with multiple supervision. In TC-STAR Workshop on Speech-to-Speech Translation, pp. 151–154, Barcelona, Spain, June 2006.

[Gollan & Bacchiani 08] C. Gollan, M. Bacchiani. Confidence Scores for Acoustic Model Adaptation. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 4289–4292, Las Vegas, NV, USA, April 2008.

[Gollan & Bisani⁺ 05] C. Gollan, M. Bisani, S. Kanthak, R. Schlüter, H. Ney. Cross Domain Automatic Transcription on the TC-STAR EPPS Corpus. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 825 – 828, Philadelphia, PA, USA, March 2005.

[Gollan & Hahn⁺ 07] C. Gollan, S. Hahn, R. Schlüter, H. Ney. An Improved Method for Unsupervised Training of LVCSR Systems. In Proc. European Conf. on Speech Communication and Technology, pp. 2101–2104, Antwerp, Belgium, Aug. 2007.

[Gollan & Ney 08] C. Gollan, H. Ney. Towards Automatic Learning in LVCSR: Rapid Development of a Persian Broadcast Transcription System. In Proc. Int. Conf. on Spoken Language Processing, pp. 1441–1444, Brisbane, Australia, Sept. 2008.

[Gopalakrishnan & Kanevsky⁺ 91] P. S. Gopalakrishnan, D. Kanevsky, A. Nadas, D. Nahamoo. An Inequality for Rational Functions with Applications to Some Statistical Estimation Problems. IEEE Transactions on Information Theory, Vol. 37, pp. 107–113, 1991.

[Hermansky 90] H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, Vol. 87, No. 4, pp. 1738–1752, June 1990.

[Hoffmeister & Hillard⁺ 07] B. Hoffmeister, D. Hillard, S. Hahn, R. Schlüter, M. Ostendorf, H. Ney. Cross-site and intra-site ASR system combination: Comparisons on lattice and 1-best methods. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1145–1148, Honolulu, HI, USA, April 2007.

[Hoffmeister & Klein⁺ 06] B. Hoffmeister, T. Klein, R. Schlüter, H. Ney. Frame based system combination and a comparison with weighted ROVER and CNC. In Proc. Int. Conf. on Spoken Language Processing, pp. 1523–1526, Pittsburgh, PA, USA, Sept. 2006.

[Huang & Jack 89] X. Huang, M. Jack. Semi-continuous hidden Markov models for speech signals. Computer Speech and Language, Vol. 3, No. 3, pp. 239–251, July 1989.

[Huang & Learned-Miller⁺ 07] G. Huang, E. Learned-Miller, A. McCallum. Cryptogram Decoding for OCR Using Numerization Strings. In Proc. IEEE Int. Conf. on Document Analysis and Recognition, pp. 208–212, Curitiba, Paraná, Brazil, Sept. 2007.

[Hughes & Nakajima⁺ 10] T. Hughes, K. Nakajima, L. Ha, A. Vasu, P. J. Moreno, M. LeBeau. Building transcribed speech corpora quickly and cheaply for many languages. In Proc. Int. Conf. on Spoken Language Processing, pp. 1914–1917, Makuhari, Chiba, Japan, Sept. 2010.

[Jelinek & Bahl⁺ 75] F. Jelinek, L. R. Bahl, R. Mercer. Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech. IEEE Transactions on Information Theory, Vol. 21, No. 3, pp. 250–256, May 1975.

[Juang & Rabiner 06] B. H. Juang, L. R. Rabiner. Automatic Speech Recognition – A Brief History of the Technology. Encyclopedia of Language and Linguistics, Elsevier, 2nd edition, 2006.

[Kanthak & Ney 01] S. Kanthak, H. Ney. Context-Dependent Acoustic Modeling Using Graphemes for Large Vocabulary Speech Recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 845–848, Salt Lake City, UT, USA, May 2001.

[Kemp & Schaaf 97] T. Kemp, T. Schaaf. Estimating Confidence Using Word Lattices. In Proc. European Conf. on Speech Communication and Technology, pp. 827–830, Rhodes, Greece, 1997.

[Kemp & Waibel 99] T. Kemp, A. Waibel. Unsupervised Training of a Speech Recognizer: Recent Experiments. In Proc. European Conf. on Speech Communication and Technology, pp. 2725–2728, Budapest, Hungary, Sept. 1999.

[Kenstowicz 94] M. J. Kenstowicz. Phonology in Generative Grammar. Wiley-Blackwell, 13, 1994.

[Knight & Nair⁺ 06] K. Knight, A. Nair, N. Rathod, K. Yamada. Unsupervised Analysis for Decipherment Problems. In Proc. ACL-COLING, 2006.

[Knight & Yamada 99] K. Knight, K. Yamada. A Computational Approach to Deciphering Unknown Scripts. In Proceedings of the ACL Workshop on Unsupervised Learning in Natural Language Processing, 1999.

[Kolmogorov 33] A. N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin, Germany, 1933. An English translation appeared under the title Foundations of the Theory of Probability (Chelsea, New York) in 1950, with a second edition in 1956.

[Lamel & Gauvain⁺ 00] L. Lamel, J.-L. Gauvain, G. Adda. Lightly supervised acoustic model training. In Proc. ISCA Automatic Speech Recognition Workshop, pp. 150–154, Paris, France, Sept. 2000.

[Lamel & Gauvain⁺ 02] L. Lamel, J.-L. Gauvain, G. Adda. Unsupervised acoustic model training. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 877–880, Orlando, FL, USA, May 2002.

[LC-STAR] LC-STAR. Lexica and Corpora for Speech-to-Speech Translation Components. http://www.lc-star.com.

[Lee & Hayamizu⁺ 90] K. F. Lee, S. Hayamizu, H. W. Hon, C. Huang, J. Swartz, R. Weide. Allophone Clustering for Continuous Speech Recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 749–752, Albuquerque, NM, USA, April 1990.

[Lee 02] D.-S. Lee. Substitution Deciphering Based on HMMs with Applications to Compressed Document Processing. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 12, pp. 1661–1666, Dec. 2002.

[Leggetter & Woodland 95a] C. Leggetter, P. Woodland. Acoustic model adaptation with multiple supervision. In ARPA Spoken Language Technology Workshop, pp. 104–109, Austin, TX, USA, Jan. 1995.

[Leggetter & Woodland 95b] C. J. Leggetter, P. C. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, Vol. 9, No. 2, pp. 171–185, 1995.

[Lööf & Bisani⁺ 06] J. Lööf, M. Bisani, C. Gollan, G. Heigold, B. Hoffmeister, C. Plahl, R. Schlüter, H. Ney. The 2006 RWTH parliamentary speeches transcription system. In Proc. Int. Conf. on Spoken Language Processing, pp. 105 – 108, Pittsburgh, PA, USA, Sept. 2006.

[Lööf & Gollan⁺ 07] J. Lööf, C. Gollan, S. Hahn, G. Heigold, B. Hoffmeister, C. Plahl, D. Rybach, R. Schlüter, H. Ney. The RWTH 2007 TC-STAR evaluation system for European English and Spanish. In Proc. Int. Conf. on Spoken Language Processing, pp. 2145–2148, Antwerp, Belgium, Aug. 2007.

[Lööf & Gollan⁺ 08] J. Lööf, C. Gollan, H. Ney. Speaker Adaptive Training Using Shift-MLLR. In Proc. Int. Conf. on Spoken Language Processing, pp. 1701–1704, Brisbane, Australia, Sept. 2008.

[Lööf & Gollan⁺ 09] J. Lööf, C. Gollan, H. Ney. Cross-language Bootstrapping for Unsupervised Acoustic Model Training: Rapid Development of a Polish Speech Recognition System. In Proc. Int. Conf. on Spoken Language Processing, pp. 88–91, Brighton, UK, Sept. 2009.

[Lougee-Heimer 03] R. Lougee-Heimer. The Common Optimization INterface for Operations Research. In IBM Journal of Research and Development, Vol. 47:(1), pp. 57–66, Jan. 2003.

[Ma & Matsoukas⁺ 06] J. Ma, S. Matsoukas, O. Kimball, R. Schwartz. Unsupervised Training on Large Amounts of Broadcast News Data. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 1057 – 1059, Toulouse, France, May 2006.

[Ma & Schwartz 08] J. Ma, R. Schwartz. Unsupervised versus Supervised Training of Acoustic Models. In Proc. Int. Conf. on Spoken Language Processing, pp. 2374 – 2377, Brisbane, Australia, Sept. 2008.

[Macherey & Bender⁺ 03] K. Macherey, O. Bender, H. Ney. Multi-Level Error Handling for Tree Based Dialogue Course Management. In ISCA Tutorial and Research Workshop on Error Handling in Spoken Dialogue Systems, pp. 123–128, Chateau-d’Oex-Vaud, Switzerland, Aug. 2003.

[Marasek 07] K. Marasek. Polish LVCSR in the Janus system. Preliminary results for the SpeeCon database. Archives of Acoustics, Vol. 32, No. 1, pp. 119–126, 2007.

[Moore 03] R. K. Moore. A Comparison of the Data Requirements of Automatic Speech Recognition Systems and Human Listeners. In Proc. European Conf. on Speech Communication and Technology, pp. 2582 – 2584, Geneva, Switzerland, Sept. 2003.

[Mori 79] R. D. Mori. Recent advances in automatic speech recognition. Signal Processing, Vol. 1, No. 2, pp. 95 – 123, 1979.

[Ney & Ortmanns 99] H. Ney, S. Ortmanns. Dynamic programming search for continuous speech recognition. IEEE Signal Processing Magazine, Vol. 16, No. 5, pp. 64–83, Sept. 1999.

[Ney 84] H. Ney. The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 2, pp. 263–271, April 1984.

[Ney 90] H. Ney. Acoustic modeling of phoneme units for continuous speech recognition. In Signal Processing V: Theories and Applications, Fifth European Signal Processing Conference, pp. 65–72, Barcelona, Spain, Sept. 1990. Elsevier Science Publishers B. V.

[Ney 07] H. Ney. Introduction to Automatic Speech Recognition. Technical report, RWTH Aachen University, Aachen, Germany, 2007. Lecture script.

[Nguyen & Matsoukas⁺ 99] L. Nguyen, S. Matsoukas, J. Davenport, D. Liu, J. Billa, F. Kubala, J. Makhoul. Further advances in transcription of broadcast news. In Proc. European Conf. on Speech Communication and Technology, Vol. 2, pp. 667–670, Budapest, Hungary, Sept. 1999.

[Nguyen & Xiang 04] L. Nguyen, B. Xiang. Light Supervision in Acoustic Model Training. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 185–188, March 2004.

[Normandin & Morgera 91] Y. Normandin, S. Morgera. An improved MMIE training algorithm for speaker-independent, small vocabulary, continuous speech recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 537–540, Toronto, Canada, May 1991.

[Oerder & Ney 93] M. Oerder, H. Ney. Word graphs: an efficient interface between continuous-speech recognition and language understanding. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 119 – 122, Minneapolis, MN, USA, April 1993.

[Ogata & Ariki 02] J. Ogata, Y. Ariki. Unsupervised acoustic model adaptation based on phoneme error minimization. In Proc. Int. Conf. on Spoken Language Processing, pp. 1429–1432, Denver, CO, USA, Sept. 2002.

[Oliver 98] D. Oliver. Polish text to speech synthesis. In Master’s thesis, Edinburgh University, Edinburgh, UK, 1998.

[Ono & Wakita⁺ 93] Y. Ono, H. Wakita, Y. Zhao. Speaker Normalization Using Constrained Spectra Shifts in Auditory Filter Domain. In Proc. European Conf. on Speech Communication and Technology, pp. 355–358, Berlin, Germany, 1993.

[Oommen & Zgierski 93] B. J. Oommen, J. R. Zgierski. Breaking Substitution Cyphers Using Stochastic Automata. In IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 15, pp. 185 – 192, 1993.

[Oroumchian & Darrudi⁺ 04] F. Oroumchian, E. Darrudi, M. Hejazi. Assessment of a Modern Farsi Corpus. In 2nd Workshop on Information Technology and its Disciplines (WITID), Kish Island, Iran, Feb. 2004.

[Padmanabhan & Saon⁺ 00] M. Padmanabhan, G. Saon, G. Zweig. Lattice-Based Unsupervised MLLR For Speaker Adaptation. In Proc. ISCA ITRW Automatic Speech Recognition: Challenges for the Millenium, Paris, France, 2000.

[Park & Glass 08] A. S. Park, J. R. Glass. Unsupervised Pattern Discovery in Speech. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 1, pp. 186–197, Jan. 2008.

[Peleg & Rosenfeld 79] S. Peleg, A. Rosenfeld. Breaking Substitution Ciphers Using a Relaxation Algorithm. In Communications of the ACM, Vol. 22:(11), pp. 598–605, 1979.

[Pitz & Wessel⁺ 00] M. Pitz, F. Wessel, H. Ney. Improved MLLR speaker adaptation using confidence measures for conversational speech recognition. In Proc. Int. Conf. on Spoken Language Processing, pp. 548–551, Beijing, China, Oct. 2000.

[Pitz 05] M. Pitz. Investigations on Linear Transformations for Speaker Adaptation and Normalization. Ph.D. thesis, RWTH Aachen University, Aachen, Germany, 2005.

[Povey & Woodland 02] D. Povey, P. C. Woodland. Minimum phone error and I-smoothing for improved discriminative training. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 105–108, Orlando, FL, USA, May 2002.

[Rabiner & Schafer 78] L. R. Rabiner, R. W. Schafer. Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs, NJ, USA, 1978.

[Ramabhadran 05] B. Ramabhadran. Exploiting large quantities of spontaneous speech for unsupervised training of acoustic models. In Proc. Int. Conf. on Spoken Language Processing, pp. 1617–1620, Lisbon, Portugal, Sept. 2005.

[Ravi & Knight 08] S. Ravi, K. Knight. Attacking Decipherment Problems Optimally with Low-Order N-gram Models. In Proc. EMNLP, 2008.

[Roy & Pentland 02] D. K. Roy, A. P. Pentland. Learning words from sights and sounds: a computational model. Cognitive Science: A Multidisciplinary Journal, Vol. 26, No. 1, pp. 113–146, Jan. 2002.

[Rybach & Gollan⁺ 09a] D. Rybach, C. Gollan, G. Heigold, B. Hoffmeister, J. Lööf, R. Schlüter, H. Ney. The RWTH Aachen University Open Source Speech Recognition System. In Proc. Int. Conf. on Spoken Language Processing, pp. 2111–2114, Brighton, U.K., Sept. 2009.

[Rybach & Gollan⁺ 09b] D. Rybach, C. Gollan, R. Schlüter, H. Ney. Audio Segmentation for Speech Recognition using Segment Features. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 4197–4200, Taipei, Taiwan, April 2009.

[Rybach & Hahn⁺ 07] D. Rybach, S. Hahn, C. Gollan, R. Schlüter, H. Ney. Advances in Arabic Broadcast News Transcription at RWTH. In Proc. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 449–454, Kyoto, Japan, Dec. 2007.

[Sankar & Lee 96] A. Sankar, C.-H. Lee. A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition. IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 3, pp. 190–202, May 1996.

[Schultz & Waibel 97] T. Schultz, A. Waibel. Fast Bootstrapping of LVCSR Systems with Multilingual Phoneme Sets. In Proc. European Conf. on Speech Communication and Technology, pp. 371–374, Rhodes, Greece, 1997.

[Schultz & Waibel 01] T. Schultz, A. Waibel. Language Independent and Language Adaptive Acoustic Modeling for Speech Recognition. Speech Communication, Vol. 35, pp. 31–51, Aug. 2001.

[Schwartz & Chow⁺ 85] R. Schwartz, Y. Chow, O. Kimball, S. Roucos, M. Krasnet, J. Makhoul. Context-Dependent Modeling for Acoustic-Phonetic Recognition of Continuous Speech. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1205–1208, Tampa, FL, USA, March 1985.

[Shan & Wu⁺ 10] J. Shan, G. Wu, Z. Hu, X. Tang, M. Jansche, P. J. Moreno. Search by voice in Mandarin Chinese. In Proc. Int. Conf. on Spoken Language Processing, pp. 354–357, Makuhari, Chiba, Japan, Sept. 2010.

[Shannon 49] C. E. Shannon. Communication Theory of Secrecy Systems. In Bell System Technical Journal, Vol. 28, pp. 656–715, 1949.

[Shannon 51] C. E. Shannon. Prediction and Entropy of Printed English. Bell System Technical Journal, Vol. 30, pp. 50–64, 1951.

[Siegler & Jain⁺ 97] M. Siegler, U. Jain, B. Raj, R. Stern. Automatic Segmentation, Classification and Clustering of Broadcast News Audio. In DARPA Speech Recognition Workshop, pp. 97 – 99, 1997.

[Sixtus & Molau⁺ 00] A. Sixtus, S. Molau, S. Kanthak, R. Schlüter, H. Ney. Recent Improvements of the RWTH Large Vocabulary Speech Recognition System on Spontaneous Speech. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1671–1674, June 2000.

[Sloboda & Waibel 96] T. Sloboda, A. Waibel. Dictionary Learning for Spontaneous Speech Recognition. In Proc. Int. Conf. on Spoken Language Processing, pp. 2328–2331, Philadelphia, PA, USA, Oct. 1996.

[Stolcke 02] A. Stolcke. SRILM - An Extensible Language Modeling Toolkit. In Proc. Int. Conf. on Spoken Language Processing, pp. 901–904, Sept. 2002.

[Strope & Beeferman⁺ 11] B. Strope, D. Beeferman, A. Gruenstein, X. Lei. Unsupervised Testing Strategies for ASR. In Proc. Int. Conf. on Spoken Language Processing, pp. 1685–1688, Florence, Italy, Aug. 2011.

[Sundaram & Picone 04] R. Sundaram, J. Picone. Effects of Transcription Errors on Supervised Learning in Speech Recognition. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 169–172, March 2004.

[Uebel & Woodland 01] L. F. Uebel, P. C. Woodland. Improvements in linear
