Future Work - AutoTC : Automatic Time Code recognition for the purpose of synchronisation of su

Future work will look at fine-tuning the time-code detection factors by using Genetic Algo- rithm in conjunction with the challenges that subtitling brings, for example: A minimum of four video frames should be left between subtitles to allow the viewers eye to register the appearance of a new subtitle[3] to further improve the accuracy of the overall system. To make the system more robust, the application will have to be tested in spoken environments which vary from clean (studio recordings) to noisy (outdoor recordings) with multiple speakers speaking at the same time, speech mixed with background music or sound effects and songs for karaoke purposes. The type of speech may differ from dictation (newsreader) to spontaneous (debate or interview). This will be followed by field-testing of the developed prototype system to further optimise and tune as per the Broadcasting requirements and standards.

References

[1] Alvarez, A. et al. (2010)APyCA: Towards the Automatic Subtitling of Television Con- tent in Spanish, Proceedings of the International Multiconference on Computer Science and Information Technology pp. 567-574.

[2] Liu, P. and Wang, Z. (2004). Voice Activity Detection using visual information, Pro- ceedings of the International Conference on Acoustics, Speech and Signal Processing. pp. 609-612.

[3] C. Mary and I. Jan (1998).Code of Good Subtitling Practice, Endorsed by the European Association for Studies in Screen Translation in Berlin on 17 October 1998.

[4] Pollak, P. et al. (1993) Noise System for a Car, Proceedings of the Third European Conference on Speech, Communication and Technology (EUROSPEECH), pp. 1073- 1076.

[5] Specification of the EBU Subtitling data exchange format European Broadcasting Union CH-1218 Grand Saconnex (Geneva) Switzerland TECH. 3264-E February 1991 [6] Prasad, R. V. et al. (2002) Comparison of Voice Activity Detection Algorithms for VoIP, Proceedings of the Seventh International Symposium on Computers and Com- munications (ISCC02)

[7] Sakhnov, K. et al. (2009). Approach for Energy-Based Voice Detector with Adaptive Scaling Factor, International Journal of Computer Science (IAENG).

[8] Kothari, B., et. al (2002) Chapter 13: Same Language Subtitling: A Butterfly for Literacy?, Reading Beyond the Alphabet. SAGE Publishing. pp. 213-229.

[9] McCall, W. (2008). Same-Language-Subtitling and Karaoke: The Use of Subtitled Music as a Reading Activity in a High School Special Education Classroom. In K. McFerrin et al. (Eds.), Proceedings of Society for Information Technology and Teacher Education International Conference 2008 (pp. 1190-1195). Chesapeake, VA: AACE.http://go.editlib.org/p/27350

[10] Biswas, Ranjita (2005). Hindi film songs can boost literacy rates in India

[11] Papineni, K. (2001).Bleu: a Method for Automatic Evaluation of Machine Translation, Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY 10598, September.

[12] Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics, In Proceedings of the Human Language Technology Conference, pp. 128132, San Diego, CA.

[13] Banerjee, S. and Lavie, A. (2005).METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, In Proceedings of Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization at the 43rd Annual Meeting of the Association of Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June.

[14] F. Liu, “The Voice Activity Detection (VAD) Recorder and VAD Network Recorder,”

Massey University, Auckland, New Zealand, 2001, pp. 151–158.

[15] D. Jurafsky and J. H. Martin, Speech and language processing: An introduction to natural language processing,computational linguistics and speech recognition, Pearson Education Inc., Upper Saddle River, New Jersey, 2009.

[16] Mazzoni, D. and Dannenberg, R. (2000). Audacity, Retrieved from http://audacity.sourceforge.net/download/

[17] Cetrus. 2010. Screen Subtitling Poliscript paygo—Powered by Cetrus, Retrieved from http://www.cetrus.com/product/Screen-Subtitling/Poliscript-paygo.

[18] Screen Subtitling Systems, Screen Subtitling, Retrieved from http://www.screen.subtitling.com/index.asp?PageID=7.

[19] Kirillov, A. 2008. “The AForge.NET Framework”AForge.NET Framework, Retrieved from http://www.aforgenet.com/framework/.

[20] Smith, R. 2011. “Google” tesseract-ocr An OCR Engine that was developed

at HP Labs between 1985 and 1995... and now at Google. , Retrieved from

http://code.google.com/p/tesseract-ocr/.

[21] Cook, C. 2001-2011. XML-RPC.NET, Retrieved from http://www.xml- rpc.net/download.html.

[22] Oulashin, E. April 20, 2009. C# WAV file class, audio mixing, and some light audio manipulation, Retrieved from http://www.codeproject.com/KB/audio- video/CSharpWAVClassAndMixing.aspx.

[23] Koehn, P. et al., Moses: Open Source Toolkit for Statistical Machine Translation.An- nual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic.

[24] Haddow, B.,Adding Multi-Threaded Decoding to Moses, The Prague Bulletin of Math- ematical Linguistics, NUMBER 93, 2010 pp. 5766.

[25] Kashyap, V. (2008, December). Statistical Machine Translation (SMT) in the field of subtitling of Hindi movies/songs. Paper presented part of the Student-Paper contest at the 6th International Conference on Natural Language Processing, CDAC, Pune, India. Retrieved from http://ltrc.iiit.ac.in/icon2008/Schedule08.html

[26] Ratcliff, J. (1999), Time Code: A User’s Guide, Oxford: Focal Press.

[27] A. Neumaier, ANSI/SMPTE 12M-1986, Cambridge University Press, Cambridge, 1990.

[28] Jaisinghani, B. (2008)English Movies, English Subtitles, Times News Network: Mum- bai, 27th December.

[29] Ramirez, J., et. al (2007) Voice Activity Detection. Fundamentals and Speech Recog- nition System Robustness, ISBN 987-3-90213-08-0, pp.460, I-Tech, Vienna, Austria, June 2007.

[30] Hu, Y. and Loizou, P. (2007). Subjective evaluation and comparison of speech

enhancement algorithms, Speech Communication, 49, 588-601. Retrieved from

http://www.utdallas.edu/ loizou/speech/noizeus/

[31] Wiegand. S and Weinkauf. T(2008). Texniccenter - Software (Version 1.0 Stable RC1). Available from http://www.texniccenter.org/resources/downloads

[32] Koehn, P. et al. (2007). Moses: Open Source Toolkit for Statistical Machine Transla- tion.Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic.

[33] YouTube (2010). The Future Will be Captioned: Improving Accessibility on

YouTube, Retrieved on March 8, 2010 from YouTube Website: http://youtube-

global.blogspot.com/2010/03/future-will-be-captioned-improving.html

[34] Massey University (2007). Virtual Eve: First in human-computer interaction. Auckland, N.Z.: Massey News. http://www.massey.ac.nz/massey/about- us/news/article.cfm?mnarticle=virtual-eve-first-in-human-computer-interaction- (accessed May 12, 2008). Archived at http://www.webcitation.org/5XHfBi3Zz.

[35] ETSI (2002). ETSI ES 202 050 Recommend. Speech Processing, Transmission, and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms

[36] ITU-T Recommendation G.729-Annex B. (1996). A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70

Appendix A

Time-coding using Poliscript

In document AutoTC : Automatic Time Code recognition for the purpose of synchronisation of subtitles in the broadcasting of motion pictures using the SMPTE standard : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in (Page 69-75)