The NER Model
3.5 Final thoughts: the NER model and automatic subtitling
As mentioned above, the NER model can also be used to measure the quality of automatic subtitles, that is, those produced with ASR (automatic speech recognition). Given the rapid evolution of speaker-independent speech recognition, it makes sense to anticipate the use of this technology by subtitling companies once it has reached optimum levels of accuracy. First of all, this software may be used with the intervention of a human operator, who will correct misrecognitions and errors of punctuation and speaker identification before sending the subtitles on air. Perhaps, in a more distant future, human intervention may even be excluded from the process altogether. Be that as it may, it is important to ensure that automatic subtitles are at least as accurate as those produced by respeaking, in this case reaching 98 per cent with the NER model, including punctuation and character identification errors.
There is, however, one more element that becomes critical when dealing with automatic subtitles: speed. As highlighted in Romero-Fresco (2012), the speed of subtitles has a direct impact on the amount of time viewers can devote to the images. According to eye-tracking data obtained in Poland, the UK and Spain in the DTV4ALL project, and in South Africa by Hefer (2011), a speed of 150 wpm leads to an average distribution of 50 per cent of the time on the subtitles and 50 per cent on the images. A faster speed of 180 wpm yields an average of 60–65 per cent of the time on the subtitles and 40–35 per cent on the images, whereas 200 wpm only allows 20 per cent of the time on the images. As shown by González Lago (2011), the average speech rate of live programmes, such as the Spanish news, is 240–278 wpm, with peaks of 400 wpm and even 600 wpm in certain cases. Speech rates of over 220 wpm are also very common for presenters in the UK (Eugeni 2009). Considering that presenters are unlikely to slow down their speech rates and that automatic subtitles are by definition verbatim, the speed of automatic subtitles is likely to cause viewers to miss most of the images, unless: a) human intervention before launching the subtitles also includes editing, which is very complex and could lead to prolonged delays; b) an antenna delay is implemented so that the editor can have time to correct errors and edit the subtitles;4 or c) the technology used allows settings to be defined in order to achieve an optimum display mode/exposure time by automatically calculating a maximum and minimum duration.
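Option c) can be illustrated with a minimal sketch. It assumes that exposure time is derived from a target reading speed (180 wpm, the maximum speed recommended in the conclusions of this chapter) and then clamped between a minimum and a maximum duration. The function name and the default limits of 1 and 6 seconds are hypothetical values chosen for illustration; they are not settings taken from any particular subtitling software.

```python
def exposure_time(text: str,
                  target_wpm: float = 180.0,
                  min_seconds: float = 1.0,
                  max_seconds: float = 6.0) -> float:
    """Return a display duration in seconds for one subtitle.

    target_wpm, min_seconds and max_seconds are illustrative
    assumptions, not values prescribed by the NER model.
    """
    if target_wpm <= 0:
        raise ValueError("target_wpm must be positive")
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")

    # Count whitespace-separated words in the subtitle.
    words = len(text.split())

    # Time needed to read the subtitle at the target speed.
    ideal = words * 60.0 / target_wpm

    # Clamp to the configured minimum and maximum duration.
    return min(max(ideal, min_seconds), max_seconds)


if __name__ == "__main__":
    subtitle = "The minister will make a statement later today"
    print(f"{exposure_time(subtitle):.2f} s")
```

With these assumed limits, a very short subtitle still stays on screen for the minimum duration, while a long verbatim block is capped at the maximum, which is where editing or a broadcast delay becomes necessary if no content is to be lost.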
3.6 Conclusions
Following a brief description of some of the traditional models used to assess accuracy in speech recognition and respeaking, the NER model has been introduced in this article in an attempt to provide a functional and easy-to-apply model to assess the accuracy of live subtitles, while also providing data on important subtitling factors such as delay, position, speed and character identification. The division between editing and recognition errors not only provides an indication of the accuracy rate of subtitles, but also gives us an idea of what must be improved and how.
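As a reminder of how the accuracy rate is obtained, the following is a minimal sketch of the calculation. It assumes the NER formula, accuracy = (N − E − R) / N × 100, where N is the number of words in the subtitles, E the weighted editing errors and R the weighted recognition errors, and it assumes severity weights of 0.25 (minor), 0.5 (standard) and 1 (serious). The function names and the sample error counts are hypothetical and serve only as an illustration.

```python
from typing import Mapping

# Assumed severity weights for minor, standard and serious errors.
WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}


def weighted_errors(counts: Mapping[str, int]) -> float:
    """Sum error counts multiplied by their assumed severity weight."""
    return sum(WEIGHTS[severity] * n for severity, n in counts.items())


def ner_accuracy(n_words: int,
                 editing: Mapping[str, int],
                 recognition: Mapping[str, int]) -> float:
    """Accuracy rate in per cent: (N - E - R) / N * 100."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    e = weighted_errors(editing)       # E: editing errors
    r = weighted_errors(recognition)   # R: recognition errors
    return (n_words - e - r) / n_words * 100.0


def meets_threshold(accuracy: float, threshold: float = 98.0) -> bool:
    """Check the rate against the 98 per cent level cited in the chapter."""
    return accuracy >= threshold


if __name__ == "__main__":
    # Hypothetical sample: 1,000 subtitle words.
    rate = ner_accuracy(
        1000,
        editing={"minor": 4, "standard": 2},
        recognition={"minor": 6, "standard": 3, "serious": 1},
    )
    print(round(rate, 2), meets_threshold(rate))
```

Keeping E and R as separate inputs mirrors the point made above: the two totals indicate whether improvement should target the respeaker's or editor's decisions (E) or the recognition software (R).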
The model has been adopted by Ofcom to assess the quality of live subtitles on UK television (Ofcom 2013).5 Published in April 2014, the first Ofcom report on the quality of live subtitling measured using the NER model has helped to dispel one of the most common concerns regarding the application of quantitative and qualitative measures used to assess the quality of live subtitles, namely, the existence of discrepancies and subjective evaluations, especially when it comes to analysing the loss of information in the subtitles and the impact this might conceivably have on viewers. The application of the NER model in the UK has proved very consistent according to the internal reviewers from the different broadcasters and subtitling companies (who were only given a few written instructions as to how to apply the model) as well as the external reviewers from the University of Roehampton (London). The average discrepancy with regard to the accuracy rates of the 66 programmes analysed was 0.09 per cent (Ofcom 2014).
The NER model has also been endorsed by a white paper published by Media Access Australia (2014) and has been included in the official Spanish guidelines on subtitling for the deaf and the hard-of-hearing (AENOR 2012). It is also being used by broadcasters, companies and training institutions in, amongst others, Spain, France, Italy, Switzerland, Germany, Belgium and Australia, as well as by the EU-funded projects SAVAS (www.fp7-savas.eu) and HBB4ALL (www.hbb4all.eu). Furthermore, NERstar, a semi-automatic tool, has been developed to ensure a quick and effective application of the NER model to live subtitles produced by respeaking or by ASR.6
Accuracy Rate in Live Subtitling: The NER Model
Ten years after the introduction of respeaking, the collaboration between academia and the subtitling industry seems to be yielding interesting results, not least a set of parameters to ensure a certain degree of quality in respeaking, which in this case could attain a 98 per cent accuracy rate with the NER model, a block display mode for subtitles (Romero-Fresco 2011) and a maximum speed of 180 wpm. Now that the use of (semi)automatic subtitles is becoming a possibility in live subtitling, it is all the more important to maintain these parameters in order to ensure that these new developments are not introduced at the expense of quality, that is to say, at the expense of viewers.
Acknowledgement
This research has been partly funded by the research group Transmedia Catalonia (a research group recognized by the Catalan Government with code number 2014GR027) and by the EU-funded project HBB4ALL, CIP-ICT-PSP.2013.5.1, ref number 621014 HBB4ALL.
Notes
1. In automatic subtitling, a speaker-independent speech recognition engine (commonly known as ASR or automatic speech recognition) transcribes what the speaker is saying without the need for a respeaker to act as an intermediary. This transcription may be shown directly as subtitles on screen with no delay (re-synchronization) and no correction, as was the case in a pilot project conducted by the Portuguese television company RTP, or by means of an operator who edits the transcription live (reviewing possible misrecognitions, errors of punctuation and character identification) before launching the subtitles on air with a slight delay, as is done by the Japanese broadcaster NHK.
2. Most live subtitles in the US are produced by stenography, with very little editing, which allows for a completely automatic comparison between the source text (spoken dialogue) and the target text (respoken subtitles).
3. According to Chafe (1985: 106), idea units are ‘units of intonational and semantic closure’. They can be identified because they are spoken with a single coherent intonation contour, are preceded and followed by some kind of hesitation, are made up of one verb phrase along with whatever noun, prepositional or adverb phrase is appropriate, and usually consist of seven words and take about two seconds to produce.
4. This is carried out in Belgium by the Flemish public broadcaster VRT, which has managed to implement an antenna delay of up to ten minutes for some live programmes, chat shows for example. This enables subtitlers and respeakers to correct, edit and synchronize the subtitles before launching them on air for the viewers, who receive them as though they were produced live.
50 Pablo Romero-Fresco and Juan Martínez Pérez
5. Between 2014 and 2016, the BBC, ITV, Channel 4 and Sky, as well as their accessibility providers (at the time of writing, Red Bee Media and Deluxe), will use the NER model to assess the quality of one sample of their programmes every six months. This assessment will be reviewed by a team of researchers at the University of Roehampton, London.
6. The NERstar tool is available at www.nerstar.com.
References
AENOR. 2012. Subtitulado para personas sordas y personas con discapacidad auditiva. Madrid: AENOR.
Apone, Tom, Marcia Brooks and Trisha O’Connell. 2010. Caption Accuracy Metrics Project. Caption Viewer Survey: Error Ranking of Real-time Captions in Live Television News Programs. Report published by the WGBH National Center for Accessible Media. http://ncam.wgbh.org/invent_build/analog/caption-accuracy-metrics.
Chafe, Wallace. 1985. ‘Linguistic differences produced by differences between speaking and writing’. In David Olson, Nancy Torrance and Angela Hildyard (eds) Literacy, Language, and Learning: The Nature and Consequences of Reading and Writing (pp. 105–22). Cambridge: Cambridge University Press.
Dumouchel, Pierre, Gilles Boulianne and Julie Brousseau. 2011. ‘Measures for quality of closed captioning’. In Adriana Şerban, Anna Matamala and Jean-Marc Lavaur (eds) Audiovisual Translation in Close-up: Practical and Theoretical Approaches (pp. 161–72). Bern: Peter Lang.
Eugeni, Carlo. 2009. ‘Respeaking the BBC News. A strategic analysis of respeaking on the BBC’. The Sign Language Translator and Interpreter 3(1): 29–68.
González Lago, María Dolores. 2011. Accuracy Analysis of Respoken Subtitles Broadcast by RTVE, the Spanish Public Television Channel. MA Dissertation. London: Roehampton University.
Hefer, Esté. 2011. Reading Second Language Subtitles: A Case Study of Afrikaans Viewers Reading in Afrikaans and English. MA Dissertation. Vaal Triangle Campus of the North-West University.
Media Access Australia. 2014. Caption Quality: International Approaches to Standards and Measurements. Sydney: Media Access Australia. www.mediaaccess.org.au/sites/default/files/files/MAA_CaptionQuality-Whitepaper.pdf.
Ofcom. 2013. Measuring the Quality of Live Subtitling: Statement. London: Ofcom. http://stakeholders.ofcom.org.uk/binaries/consultations/subtitling/statement/qos-statement.pdf.
Ofcom. 2014. Measuring Live Subtitling Quality: Results from the First Sampling Exercise. London: Ofcom. http://stakeholders.ofcom.org.uk/binaries/consultations/subtitling/statement/sampling-report.pdf.
Romero-Fresco, Pablo. 2011. Subtitling through Speech Recognition: Respeaking. Manchester: St Jerome.
Romero-Fresco, Pablo. 2012. ‘Quality in live subtitling: the reception of respoken subtitles in the UK’. In Aline Remael, Pilar Orero and Mary Carroll (eds) Audiovisual Translation and Media Accessibility at the Crossroads (pp. 111–31). Amsterdam: Rodopi.
4.1 Introduction
In recent years, the demand for multimedia products has increased substantially, an increase that is being met by prerecorded or live multimedia programmes offered by broadcasters, IPTV or the Internet. At the same time, the number of adults in Europe with problems accessing digital television is expected to rise in the coming years, as has been highlighted by the DTV4All project (Looms 2009).
For this part of the population, subtitles are needed to access the audio content of TV programmes and to ensure the compliance of broadcasters with regulatory standards currently in place worldwide (Romero-Fresco 2011).
Subtitles not only benefit hearing-impaired people, but are also beneficial in noisy environments or places where the audio must be turned off. Non-native speakers with limited knowledge of a local language, or for whom accent or speed is a problem, may also find subtitles helpful.
These are some of the reasons why, at present, many films, TV series and prerecorded programmes are being produced with offline-generated subtitles. Live multimedia also requires subtitles, but real-time production implies technical difficulties and lower quality than in the case of prerecorded subtitles. This has led to a number of research projects and technological innovations, some of which are looking into the synchronization of the audio/video with the subtitles, for example the DTV4All project, the APyCA system (Álvarez et al. 2010) and AudioToText Synchronization (García et al. 2009).
As will be explained, subtitling live events is a complex process, where the required immediacy limits the quality of the results in terms of content, accuracy and delay. The subtitling process consists of a number of