
Chapter 4 – Methodology

4.2 Source Content Experiments

4.2.2 Materials

The corpus selected for the usability, quality and satisfaction experiments consists of Online Help articles from a software company, all covering one specific piece of software, namely a spreadsheet application. However, what exactly is meant by Online Help is open to interpretation: as Castilho and O’Brien (2016) show, labels for content types within the localisation industry are fuzzy at best.

The articles, which are published on the company’s website, describe features of the spreadsheet application and give instructions on how to use those features. The choice of this content is motivated by several factors: i) users have easy online access to the content, which allows for a wide-scale satisfaction survey; ii) the company was willing to provide the content; iii) the content is instructional in nature, which allows for the creation of tasks that users can perform during the eye-tracking experiments.

For the satisfaction experiments performed via the web survey, 140 articles were selected and published online.

For the usability, quality and satisfaction (post-task satisfaction questionnaire) experiments, six articles were chosen and eight tasks were created; in total, the corpus consisted of 540 words. As published online, the articles contain images of the software, such as buttons and icons. However, as the goal of the experiment is to measure the usability of the text, some of this artwork was removed. Care was taken to remove only artwork that complemented the text, not artwork needed to understand it, and three English native speakers were asked to test the texts with the artwork removed. In the end, only three images were left in the text: two in task 3 and one in task 6. The tasks are listed below:

1) Quickly change colors, fonts, and effects in your worksheet

2) Change the font format for hyperlinks

3) Format text in headers or footers

4) Add a comment

5) Apply conditional formatting with color

6) Insert an exploding pie chart

7) Insert a bar of pie chart

8) Hide comments and their indicators

Tasks 6 and 7 were created from the same article; therefore, five articles were used to create six tasks. Tasks 4 and 8 were also created from a single article, which was chosen because human translated (HT) versions of it were available in the target languages (DE, ZH and JP). As mentioned in Section 1.1.1, these HT versions were incorporated as two control tasks.

Before starting the tasks, participants read a short text about office suites, taken from Wikipedia, as a warm-up exercise. The text, which was displayed in English and contained 160 words, was also used to record a reading baseline, that is, each participant’s fixation count and fixation duration while reading.
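To make the notion of a reading baseline concrete, the following Python sketch summarises a list of fixations into a fixation count, a total fixation duration and a mean fixation duration. The `Fixation` record and the `reading_baseline` function are hypothetical illustrations; they do not reproduce the export format or the analysis software used in this study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fixation:
    """Hypothetical fixation record exported from the eye tracker."""
    start_ms: float     # onset relative to stimulus start, in milliseconds
    duration_ms: float  # fixation duration, in milliseconds


def reading_baseline(fixations: List[Fixation]) -> Dict[str, float]:
    """Summarise the fixations recorded while a participant read the warm-up text."""
    count = len(fixations)
    total = sum(f.duration_ms for f in fixations)
    return {
        "fixation_count": count,
        "total_fixation_duration_ms": total,
        # Guard against an empty recording (e.g. complete tracking loss)
        "mean_fixation_duration_ms": total / count if count else 0.0,
    }


# Usage with made-up values for one participant
baseline = reading_baseline([Fixation(0, 220), Fixation(250, 180), Fixation(460, 260)])
print(baseline)
```

A per-participant baseline of this kind can then serve as a reference point when interpreting the same measures recorded during the tasks.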

4.2.2.2 Tools

This section describes the tools used in this research project, starting with those used for the English-language usability, quality and satisfaction experiments. As some of these tools are used for all the languages evaluated in this research (EN, DE, ZH and JP), each tool is described in detail where it first appears and is referred back to in subsequent sections.

4.2.2.2.1 Spreadsheet Software

In collaboration with the industry partner, a spreadsheet application from the office suite was selected as the software for the usability experiment. The application includes calculation and graphing tools and is widely used for data manipulation. Because the office suite has more than 1.2 billion users, it was necessary to choose an application with which participants would be familiar but in which they would not be experts. For the same reason, the newest version of the software (2013) was used, on the assumption that fewer people would have used that version.

4.2.2.2.2 Eye Tracker Device

The device used in this experiment is a Tobii T60XL, a wide-screen eye tracker with a 24-inch monitor and a 60 Hz sampling rate. Its high screen resolution allows for studies of detailed stimuli, which is essential to this experiment, since participants need a clear view of all the spreadsheet features. The fixation filter used is the ClearView Fixation Filter, set to 100 milliseconds for the fixation duration and 30 pixels/sample for the fixation radius. As the experiment contains both text and pictures (of the user interface, UI), the setup for mixed-content stimuli was chosen (see Figure 4:2 for the screen layout).
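The role of the two filter settings can be illustrated with a simplified dispersion-based fixation filter. The sketch below is not Tobii’s ClearView algorithm, whose internals are not described here; it only assumes gaze samples arriving at 60 Hz as (x, y) pixel positions and shows how a 100 ms minimum duration and a 30-pixel radius interact. The function name `detect_fixations` and the running-centroid grouping rule are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (x, y) gaze position in screen pixels


def _centroid(group: List[Sample]) -> Tuple[float, float]:
    n = len(group)
    return (sum(p[0] for p in group) / n, sum(p[1] for p in group) / n)


def detect_fixations(samples: List[Sample], rate_hz: float = 60.0,
                     min_ms: float = 100.0,
                     radius_px: float = 30.0) -> List[Dict[str, float]]:
    """Simplified dispersion-based fixation detection (NOT Tobii's ClearView).

    Consecutive samples are grouped while each new sample lies within
    radius_px of the group's running centroid; groups lasting at least
    min_ms are reported as fixations, shorter groups are discarded.
    """
    fixations: List[Dict[str, float]] = []

    def close_group(group: List[Sample], start: int) -> None:
        duration = len(group) * 1000.0 / rate_hz
        if duration >= min_ms:
            cx, cy = _centroid(group)
            fixations.append({"start_ms": start * 1000.0 / rate_hz,
                              "duration_ms": duration, "x": cx, "y": cy})

    group: List[Sample] = []
    start = 0
    for i, (x, y) in enumerate(samples):
        if group:
            cx, cy = _centroid(group)
            if math.hypot(x - cx, y - cy) > radius_px:
                close_group(group, start)  # gaze moved away: end current group
                group, start = [], i
        group.append((x, y))
    if group:
        close_group(group, start)
    return fixations


# Made-up gaze trace: a stable fixation, a brief glance, another fixation
trace = [(100.0, 100.0)] * 8 + [(400.0, 400.0)] * 3 + [(600.0, 100.0)] * 7
for fixation in detect_fixations(trace):
    print(fixation)
```

At 60 Hz, a group must span at least six consecutive samples (6 × 1000/60 = 100 ms) to be kept, so the three-sample glance in the example trace is discarded.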

4.2.2.2.3 Source Content Profiler

Source Content Profiler (SCP) is a tool developed by the CNGL/ADAPT research group at Dublin City University. It classifies documents into various profiles using a language model trained on the British National Corpus (BNC) and a domain classifier. When a text is uploaded, the tool displays an overall score (SCP score) that measures the quality of the input document on a scale from 0 to 100, where a higher score indicates higher quality, and identifies the number of issues in the content. It then breaks these issues down into shallow features, such as:

• Word and sentence length and number

• Syntactic structure, including grammar issues, number of sentences with unusual POS sequences, and passive voice issues

• Spelling issues

• Terminology used

• Domain detection

The objective of using the SCP was to better understand the features of the selected content, as well as its level of difficulty.
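Shallow features of the kind listed above can be approximated with a few regular expressions. The following sketch is a hypothetical illustration, not the SCP implementation: it counts sentences and words, computes mean sentence and word length, and flags possible passive constructions with a crude heuristic (a form of "be", optionally followed by an "-ly" adverb, then a word ending in "-ed" or "-en"), which will miss irregular participles and produce some false positives.

```python
import re
from typing import Dict

# Naive sentence boundary: ., ! or ? followed by whitespace
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
# Words: alphanumeric runs, allowing internal apostrophes or hyphens
WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")
# Crude passive heuristic (assumption): form of "be", optional -ly adverb,
# then a word ending in -ed or -en
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def shallow_features(text: str) -> Dict[str, float]:
    """Hypothetical approximation of a few shallow source-text features."""
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    words = WORD.findall(text)
    n_sent, n_words = len(sentences), len(words)
    return {
        "sentences": n_sent,
        "words": n_words,
        "mean_sentence_length": n_words / n_sent if n_sent else 0.0,
        "mean_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "possible_passives": len(PASSIVE.findall(text)),
    }


print(shallow_features("Select the cells. The rule is applied to the range."))
```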

4.2.2.2.4 Coh-Metrix

Coh-Metrix is a computational tool that measures cohesion and coherence in written and spoken texts (Graesser et al. 2004). It analyses texts on over 200 measures of language and readability, and over 50 types of cohesion relations, using lexicons, part-of-speech classifiers, syntactic parsers, templates, corpora, latent semantic analysis and other components. Coh-Metrix has been used to identify differences between spoken discourse and written text and between writing styles (McCarthy et al. 2006), as well as to predict the reading difficulty of texts for second language learners (Crossley, Greenfield and McNamara 2008). These applications show that Coh-Metrix is a versatile text analysis tool, capable of assessing different content types. The main objective of using it here is to identify the level of comprehension difficulty of the corpus used in the experiments and, consequently, whether problems with the source content (if any) may influence the acceptability of the translated content.
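As an illustration of the kind of cohesion index Coh-Metrix reports, the sketch below computes the proportion of adjacent sentence pairs that share at least one content word. It is only loosely modelled on Coh-Metrix’s adjacent-sentence overlap measures: the small stop-word list and the function names are assumptions, whereas Coh-Metrix relies on richer linguistic processing (such as part-of-speech classification) to decide what counts as a content word.

```python
import re
from typing import List, Set

# Small illustrative stop-word list (an assumption, not Coh-Metrix's resources)
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "at", "for",
    "with", "by", "is", "are", "was", "were", "be", "it", "its", "this",
    "that", "these", "those", "you", "your", "then", "can", "will",
}


def content_words(sentence: str) -> Set[str]:
    """Lower-cased alphabetic tokens that are not in the stop-word list."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS}


def adjacent_overlap(sentences: List[str]) -> float:
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    if len(sentences) < 2:
        return 0.0
    pairs = list(zip(sentences, sentences[1:]))
    linked = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return linked / len(pairs)


steps = ["Select the chart.", "Right-click the chart.", "Choose a colour."]
print(adjacent_overlap(steps))
```

Higher values indicate that consecutive sentences repeat more of their content vocabulary, which is generally associated with a text that is easier to follow.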

4.2.2.2.5 Web Survey

A web survey displayed on the industry partner’s website for 140 articles (EN, DE, ZH and JP) gathered information on ‘how useful’ the content is for the end user. The online survey consisted of only one multiple-choice question: “Was this information helpful?” (YES/NO). Unfortunately, the survey question could not be

website. An important point to mention here concerns the implications of collaborating with companies for academic research. The lack of control over certain parts of the research, such as the availability of content types, the phrasing of web survey questions and other legal matters, may be a drawback. However, the benefits of the collaboration outweigh those drawbacks: a large amount of content once a content type is agreed on, professional translation and post-editing, and end-user ratings for well-known software, all of which bring the research closer to the real-world problem. While a more detailed survey would, of course, be desirable, evaluating the 140 articles with this metric provides an initial indication of satisfaction levels, which is complemented by the eye-tracking experiments.
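Because the survey yields only YES/NO answers, the satisfaction indicator reduces to the proportion of YES votes. The sketch below aggregates hypothetical vote records by language or by article; the `Vote` record layout and the `yes_rate` function are assumptions for illustration, not the format in which the industry partner supplied the survey data.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple


class Vote(NamedTuple):
    """Hypothetical record of one answer to "Was this information helpful?"."""
    article_id: str
    language: str  # EN, DE, ZH or JP
    answer: str    # "YES" or "NO"


def yes_rate(votes: Iterable[Vote], key: Callable[[Vote], str]) -> Dict[str, float]:
    """Share of YES answers per group, where key() selects the grouping field."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # group -> [yes, total]
    for vote in votes:
        if vote.answer not in ("YES", "NO"):
            raise ValueError(f"unexpected answer: {vote.answer!r}")
        tally = counts[key(vote)]
        tally[1] += 1
        if vote.answer == "YES":
            tally[0] += 1
    return {group: yes / total for group, (yes, total) in counts.items()}


# Made-up votes for illustration only
votes = [Vote("a1", "DE", "YES"), Vote("a1", "DE", "NO"), Vote("a2", "JP", "YES")]
print(yes_rate(votes, key=lambda v: v.language))
print(yes_rate(votes, key=lambda v: v.article_id))
```

Passing the grouping field as a function keeps one aggregation routine for both the per-language and the per-article views of the same votes.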

4.2.2.2.6 Post-task Satisfaction Questionnaire

A post-task questionnaire in English was presented to participants after they had completed the usability experiment. It consists of nine questions rated on a Likert scale ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). For all statements except 5 and 8, a higher score indicates higher satisfaction; for statements 5 and 8, the opposite is true.

1. The instructions were usable.

2. The instructions were comprehensible.

3. The instructions allowed me to complete all of the necessary tasks.

4. I was satisfied with the instructions provided.

5. The instructions could be improved upon.

6. I would consult these instructions again in the future.

7. I would be able to use the software again in the future without re-reading the instructions.

8. I would rather have seen the source (English) version of the instructions.

English native speakers did not see question 8 since they were already using the original version of the instructions.
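The reverse-scoring rule for statements 5 and 8 can be expressed as replacing a rating r with 6 - r on the 1 to 5 scale. The sketch below applies that rule and averages the resulting scores for one participant. Averaging into a single satisfaction score, and the `satisfaction_score` function itself, are assumptions made for illustration; statement 8 is simply absent from the responses of English native speakers.

```python
from typing import Dict

SCALE_MIN, SCALE_MAX = 1, 5
REVERSED = {5, 8}  # statements where agreement signals lower satisfaction


def satisfaction_score(responses: Dict[int, int]) -> float:
    """Mean satisfaction for one participant after reverse-scoring 5 and 8.

    responses maps statement number -> Likert rating (1 = Strongly Disagree,
    5 = Strongly Agree). Statement 8 is absent for English native speakers.
    """
    if not responses:
        raise ValueError("no responses")
    scores = []
    for statement, rating in responses.items():
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"rating out of range for statement {statement}: {rating}")
        # Reverse-score: 1 <-> 5, 2 <-> 4, 3 stays 3
        scores.append(SCALE_MIN + SCALE_MAX - rating if statement in REVERSED else rating)
    return sum(scores) / len(scores)


# Made-up responses: an English native speaker, and a participant who also saw statement 8
english = {1: 4, 2: 5, 3: 4, 4: 4, 5: 2, 6: 4, 7: 3}
translated = {**english, 8: 1}
print(satisfaction_score(english), satisfaction_score(translated))
```

After reverse-scoring, a higher value always means higher satisfaction, so scores from participants who did and did not see statement 8 remain on the same 1 to 5 scale.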

In document Measuring acceptability of machine translated enterprise content (Page 94-99)