
Chapter 5 Design and Implementation

5.1 Research design

5.2.5 Reflections

Not all of the hypotheses were supported by the results, but one reason could be the small sample size. Early indications were that the use of raw MT subtitles does not negatively affect comprehension, attitude or cognitive processing when compared with the presentation of full PE subtitles. However, we also recall here that the main objective of the pilot experiment was to test the methodology. For the main experiment, more than 60 participants would be recruited, and ANOVA (analysis of variance) and t-tests would be used to assess how likely it is that the observed differences occurred by chance.
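As an illustration of the planned analysis, the following is a minimal sketch of how a one-way ANOVA F statistic could be computed for comprehension scores grouped by subtitle condition. The function name `one_way_anova_f` and the scores are hypothetical and serve only to show the calculation; they are not data from the study. In practice a statistics package (for example `scipy.stats.f_oneway`) would normally be used, since it also returns the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per condition.
    Assumes at least two groups and some within-group variation
    (otherwise the denominator would be zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical comprehension scores for three conditions (illustration only).
hypothetical_scores = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]
f_stat, df_b, df_w = one_way_anova_f(hypothetical_scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A larger F value means that the differences between condition means are large relative to the variation within each condition; the corresponding p-value is then read from the F distribution with the two degrees of freedom shown.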

Regarding the post-task questionnaire, Participant 01 commented that he could answer some questions in part one without watching the video. This indicated that the questions needed to be modified, so that participants' answers to the comprehension questions would more likely be based on information they had managed to glean (or not) from the video subtitles, rather than on their prior knowledge. Participant 01's comment is also a reminder that a high comprehension score does not necessarily mean that subtitles are of good quality. The converse is also true: a low comprehension score does not necessarily mean that the subtitles are of poor quality; after watching a MOOC video, viewers might not remember all the content well, and they might not have been focusing on the video the whole time they watched it. Put briefly, memory, concentration, and even the IQ of the participant could all be possible independent variables that had an effect on the results of part one. While the latter problem is difficult to solve, two possible adjustments to the research design were considered to deal with the first problem (questions that can be answered from prior knowledge): either change the MOOC video to a more technical one, or refine the questionnaire with more appropriate questions. The second option was taken; that is, the post-task questionnaire would be altered to make it harder to answer the questions using common sense.

As for part two, the attitude survey, a possible reason for the unexpected results could again be that the quality of the raw MT subtitles and the full PE subtitles was not significantly different, since the HTER was only 19.69%. According to the pre-task questionnaire, all participants generally had a positive attitude towards MT, and the BLEU score for the raw MT subtitles was good. Hence, there is reason to believe that participants' attitudes might not have changed much after watching the video. It is worth emphasizing that the pilot tested the technology and the questionnaire, and that it was run in only two conditions, as this was all the researcher needed for such testing. The results of the pilot were never going to be of interest given the sample size.
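The HTER figure quoted above expresses how much editing separates the raw MT output from its post-edited version, normalised by the length of the post-edited text. The sketch below approximates that idea with a plain edit distance. It is a simplification under stated assumptions, not the procedure used in the study: it counts only insertions, deletions and substitutions (full TER also allows block shifts), it works on characters rather than tokenised words, and the names `edit_distance` and `approx_hter` are hypothetical.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance: insertions, deletions and substitutions only."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a unit from the hypothesis
                curr[j - 1] + 1,     # insert a unit from the reference
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def approx_hter(raw_mt, post_edited):
    """Edits needed to turn raw MT into its post-edited version,
    divided by the length of the post-edited version.

    Assumption: units are characters and shifts are not modelled,
    so this only approximates a real (H)TER score.
    """
    if not post_edited:
        raise ValueError("post-edited text must not be empty")
    return edit_distance(raw_mt, post_edited) / len(post_edited)


# Toy strings for illustration only: one substitution out of four units.
print(approx_hter("abcd", "abed"))
```

On this reading, a low score means that the post-editor changed relatively little of the raw MT output, which is consistent with the argument that the two subtitle versions were close in quality.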

It was decided to include human translated subtitles for the video as an independent variable, to determine whether there are differences between machine translated and human translated content. The human translated subtitles were the ones used as the reference translation for calculating the BLEU score in Section 5.2.2. In the pilot study, it was found that the segmentation of subtitles might be a confounding variable. Hence, the subtitles of the three types were readjusted in order to better follow the Netflix guidelines. After that, the number of subtitle lines was 135, 138 and 141 for raw MT, PE and HT respectively. The numbers differ because, for the same source sentence, MT, PE and HT can produce different outputs, and one can be shorter or longer than another. In order to respect the character limit (18 per line maximum), a subtitle that fits on one line in one version may therefore have to be split across two lines in another. See the example below:

Source:

And this seems to be even in individuals who are achieving physical activity recommendation.

Raw MT (18 characters in total, one line):

这似乎甚至在实现体育活动推荐的个人中

PE (21 characters in total, two lines):

甚至那些实现身体活动的人也好像有这样的情况

HT (22 characters in total, two lines):

并且甚至那些实现身体活动的人似乎也有这个情况
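The character counts in the example above can be checked with a short script. This is only a sketch: the helper name `lines_needed` is hypothetical, and it assumes that the number of lines follows directly from the 18-character limit, whereas real segmentation would also take linguistic boundaries into account when choosing where to break.

```python
import math

MAX_CHARS_PER_LINE = 18  # character limit quoted in the text


def lines_needed(subtitle, limit=MAX_CHARS_PER_LINE):
    """Minimum number of lines needed to respect the per-line limit.

    Assumption: lines are filled greedily; actual line breaks would be
    placed at natural phrase boundaries.
    """
    return math.ceil(len(subtitle) / limit)


subtitles = {
    "Raw MT": "这似乎甚至在实现体育活动推荐的个人中",
    "PE": "甚至那些实现身体活动的人也好像有这样的情况",
    "HT": "并且甚至那些实现身体活动的人似乎也有这个情况",
}

for label, text in subtitles.items():
    print(f"{label}: {len(text)} characters, {lines_needed(text)} line(s)")
```

Run on the three versions, this gives 18, 21 and 22 characters, and one, two and two lines respectively, matching the figures in the example.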

In addition, based on the results of the pilot, the researcher made some revisions (see below) to the first part of the post-task questionnaire, the comprehension test, while the attitude survey remained unchanged. The post-task questionnaire used for the main experiment was thus the revised version (see Appendix E1).

Question 3 in the pilot was removed because it was too easy to answer:

Which of the following would NOT be considered as a sedentary behaviour?

A. Driving in your car

B. Sitting at a desk working

C. Standing on the train to work

D. Reading a book on the sofa

Two new questions were added to increase difficulty:

Which of the following benefit of physical activity is not mentioned in the video?

A. It can help reduce our risk of multiple diseases.

B. It can help us to maintain healthy weight.

C. It can help us to improve the quality of our life.

D. None of the above.

Which of the following statement about physical activity is not right?

A. It is any movement that uses energy.

B. It is not structured.

C. It is different from exercise.

D. It is pursued for fitness benefits.

In document A reception study of machine translated subtitles for MOOCs (Page 157-160)