Discussion - Large Scale Online Readability Assessment

This section briefly discusses the results, the user-test itself, and the implications of its results. The discussion is organized around a number of recommendations for future work that follow from the user-test and its results.

1. More participants

The outcome of the test answered the central research question positively, showing a significant correlation between the results of the offline and the online Cloze-tests. However, a few reservations must be attached to this outcome. One of the more obvious is the relatively small scope and number of participants in this user-test. User-testing with large groups of children is a complex task, both from an organizational perspective and in the actual execution of the user-test itself. This is something that could have been done better.

2. Re-test on a more diverse group of participants

Another factor that may have influenced the results of this study is that the children from this particular primary school were very similar socio-economically. The vast majority of them were Caucasian, and their parents are presumably highly educated and financially well off, judging by the location of the school (central Amsterdam) and comments from colleagues. The group might therefore not be representative of the 'average' Dutch classroom.

3. Involve experts in the process

In any similar future user-test we would communicate more with the teacher beforehand about the nature of the user-test and our goals and requirements. Doing so for the second round of user-testing prevented many of the problems that arose in the initial user-test group. We would also start contacting teachers and schools much earlier to ask them to perform user-tests with one or more of their classes. This could also alleviate scheduling issues and delays, e.g. not being able to run the test for a long period because of school vacations and/or exams.

4. Test other variants

The result of this user-test supports the theory that the online Cloze-test correlates with the offline Cloze-test, in that both measure similar outcomes. This would imply that, since the (original) paper Cloze-test is a verified method of readability assessment, the same now holds for the online Cloze-test. However, some questions remain unanswered. Is this only true for this specific variant of the Cloze-test, or do we need to test for correlation for every single variant? And can we make changes to the interaction scheme or visual presentation without compromising the legitimacy of the readability assessment?
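Checking a further variant would come down to repeating the correlation analysis on paired scores. The sketch below is illustrative only. It assumes one offline and one online score per participant and uses Pearson's r, which is an assumption, since this section does not name the coefficient that was used. The function name `pearson_r` and all scores are hypothetical and are not data from the user-test.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores (fraction of gaps filled correctly), one pair per child.
offline_scores = [0.55, 0.70, 0.40, 0.85, 0.60]
online_scores = [0.50, 0.75, 0.45, 0.80, 0.65]

r = pearson_r(offline_scores, online_scores)
print(f"r = {r:.2f}")
```

A coefficient alone does not establish significance. With a small group, a significance test on r would still be needed before drawing conclusions about a new variant.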

Stage B: Cloze Automatization

In the previous chapter (Stage A: Digital Conversion Assessment) we showed that there is a significant correlation between the original offline Cloze-test and the online version. This finding provides a scientific basis for using the Cloze-test in an online environment to determine the readability of a text from the users' performance on the Cloze-test. However, when we consider doing this in a large-scale (online) environment, certain issues become apparent. Converting regular texts into Cloze-tests is a time-consuming process. Going large scale would require an undetermined number of texts to be converted for use in a Cloze-test. This cannot be done manually, so the process must be automated.

However, this automatization process creates additional problems that have to be solved in order to arrive at a fully automated Cloze-text extractor. The reason is that this research concerns a Cloze-test which does not use a fixed-rate word interval scheme (e.g. removing every fifth word from the text), but a test designed according to the principle of rational deletion; see Sections 2.1 and 3.3.1. Using a rational approach allows the Cloze-test to measure understanding of the text instead of grammatical knowledge. The research in this chapter on the automatization of Cloze-tests uses the same guidelines¹ from Kraf, Lentz & Pander Maat [13] as were used in Chapter 3.
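For contrast, the fixed-rate scheme mentioned above is trivial to automate. The following minimal sketch shows that scheme only, not the rational-deletion approach pursued in this chapter. The function name `fixed_rate_cloze`, the gap marker and the sample sentence are hypothetical choices made for illustration.

```python
import re


def fixed_rate_cloze(text, n=5, gap="_____"):
    """Replace every n-th word of `text` with a gap; return (cloze_text, answers)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    answers = []
    for i in range(n - 1, len(tokens), n):
        # Keep leading/trailing punctuation in place and blank only the word itself.
        lead, word, trail = re.match(r"^(\W*)(.*?)(\W*)$", tokens[i]).groups()
        if word:
            answers.append(word)
            tokens[i] = f"{lead}{gap}{trail}"
    return " ".join(tokens), answers


# Hypothetical sample sentence, not taken from the test material.
sample = ("The quick brown fox jumps over the lazy dog "
          "near the quiet river bank today.")
cloze_text, answers = fixed_rate_cloze(sample, n=5)
print(cloze_text)
print(answers)
```

A positional rule like this needs no understanding of the text. Rational deletion, by comparison, has to decide which words are worth removing, which is what makes the automation in this chapter difficult.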

The work in this chapter examines how difficult it is to design and develop a working automated system that adheres to the aforementioned guidelines as well as a human can. The difficulty stems from the nature of the task, which comes down to understanding text at both a general and a sentence level and omitting words that are 'important' and suitable for extraction. Given these challenges, this stage of the research aims to construct a system that applies the guidelines as well as possible and approximates the performance of a human constructing a Cloze-test from the same source material.

This chapter details several methods and principles that all contribute to tackling this issue. Various automatization methods and their performance are examined, analysed and compared. The chapter concludes with a small Turing-test-style user-test measuring the differences between manually and automatically created Cloze-tests.

¹ For the complete list of guidelines, see page 22.
