INSTRUMENTATION USED IN THIS STUDY - CHAPTER 03: METHODOLOGY

3. CHAPTER 03: METHODOLOGY

3.5. INSTRUMENTATION USED IN THIS STUDY

The NASA TLX was developed over a number of years by the research team of Hart and Staveland (1988) for the American National Aeronautics and Space Administration. This tool is a multi-dimensional subjective rating scale that requires the participant to indicate the perceived load placed upon them by six aspects of a learning activity, which are described below.

DIMENSIONS OF THE NASA TLX

Mental Demand

How mentally demanding was the learning task?

How much mental and perceptual activity was required (e.g. thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?

Physical Demand

How physically demanding was the learning task?

How much physical activity was required? (e.g. pushing, pulling, turning, controlling, activating, etc.) Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?

Temporal Demand

How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?

Performance

How successful were you in accomplishing what you were supposed to do? How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?

Effort

How hard did you have to work to accomplish your level of performance?

How hard did you have to work mentally and physically to accomplish your level of performance?

Frustration

How insecure, discouraged, irritated, stressed and annoyed versus secure, gratified, content, relaxed and complacent did you feel during the task?

In their original design for the NASA TLX, Hart and Staveland (1988) recommended that each of these subscales should have a wide range of increments, their rationale being that scales with fewer increments demonstrate lower sensitivity to experimental manipulations. The original design used a visual scale with an unmarked continuum that could be partitioned into user-definable intervals at the time of data analysis (0-100 in their initial validation of the scale). Many modern versions of the test have a scale partitioned from 0-20, and some researchers, such as Haapalainen, Kim and Dey (2010), have used scales with fewer increments (1-5). Hart and Staveland explain that discrete partitions allow testing when the participant is required to state levels of task load verbally using whole numbers, and that fewer partitions make the data easier to sort. For this study, it was not necessary for the participants to verbalise levels of load, so a graphic scale was used. The scale featured 20 marked partitions, but it was explained to the participants that a crossline could be placed at any point along the scale. The width of the scale (16 cm) provided enough space to assign a score between 0 and 100, offering high sensitivity.
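The conversion from a crossline position to a score can be sketched as follows. This is only an illustration: it assumes the mark is measured in centimetres from the left end of the 16 cm line and mapped linearly onto 0-100. The function name and the rounding to whole points are assumptions, not details reported for this study.

```python
SCALE_WIDTH_CM = 16.0  # width of the printed rating line described above


def mark_to_score(position_cm: float, width_cm: float = SCALE_WIDTH_CM) -> int:
    """Convert a crossline position (cm from the left end) to a 0-100 score.

    Assumption: a linear mapping, rounded to a whole point.
    """
    if not 0.0 <= position_cm <= width_cm:
        raise ValueError("the mark must lie on the scale")
    return round(position_cm / width_cm * 100)
```

Under this assumed mapping, one point on the 100-point scale corresponds to 1.6 mm on the printed line, and a mark at the midpoint (8 cm) yields a score of 50.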

Each dimension was rated using this 100-point subjective rating scale. The score can be used raw or can be weighted. Weightings adjust the raw score according to the task load dimensions identified as having had the most influence on the activity under investigation. Weights are obtained through a paired-comparison task in which the participant chooses which dimension was more relevant to the learning task from 15 pairs, in which every dimension is paired with every other dimension. A weighted score can then be calculated by multiplying the raw score for each dimension by its weight (the number of times that dimension was chosen) and then dividing by 15 (as there are 15 pairs).
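The paired-comparison weighting described above can be sketched in a few lines. This is a hedged illustration rather than the scoring procedure actually used in this study: the names `tally_weights` and `weighted_scores` are hypothetical, and summing the per-dimension values into one overall score is an assumption based on common NASA TLX practice.

```python
from itertools import combinations

DIMENSIONS = [
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Performance", "Effort", "Frustration",
]

# Every dimension paired once with every other dimension: 6 choose 2 = 15 pairs.
PAIRS = list(combinations(DIMENSIONS, 2))


def tally_weights(choices):
    """Count how often each dimension was chosen across the 15 pairs."""
    if len(choices) != len(PAIRS):
        raise ValueError("expected one choice per pair")
    weights = {d: 0 for d in DIMENSIONS}
    for (a, b), chosen in zip(PAIRS, choices):
        if chosen not in (a, b):
            raise ValueError(f"{chosen!r} is not in pair ({a!r}, {b!r})")
        weights[chosen] += 1
    return weights


def weighted_scores(raw, weights):
    """Return per-dimension values (raw * weight / 15) and their sum.

    Assumption: the sum is treated as the overall weighted score.
    """
    per_dim = {d: raw[d] * weights[d] / len(PAIRS) for d in DIMENSIONS}
    return per_dim, sum(per_dim.values())
```

Because the weights always sum to 15, a dimension that is never chosen has a weight of zero and contributes nothing to the weighted result, whatever its raw rating.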

The weighting technique is useful in corroborating the data obtained from the subjective rating scales. For example, in this study one control group participant indicated a score of 92/100 for the physical demands placed upon them by studying a labelled photograph. This seems an unreasonably high score for a passive activity, given that the mean score from the rest of the group was 16/100. However, when the same participant completed the paired-comparison task, the weighting given to physical demand was zero (i.e. they did not choose physical demand in any of the pair-wise choices and therefore considered it to be the least influential of all the dimensions relating to task load). The weighted result for physical demand gives a more realistic impression of the user experience than the raw data in this instance. This is in agreement with Hart (2006), who states that weighting increases user sensitivity to variables and also increases inter-rater reliability. For these reasons, the weighted scoring method was used in this study.

3.5.2. Rationale for Using NASA TLX in this Study

The rationale for using the NASA TLX in the context of this study is based on the following features:

• Subjective measurement scales have been demonstrated to be sensitive to small differences in cognitive load and are valid, reliable and unobtrusive (Paas et al., 2003). Subjective measures of cognitive load have also been found to correlate highly with objective measures (Paas and van Merrienboer, 1993; Kalyuga, Chandler and Sweller, 1998).

• It is considered to be more reliable than other workload measures, to have the highest factor validity (namely the highest correlation with the factor it was intended to measure) and a high test-retest reliability (Noyes, Garland and Roberts, 2004).

• It is a widely used and well-validated technique in assessing the workload in operators of “human-machine” systems such as mobile technology (Hart, 2006).

• It is now considered a foundation of cognitive load measurement and, as such, is often used as a benchmark against which other methods are judged (Hart, 2006).

• It has been identified as the most commonly used tool for the assessment of cognitive load due to its ease of implementation and low intrusiveness to the activity being assessed (Haapalainen, Kim and Dey, 2010).

• It is safe and non-invasive for the participants.

• It provides numerical data that can be used with statistical tools required for hypothesis testing.

• It measures other task load dimensions in addition to cognitive load.

3.5.3. Pre and Post-Testing

A well-accepted method of measuring cognitive load objectively is to use performance outcome measures (Brünken, Plass and Leutner, 2003). These can include pre- and post-testing of knowledge following a learning activity. For test validity, it is important that any additional causes of extraneous cognitive load are either removed from the learning tasks or are present in equal measure in each of the two learning tasks. If additional causes of extraneous cognitive load are present in unequal measure between the two learning tasks, the results cannot be assumed to be solely due to the variable being measured (i.e. task load imposed by the mode of content delivery). Inferring cognitive load from performance outcome measures relies on the information content of the learning materials being the same for both the control group and the experimental group. If the content is identical for both groups, the intrinsic cognitive load can be assumed to be the same for each group. Any difference in post-test scores may then be assumed to correlate with extraneous cognitive load induced by the mode of delivery.
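The comparison implied by this reasoning can be sketched as a difference in mean knowledge gain between the two groups. This is an illustration only: the function name `mean_gain` is hypothetical, the scores are invented placeholders that are not data from this study, and the study's own statistical analysis is not reproduced here.

```python
from statistics import mean


def mean_gain(pre, post, max_marks=36):
    """Mean post-test minus pre-test gain for one group of participants.

    max_marks defaults to 36, the test maximum reported later in this chapter.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired per participant")
    for score in list(pre) + list(post):
        if not 0 <= score <= max_marks:
            raise ValueError("score outside the test range")
    return mean(after - before for before, after in zip(pre, post))


# Made-up scores for illustration only; these are NOT data from this study.
control_pre, control_post = [1, 0, 2], [10, 12, 11]
experimental_pre, experimental_post = [0, 1, 2], [15, 14, 16]

# With identical content (equal intrinsic load), a positive difference would be
# read as lower extraneous load in the experimental delivery mode.
difference = mean_gain(experimental_pre, experimental_post) - mean_gain(
    control_pre, control_post
)
```

A difference of this kind would still need to be checked with an appropriate hypothesis test before any conclusion about extraneous load is drawn.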

Rationale for Choice of Test Topic

The pre-test topic (structures of the base of skull) was chosen because it would be relevant to the target group, requires rote learning and is sufficiently specialised to make it unlikely that the learners would have previous knowledge of this area. To assess normal levels of pre-existing knowledge about the structures of the base of skull, the test was trialled in a pilot study on learners representative of the target group. The mean score achieved was 1.1 out of a possible 36 marks, confirming that participants would be unlikely to have a high level of previous knowledge about this anatomical region. The reason that a lack of previous knowledge must be ascertained is twofold. Firstly, if the participants already understand the topic, there may be little measurable difference between the pre- and post-test scores. Secondly, the learning activity may cause extraneous cognitive load due to the expertise-reversal effect, as described in section 2.2.2.

3.5.4. Rationale for Using Pre/Post-Testing in This Study

• Before the introduction of the NASA TLX, pre/post-testing was the most common method of investigating cognitive load (Brünken, Plass and Leutner, 2003).

• Pre/post-testing will provide objective data which may triangulate with the findings of the subjective NASA TLX tool.

• It is ideally suited to a study that compares two variants of multimedia instruction of the same material because the intrinsic load induced will be equal in both variants. If one group of learners acquires more knowledge than the other, it is likely to be due to a comparatively lower level of extraneous cognitive load experienced during the task (Brünken, Plass and Leutner, 2003).
