
In document Advanced mathematics and deductive reasoning skills: testing the Theory of Formal Discipline (Page 108-112)

5.2 Pilot Studies

5.2.3 Pilot 3: Duration and difficulty of measures

The average completion time was unknown for several of the measures selected for the longitudinal study reported below. To plan appropriate time slots with schools, a pilot study was conducted to determine the total session duration required. Five undergraduate mathematics students were recruited to complete the entire set of tasks: demographics, the RAPM subset, the Conditional Inference Task, the Belief Bias Syllogisms task, the CRT, the Need for Cognition scale, and a mathematics task. The aim was to record the duration of each individual measure and the overall duration of the test for each participant, so that an average duration could be derived. A further aim was to assess whether the measures were of an appropriate difficulty and whether the instructions for each measure were clear to participants unfamiliar with the tasks.


Participants. The participants were five undergraduate mathematics students (one male, four females), aged 19 to 51 (M = 25.80, SD = 14.10), who took part in return for £15 each. They were recruited through an email advertisement sent to the students on a differential equations module for first and second year undergraduates. This sample was assumed to be of somewhat higher general ability than the AS level students of the longitudinal study, although not greatly so. It is presumably the most able AS level students who go on to degree level study, so the undergraduate sample here may resemble the more capable students in the AS level sample. The implication is that the scores found here may be slightly higher than in the AS level sample, so any floor effects found should be of particular concern.

Procedure. Participants were informed that they would be given a test book to complete. They were told that the aim of the study was to determine the length of the test, which would be used in a large scale study in the future. They were then asked to sign a consent form before taking part. Two participants took part at the same time but worked independently, and three took part individually. All testing took place in a quiet seminar room.

Participants were asked to work through the booklet at their own pace, informing the experimenter when they reached the end of a section and began the next section. The sections were presented in a set order for all participants. After they had completed the booklet, participants were asked whether any part of the test was unclear, too easy or too difficult, and whether they had any other comments. They were then thanked, paid and dismissed.


The first section of the results will deal with the length of the measures, the second section will deal with the range of scores obtained, and the final section will discuss the participants’ comments on the clarity of the tasks.

Duration. Table 5.1 shows the descriptive statistics for the time taken on each measure. The mean total duration was 45.40 minutes, with a standard deviation of 11.63 minutes. However, the total duration data are positively skewed (skewness = 2.13): four of the data points lie in the range 39-43 minutes and one is 66 minutes. Consequently, four of the five participants completed the test in less than the mean time of 45.4 minutes.
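A short Python sketch can check how a positive skew leads to "four of five below the mean". It assumes that the reported 2.13 is the adjusted Fisher-Pearson coefficient, the form printed by packages such as SPSS. The thesis does not say which definition was used. The durations in the example are hypothetical values with the same shape as the pilot data (four similar times and one long one). They are not the pilot participants' times.

```python
import math
import statistics


def adjusted_skewness(values: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (assumed definition; needs n >= 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # biased 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # biased 3rd central moment
    if m2 == 0:
        return 0.0  # all values identical: no spread, treat skewness as 0
    g1 = m3 / m2 ** 1.5
    # Small-sample correction applied by common statistics packages
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def summarise_durations(minutes: list[float]) -> dict[str, float]:
    """Mean, sample SD, median, skewness and count of times below the mean."""
    mean = statistics.fmean(minutes)
    return {
        "mean": mean,
        "sd": statistics.stdev(minutes),  # sample SD (n - 1 denominator)
        "median": statistics.median(minutes),
        "skewness": adjusted_skewness(minutes),
        "n_below_mean": sum(1 for x in minutes if x < mean),
    }


# Hypothetical total durations in minutes: four similar times, one long one.
example = [40, 41, 42, 42, 66]
summary = summarise_durations(example)
print(summary)
```

One long time pulls the mean above the median. This is why most participants finished in less than the mean time.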

The RAPM section of the test has a time limit of 15 minutes. However, Table 5.1 shows that some participants finished more quickly than this (minimum 11 minutes, mean 14.0 minutes).

Measure                       Mean     SD   Min   Max
Demographics                   2.2   1.64     1     5
Raven's Matrices              14.0   1.73    11    15
Conditional Inference         10.6   3.13     8    16
Belief Bias Syllogisms         3.8   1.3      3     6
Cognitive Reflection Test      1.2   0.45     1     2
Need For Cognition             3.2   1.79     2     6
Mathematics                   10.4   3.64     7    16
Total                         45.4  11.63    39    66

Table 5.1: Duration information for each measure used in the test book (units are in minutes).

Scores. Mean scores were examined to indicate whether any of the tasks suffered from floor or ceiling effects. Table 5.2 shows the descriptive statistics for the scores obtained on each measure, as well as the theoretical minimum and maximum scores. The CRT scores were not examined because the participants had recently been exposed to the task as part of another study. As the table shows, none of the mean task scores were at, or approaching, the theoretical minimum or maximum. The possible exception is the Belief Bias Syllogisms task, whose mean approached the theoretical maximum.

Additional comments. All five participants commented that the instructions for the Conditional Inference Task were not completely clear on first reading, although the task became clear once they began to complete it (discussed below).

Discussion and implications

The results of the pilot test show three things. First, the average duration of the entire test is 45 minutes, with most participants finishing in less time. Second, the measures are at ceiling for some individuals but not on average. Third, participants noted one issue: the instructions for the Conditional Inference Task.

The length of time taken means that the whole test can be completed within a single school lesson, since lessons are usually 50-60 minutes long. Requiring only one session will reduce the demand on teachers' and participants' time. On the other hand, the participant who took 66 minutes shows that some individuals may need longer than one lesson. Those participants will either need to leave the test incomplete, stay after the lesson to finish, or continue at another time. This can be decided with the input of the teachers concerned.

Measure             Mean (SD)      Possible Min   Observed Min   Observed Max   Possible Max
Raven's Matrices    11.40 (3.78)   0              8              16             18
Conditionals        22.40 (7.23)   0              15             32             32
Syllogisms          10.60 (1.67)   0              8              12             12
NFC                  5.47 (1.09)   0              4.56           7.33           9
Mathematics         10.20 (2.49)   0              7              12             15

Table 5.2: Descriptive statistics for each measure used in the test book (standard deviations in parentheses).

With reference to the second part of the results, the scores obtained were only problematic for the Belief Bias Syllogisms task, where there is a slight ceiling effect. It is worth re-emphasising that the participants in this pilot are educationally more advanced than the participants of the main study reported below. The main study will use participants from the beginning to the end of their AS year of study, whereas the participants here were at the end of their first or second year of undergraduate mathematics degrees, so it can be expected that they would perform higher on achievement tests than the majority of the main study participants will. Therefore, all of the measures piloted here are expected to provide enough variation to detect improvements over the course of an AS level.
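The floor and ceiling judgement can be stated as an explicit rule. The Python sketch below flags a measure when its mean lies within k standard deviations of a scale bound. The one-SD default (k = 1) is an illustrative assumption, not the criterion used in this thesis. The means, SDs and possible ranges are the values reported in Table 5.2.

```python
from dataclasses import dataclass


@dataclass
class MeasureScores:
    name: str
    mean: float
    sd: float
    possible_min: float
    possible_max: float


# Means, SDs and possible score ranges as reported in Table 5.2.
TABLE_5_2 = [
    MeasureScores("Raven's Matrices", 11.40, 3.78, 0, 18),
    MeasureScores("Conditionals", 22.40, 7.23, 0, 32),
    MeasureScores("Syllogisms", 10.60, 1.67, 0, 12),
    MeasureScores("NFC", 5.47, 1.09, 0, 9),
    MeasureScores("Mathematics", 10.20, 2.49, 0, 15),
]


def range_flags(m: MeasureScores, k: float = 1.0) -> list[str]:
    """Flag a measure whose mean lies within k SDs of a scale bound.

    The k-SD rule is a hypothetical heuristic for illustration only.
    """
    flags = []
    if m.mean + k * m.sd >= m.possible_max:
        flags.append("possible ceiling")
    if m.mean - k * m.sd <= m.possible_min:
        flags.append("possible floor")
    return flags


for measure in TABLE_5_2:
    print(f"{measure.name}: {range_flags(measure) or 'ok'}")
```

With these values only the syllogisms task is flagged, because 10.60 + 1.67 exceeds the maximum of 12. This agrees with the slight ceiling effect noted above. A larger k flags measures more readily.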

All participants noted that the Conditional Inference Task instructions were not completely clear. However, none of them could suggest how the instructions might be clarified, even after they had completed the test and reported that they understood it. This may reflect the unavoidably complicated nature of the task, which is not something usually encountered in day-to-day life. The instructions used were adapted from Evans, Clibbens & Rood (1995), who did not report any similar issues in their large scale use of the measure. In the interest of consistency with published research, the main study will use instructions identical to those of Evans, Clibbens & Rood. Even if participants find the instructions complicated in isolation, the task is expected to become clear once they start. An experimenter will always be present when participants complete these tasks, so there will be the opportunity to ask for clarification if necessary.

In sum, the pilot study described here has not raised any problems that require the measures selected to be altered or substituted.



Three pilot studies have been presented that assessed various aspects of the measures selected for the main study, and each has provided positive results. Next, the main study itself is presented.
