Think aloud - Methodology and Method - Exploring the experience of liminality in learners of se

Chapter 3: Methodology and Method

3.5.2 Think aloud

I decided to use a think aloud technique to collect data. Land and Meyer (2010, 65) had suggested the use of ‘talk aloud’ methods; Green (1998, 5) distinguishes between the two by saying that think aloud, unlike talk aloud, expects individuals to verbalise information that is not ‘already encoded in verbal form’.

My participants were told to ‘think aloud’ while they tried to solve the problem I had given them; their utterances were recorded using a digital recorder. During the initial part of each interview I presented the participants with the stimuli; we moved to the next when I judged that they had solved the problem they were tackling or, if they were unable to solve it, that they had no further thoughts to utter. During this part I restricted my input to prompting them to ‘please keep talking’ if they had paused for a long time³, asking ‘OK?’ if I thought they had finished, and informing them that I was not going to give prompts if they seemed to be seeking assistance. However, during the latter part of the interviews I sometimes asked them to clarify what they had said or prompted them if they appeared irredeemably stuck. Finally, during the feedback stage the interview moved into more of a dialogue where I explored the ideas they had advanced. With some participants I also explored their feelings about their experience.

³ It is difficult to be precise about how long a ‘long time’ was. It depended on the individual and their typical pattern of pausing. The point is that I used my judgement to decide when to prompt or intervene.

As determined by the purpose of the investigation, this was a concurrent think aloud technique. It is possible to have participants talk about an experience retrospectively but Green (1998, 6) points out that ‘retrieval is of course fallible’ and some information may be filtered or tidied up; Van Someren et al. (1994, 21) suggest it may be subject to ‘post-hoc rationalizing.’

There are potential problems to using what people say as data. We know that what participants say is not the same as what they are thinking (Green 1998, 4).

Participants may misreport their thinking for a variety of reasons including deference and social desirability (Bernard and Ryan 2010, 33 - 37) which may have been a particular problem in the present study given that I was known to the participants as an ex-physics teacher; one of the participants reported that they expected that they would be told the ‘correct’ answer at the end of the interview. These considerations mean that as Dennett (1991, 78) says, ‘we can't be sure that the speech acts we observe express real beliefs about actual experiences; perhaps they express only apparent beliefs about nonexistent experiences.’ However, as Smith (1995, 10) notes, there is likely to be some sort of relationship between what someone says and what they think: ‘What a respondent says in an interview has some significance for him or her’.

One of my concerns about using a think aloud technique was that talking about one’s thinking concurrently with doing the thinking might disrupt or distort the thinking in some way. This has been considered in the literature. Ericsson and Simon (1984, xiii - xiv) report that think aloud tasks do not change the sequence of thoughts and Van Someren et al. (1994, 32 - 33) found little evidence of disruption above the ‘inevitable’ distortion that comes with being studied, though they were aware that verbalization imposed extra pressures on memory. Although the solutions to the problem posed were not in the memory for most participants, they may have found it more difficult to access their memory of other areas of physics which they might have used to develop solutions. This might have increased the likelihood of stuckness.

Data was collected by recording interviews between myself and a participant. The first interviews relied on audio recording. The pattern for the first interviews was to give the participant a challenge in physics and then to ask the participant to think aloud whilst they attempted to solve it. At the start the participant was unprompted, except to be encouraged to vocalise during excessively long pauses. Later during the interview I would intervene, perhaps to check that the participant had finished their thought, perhaps to focus the participant more clearly on the challenge, perhaps to move the participant towards a solution. The aim was to let the participant struggle but not to let them give up. To this end the interviews were, apart from the initial stimuli, unstructured: the timing, the manner and the content of my interventions were highly variable, designed to explore what, at the time of the interview, I felt were interesting aspects of the liminality.

Where necessary, the participants were asked to clarify certain issues after the interviews had been transcribed. For example, f2 used the word ‘weird’ a number of times during one of her interviews and was subsequently asked to clarify what the word meant for her.

At the end of each interview, I explained solutions to the challenges posed to the participant. The participant was then asked whether they would take part in further interviews; all but one agreed. This suggested that, whatever the participants experienced, it was not sufficiently unpleasant to deter further participation. However, the second set of interviews was found to be inconsistent at provoking liminal experiences so the data was not used in the analysis.

3.5.3: The ‘dog on the bed’ challenge

The ‘dog on the bed’ challenge was chosen because it explores the understanding of Newton’s Third Law of Motion. Newton’s Laws are cited as examples of troublesome knowledge in Meyer and Land (2003). They are also regarded as potential drivers of conceptual change by, for example, Driver et al. (1985b, 195). Those studying the teaching of physics (for example, Poutot and Blandin 2015) also regard Newton’s Laws as problematic.

The participants were shown a series of captioned pictures (these appear in Appendix 1). The first picture showed a dog on a bed with a caption: ‘The dog has a weight of 100 N. What forces act on him when he lies on the bed?’. Once they had considered this and responded they were shown a picture of a simple force diagram showing two forces: the weight of the dog (100 N) acting down and the reaction force from the bed of 100 N acting vertically upwards. This usually agreed with their answer; it helped to standardise their thoughts.

They were then shown a second picture showing a (stuffed toy) meerkat lying on the bed with the caption: ‘The dog jumps off the bed and is replaced with a 5 N meerkat. What forces act on the meerkat?’. Following consideration of this stimulus most respondents quickly and correctly responded along the lines of the next picture which showed a force diagram where the downwards weight of the meerkat (5 N) is balanced by the 5 N upward reaction force from the bed.

The respondents were then shown a final stimulus which showed the dog on the bed again with the caption ‘How does the bed adjust its reaction force so that it always balances the weight of the thing lying on it?’ This was the stimulus that was designed to put respondents into a state of liminality because most physics students are well aware that the forces must balance (as prescribed by Newton’s Third Law of Motion) but they are also aware that a bed is not conscious and that therefore it cannot ‘know’ how heavy the thing lying on it is in order to ‘choose’ the appropriate reaction force. There are a variety of correct answers to this apparent paradox. One might be that as the object (dog, meerkat, person etc.) gets on the bed the bed springs are squashed; the heavier the object the more they squash; the more they are squashed, the greater the upward force they exert.

Respondents who, with or without help, achieved some degree of solution were given another problem: how does the floor adjust its reaction force to match the weight of the person standing on it? The lack of obvious springs in the floor meant that most respondents re-entered the liminal experience; they were unable to generalise from the springs to, for example, the bonds between the atoms in the floor (which behave remarkably like springs).
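The spring explanation can be made concrete with a minimal sketch. It assumes the bed (or the floor) behaves as a single ideal spring obeying Hooke’s law, so that the upward force is the stiffness multiplied by the compression. The stiffness value and the function names are illustrative assumptions, not measurements or part of the study; only the two weights (100 N and 5 N) come from the stimuli.

```python
# Hypothetical model: the bed is treated as one ideal (Hookean) spring.
# The stiffness below is an assumed, illustrative value, not a measurement.
SPRING_CONSTANT_N_PER_M = 2000.0


def equilibrium_compression(weight_n, k=SPRING_CONSTANT_N_PER_M):
    """Compression (m) at which the spring force k * x equals the weight."""
    return weight_n / k


def reaction_force(compression_m, k=SPRING_CONSTANT_N_PER_M):
    """Upward force (N) exerted by a spring compressed by compression_m."""
    return k * compression_m


if __name__ == "__main__":
    # Weights taken from the two stimuli: the dog and the meerkat.
    for name, weight in [("dog", 100.0), ("meerkat", 5.0)]:
        x = equilibrium_compression(weight)
        print(f"{name}: compression {x * 1000:.1f} mm, "
              f"reaction {reaction_force(x):.1f} N")
```

Under these assumptions the heavier object simply compresses the spring further before it comes to rest, so the reaction force matches each weight without the bed needing to ‘know’ anything. A much stiffer spring, such as the atomic bonds in a floor, gives the same forces with compressions too small to notice.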

Participants were asked to think aloud as their attempts to explain these challenges were recorded. I intervened where necessary.

3.5.4: Sampling

Like other interpretive methodologies, grounded theory emphasises the importance of context. This may mean that it is difficult to apply the results to other contexts (it has low external validity).

I wanted to investigate those learning physics shortly prior to university. I made this choice because I had taught physics at this level for many years and I concluded that I had observed learners undergoing liminal experiences many times.

I chose to study learners aged between 16 and 18 studying the first year of A-level physics in a sixth form college. The choice of college meant that my participants were from a wide variety of backgrounds (rather than, for example, all attending a fee-paying school). In addition, the learners were all studying with the same teacher.

None of them had encountered this teacher prior to the start of the academic year. I observed a number of lessons prior to the interviews to gather contextual information about the background understanding of the learners. Thus I ensured that all participants had been taught Newton’s Third Law prior to taking part in the interview.

Grounded theory methodology uses a sampling strategy called theoretical sampling; Denscombe (2007, 95) makes the point that this is a type of purposive sampling. Theoretical sampling, in which the emerging theory is used to control the data collection (Glaser 1978, 6; Charmaz 1996, 31), can begin only once at least one category has been developed (Charmaz 2014, 199, 204). New participants, new contexts, new questions, etc. are then selected (Noerager Stern 2007, 116; Charmaz 2014, 192). Bernard and Ryan (2010, 369) characterise theoretical sampling as following suggestions in the evidence as theories develop. This creates a potential bias in that one’s early half-formed theories guide the collection of later evidence. However, Morse (2007, 231 - 234) defends this technique as a way of gaining adequate data about less common categories without being swamped by data in the more common categories, although she acknowledges that this creates an inherent bias.

Having defined the context I called for volunteers. Ten of a potential 24 in the first cohort volunteered, most of whom were used in the first round of interviews (for some there was a difficulty in arranging the interviews; one withdrew from the physics course before his interview was conducted). Theoretical sampling implies that the analysis of the initial data guides subsequent data collection. This was done in the second round of interviews, which were conducted on the parallel classes twelve months later.

Although random sampling techniques might be the most effective way to control bias, Krippendorff (2013, 115) makes the point that probability sampling techniques only make sense when each sampling unit contains an equal amount of information, which was unlikely in this investigation. Given that, in theory if not in practice, grounded theory data collection starts without preconceptions, we cannot sample randomly because we do not have a priori variables on which to randomise. Morse (2007, 231) recommends using convenience sampling with articulate, expert informants, although this in itself raises the question of how we know who the experts are if we have no preconceived ideas.

Hood (2007, 153) suggests that the ‘General Inductive Qualitative Method’ uses purposeful sampling and that ‘data collection stops when additional cases no longer add new information’. Bernard and Ryan (2010, 360 - 361) suggest that this will be after between 10 and 20 informants depending upon their expertise (fewer informants are needed if they are more knowledgeable); I interviewed 19 respondents over two tranches, seven in the first and twelve in the second. The grounded theory method continues collecting data until ‘theoretical saturation’ is reached; this is when you hear ‘nothing new’ (Stern 2007, 117); when ‘no additional data are being found’ (Glaser and Strauss 1999, 1967, 61); when ‘fresh data no longer sparks new theoretical insights’ (Charmaz 2014, 213). Determining this point is a particular concern in the design of this study because, although the context of pre-university physics learning is well-bounded, the concept of liminality is not. This means that it may be difficult to decide when no further information will be forthcoming: although my second tranche of respondents delivered few new coding categories, how could I decide whether a third tranche might be needed?

However, this was always designed to be an exploratory study. This is therefore ‘semi-permeable’ theoretical saturation in which the codes and categories developed are considered robust, but the possibility is left open that there may be codes or categories undiscovered. For example, the strategies that have been listed are well-evidenced but there is no claim that other respondents might not use other strategies.

There are several potential sources of bias due to sampling which I acknowledged as part of my design.

• Because I was studying physics there were significantly more male respondents than females (16 compared to 3). This reflected the proportions within the groups I studied.

• My use of volunteers meant that my respondents were likely to be more self-confident as learners. This agreed with my observations of how my participants behaved during the informal observations of their lessons that I conducted in the weeks immediately prior to the interviews. This might have had a particular effect when I analysed the affective nature of the responses such as the fact that I found no negative motivational issues with this cohort.


3.5.5: Microanalysis

Since I was using the grounded theory method, the data collection and the data analysis overlapped (Urquhart 2013, 8). Although there was initially a short phase in which data was collected alone, after a few interviews the process of transcribing and analysing the data began.

The first phase of data analysis involves breaking up the data into conceptual components (Glaser 1978, 55; Bernard and Ryan 2010, 271; Strauss and Corbin 1998, 102; Charmaz 2014, 113). Starting with a single transcript (Smith 1995, 19), the verbal protocols were microanalysed (Strauss and Corbin 1998, 58), that is, the transcripts were segmented into phrases, clauses or sentences, ‘each segment corresponding to a chunk of behaviour such as a statement or a phrase’ (Green 1998, 19). Most of these segments lasted a few seconds. An example of a microanalysis is shown below (Table 1).

After three transcripts had been microanalysed, their characteristics were examined, at first individually and later in comparison with one another. This examination was written up in ‘primary memos’. These were ‘partial, preliminary, and provisional’ (Charmaz 2014, 181); they were treated as transient so that the memo for one participant’s transcript was revisited and added to where comparison with another microanalysis suggested this was necessary. By comparing the three transcripts I was able to design a provisional template for characteristics to look for; this template was also provisional and I added to it as I studied more microanalyses.


TIME      TRANSCRIPT                                                 COMMENT
2m39.0s   But it’s a lot quicker.
2m40.7s   That’s my answer                                           Laughs
2m42.3s   OK and how does the bed adjust ... from dog to meerkat?    Experimenter intervention
2m48.0s   Pause
2m50.6s   Don’t know
2m50.9s   Pause

Table 1: An example of a microanalysis

The microanalyses were then coded, line by line or word by word (Charmaz 2014, 124) by myself (as recommended by Glaser 1978, 58). The purpose of the coding was to ‘select, separate and sort data’ (Charmaz 2014, 111). Some segments were coded with multiple codes. In grounded theory, the ideal is that the codes should ‘emerge from data’ (Hood 2007, 154). In vivo codes (Kelle 2007, 199) were used where appropriate to reduce bias caused by a priori theories although the initial decision to use only in vivo codes was soon found to be impractical because respondents expressed very similar ideas in different ways and because some codes were needed to describe what a respondent was doing rather than what they were saying (for example ‘explaining’). After coding the first three transcripts, each new coding was compared with the previous codes used. When a new code emerged the already coded transcripts were revisited to see whether the new codes were applicable to them. This constant comparison method (Holton 2007, 278; Taber 2000, 471; Kelle 2007, 194) was also used to modify and merge codes where appropriate. Some codes were differentiated: for example it was recognised that ‘yes’ tended to mean something different from ‘yeah’, that repetitions could be associated either with the beginning or the end of liminality, and that there was evidence for two forms of stuckness.

In microanalysis the transcript was fragmented into chunks and timings were added. The timings were approximate: by retiming a small sample it was established that the timings were almost always repeatable within 0.2 s; this was regarded as a reasonable estimate of the accuracy of the timings given. The third column was used for comments (for example, ‘laughs’, ‘whispers’ etc) and contextual information.
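As an illustration of how timed segments of this kind might be handled, the following minimal sketch converts the timestamps in Table 1 to seconds and estimates how long each segment lasted. It assumes that each timestamp marks the start of a segment and that a segment ends where the next one begins; that reading of the table, and the function names, are assumptions made for illustration and do not describe the procedure actually used in the study.

```python
import re

# Timestamps in Table 1 have the form "2m39.0s" (minutes, then seconds).
TIMESTAMP = re.compile(r"(\d+)m(\d+(?:\.\d+)?)s")

# Times and segment text from Table 1 (comments column omitted).
SEGMENTS = [
    ("2m39.0s", "But it's a lot quicker."),
    ("2m40.7s", "That's my answer"),
    ("2m42.3s", "OK and how does the bed adjust ... from dog to meerkat?"),
    ("2m48.0s", "Pause"),
    ("2m50.6s", "Don't know"),
    ("2m50.9s", "Pause"),
]


def to_seconds(stamp):
    """Convert a timestamp such as '2m39.0s' to seconds from the start."""
    match = TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def durations(segments):
    """Pair each segment (except the last) with its estimated length in seconds.

    Assumption: a segment runs from its own timestamp to the next one.
    The last segment has no following timestamp, so it is left out.
    """
    starts = [to_seconds(stamp) for stamp, _ in segments]
    return [
        (text, round(end - start, 1))
        for (_, text), start, end in zip(segments, starts, starts[1:])
    ]


if __name__ == "__main__":
    for text, seconds in durations(SEGMENTS):
        print(f"{seconds:4.1f} s  {text}")
```

Durations are rounded to 0.1 s, the resolution at which the timings are reported; given the stated repeatability of about 0.2 s, differences of that size between segments should not be over-interpreted.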

3.5.6: Pauses cause the method to be revised

In accordance with standard grounded theory procedure there was no pilot study (although Nunes et al. 2010 have suggested that the use of a pilot study within a grounded theory enquiry offered ‘a more articulated view on the internal structure of wider phenomena’). Instead, data analysis was reviewed after the first three interviews. This led to a significant modification of the method.

At the outset I had resolved to limit my coding to the verbal responses of the participants, even to the extent of using only in vivo codes in the substantive coding. However, it became clear during the transcription of the first interview that paralinguistic features of the utterances were important. Such features as pauses, whispering, repetitions and verbal fillers such as ‘yeah’ provided an alternative set of data which was based on how the participant verbalised thoughts rather than what they said. This offered another way of interpreting the data allowing the possibility of a limited form of triangulation. It also provided a relatively objective way of viewing the data so reducing the effect of experimenter bias.

This had not been expected. The examples of a political speech and an interview given by Urquhart (2013) in her practical guide to grounded theory and an interview coded by Charmaz (2014, 145) show no examples of non-lexical items such as hesitations. This may be an artefact of the transcribing process. On the other hand Charmaz (1996, 33) recommends that data should include context, verbal and non-verbal cues. Pauses, as well as repetitions, changes in the speed of talking and changes in voice are among the features of think aloud listed by Bernard and Ryan (2010, 56 - 63). My transcripts were full of hesitations and pauses, some filled, and repetitions and other lexical and non-lexical items. This was a potential source of evidence that I was unwilling to ignore.

My earliest transcripts suggested that pauses were associated with statements such as ‘I don’t know’ and that cognitive challenges tended to provoke long pauses.

On the other hand, it was recognised that pauses have a variety of causes. For example, some pauses represent time taken to breathe. However, Green (1998, 17) found that ‘individuals often fall silent when encountering task difficulties’. A literature search revealed little data on the length of a typical pause (see below) so I decided that pauses shorter than 0.2s would be represented by punctuation (for example, commas, semi-colons and periods). This was because my timing procedures were
