CHAPTER 6: METHODOLOGY OF SEQUENCE ANALYSIS
6.4 Finding the Pattern: Sequence Analysis Technique
The process of Sequence Analysis consists of several steps that transform raw strings of coded data into statistical results, which are then interpreted. The following subsections describe each of these methodological steps, pausing along the way to examine the statistical reasoning behind them.
6.4.1 Finding Transitions: the AdTAT Program.
All the transitions between events in the coded data set need to be tallied. Finding these transitions by hand would be time-consuming and prone to human error; a more convenient approach is to use a concordance program. A concordance program analyses text to find particular letters, words or phrases and the immediate context in which they appear. Concordances are often used in linguistics, for example to compare the different uses of a word or to count word frequencies. A well-known example is the analysis of word frequencies in the literary output of the late Iris Murdoch. This analysis revealed that as Murdoch entered the early stages of Alzheimer’s disease, before she was diagnosed, her vocabulary became more limited than in her earlier works. This suggests that Alzheimer’s may have limited her creative output before she showed any diagnostic symptoms (Garrard, Maloney, Hodges & Patterson, 2005). Concordance programs can also reduce the complexity of working with large volumes of paperwork by converting them into a readily searchable data format (Davidson, 1990).
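As an illustration of what a concordance does, the Python sketch below lists every occurrence of a keyword together with a few tokens of context on either side (a "keyword in context" view). The function name `kwic` and the two-token window are illustrative assumptions; the sketch does not reproduce how AdTAT or any other concordance program is implemented.

```python
def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence of keyword.

    `window` is the number of tokens kept on each side (an illustrative choice).
    """
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]    # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]   # up to `window` tokens after
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


text = "the cat sat on the mat and the dog sat on the cat"
for left, key, right in kwic(text.split(), "sat"):
    print(f"{left:>15} [{key}] {right}")
```

Applied to coded data rather than prose, the right-hand context of each hit is exactly the code that follows it, which is the information needed to tally transitions.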
The Adelaide Text Analysis Tool (AdTAT, 2007) was used to search the coded data for every transition that had occurred. This was done by searching for each of the codes in turn, which produced a list of all the transitions in the data.
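The tally produced by searching for each code in turn can be sketched in Python as follows. It assumes that each incident's codes form one chronologically ordered list and that transitions are not counted across incident boundaries; the function name `tally_transitions` and the example codes are hypothetical.

```python
from collections import Counter


def tally_transitions(sequences):
    """Count every (antecedent, sequitur) pair of adjacent codes.

    Each element of `sequences` is assumed to be one incident's codes in
    chronological order; pairs are not formed across incident boundaries.
    """
    counts = Counter()
    for seq in sequences:
        # Finding each code and noting the code that immediately follows it
        # is equivalent to pairing every code with its successor.
        counts.update(zip(seq, seq[1:]))
    return counts


# Hypothetical coded incidents (codes are illustrative only).
incidents = [
    ["A", "B", "B", "C"],
    ["A", "C", "C"],
]
for (antecedent, sequitur), n in sorted(tally_transitions(incidents).items()):
    print(f"{antecedent} -> {sequitur}: {n}")
```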
6.4.2 Transition Matrix and Standard Normal Residuals.
The results from AdTAT were entered into a transition matrix. The matrix lists every possible transition and shows how often each one occurred, so each cell represents a possible transition and the number of times it actually happened. The transition matrix is chronological in nature; that is, the events that go
down the left-hand side of the matrix (the rows) are antecedents, and the events that go across the top (the columns) are sequiturs, as in Figure 6.1.
Figure 6.1
Example Transition Matrix
Antecedent \ Sequitur    A     B     C     Total
A                        5     7     2     14
B                        4    11     3     18
C                        6     0    67     73
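The step from tallied pairs to a matrix like Figure 6.1 can be sketched as follows. The helper `transition_matrix` is hypothetical; the point it illustrates is that every possible antecedent-sequitur pair receives a cell, so a transition that never happened (C to B in Figure 6.1) appears as 0 rather than being left out.

```python
from collections import Counter


def transition_matrix(pair_counts, codes):
    """Lay out pair counts as rows (antecedents) by columns (sequiturs).

    Every possible pair gets a cell, so transitions that never occurred
    appear as 0; the last entry of each row is the row total.
    """
    rows = []
    for antecedent in codes:
        row = [pair_counts.get((antecedent, sequitur), 0) for sequitur in codes]
        rows.append(row + [sum(row)])
    return rows


# Pair counts matching Figure 6.1; C -> B never occurred, so it has no entry.
figure_6_1 = Counter({
    ("A", "A"): 5, ("A", "B"): 7, ("A", "C"): 2,
    ("B", "A"): 4, ("B", "B"): 11, ("B", "C"): 3,
    ("C", "A"): 6, ("C", "C"): 67,
})
codes = ["A", "B", "C"]
for code, row in zip(codes, transition_matrix(figure_6_1, codes)):
    print(code, row)
```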
A Chi-Square test was run on the whole matrix. A significant Chi-Square indicates that, somewhere in the matrix, transitions occur more often or less often than would be expected by chance. The next step in finding these interesting transitions is to determine which individual cells contributed to the Chi-Square value. To do this, instruct SPSS to display the standard normal residual for each cell when running the Chi-Square calculation. This number represents how much each cell (or transitional pair) has contributed to the overall Chi-Square statistic: the larger its absolute value, the greater the contribution. Cells with a large positive standard normal residual therefore represent transitions that occur more frequently than one would expect by chance, and cells with a large negative value represent transitions that occur less frequently than one would expect.
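A minimal Python sketch of the Chi-Square calculation and cell residuals for the Figure 6.1 matrix follows. SPSS can display both "standardized" residuals, (O - E)/sqrt(E), and "adjusted standardized" residuals; which of these corresponds to the standard normal residuals used here is an assumption, and this sketch uses the standardized form, whose squares sum exactly to the Pearson Chi-Square. The function name is hypothetical.

```python
import math


def standardized_residuals(observed):
    """Return (chi_square, residuals) for a table of observed counts.

    Expected counts assume the antecedent (row) and sequitur (column) are
    independent: E = row_total * column_total / grand_total. Each residual
    is (O - E) / sqrt(E); squaring and summing them gives the Chi-Square.
    Rows or columns with a zero total must be removed first (E would be 0).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    chi_square = sum(r * r for res_row in residuals for r in res_row)
    return chi_square, residuals


# Observed counts from Figure 6.1 (row totals excluded).
observed = [[5, 7, 2], [4, 11, 3], [6, 0, 67]]
chi_square, residuals = standardized_residuals(observed)
print(f"Chi-Square = {chi_square:.2f}")
for code, row in zip("ABC", residuals):
    print(code, [round(r, 2) for r in row])
```

For Figure 6.1 this gives a Chi-Square of about 67.6 on (3 - 1) x (3 - 1) = 4 degrees of freedom. B to B has the largest positive residual (about 4.5), and C to B, which never occurred, has the largest negative residual (about -3.5).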
6.4.3 Criterion Number.
The standard normal residual represents how much the observed frequency of a transition differs from its expected frequency under the null hypothesis that the two events in the transition are independent of one another. A high standardised residual therefore indicates a high degree of dependence between the two events. Because the transition matrix is chronological, when a cell indicates dependence it is the second event (the sequitur, in the columns) that is viewed as dependent on the first (the antecedent, in the rows).
A high positive standard normal residual indicates a dependent transition, and a large negative one indicates a transition in which one event might inhibit the other; but what counts as a high or low value? This is decided by calculating a criterion value (Figure 6.2): the value that a cell’s standard normal residual must equal or exceed for that cell to be judged as making a strong contribution to the significance of the Chi-Square.
Figure 6.2.
Criterion Value Equation.
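[Equation: Criterion Number expressed in terms of the Chi-Square value (ChiSq), its degrees of freedom (df) and the number of cells in the matrix (no. of cells)]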
For the full matrix, this equation gave a criterion number of 1.03. Any pair of events with a standard normal residual of 1.03 or above was therefore worthy of investigation.
6.4.5 Criterion Number: Problem of a Sparse Data Set.
The criterion number depends on the number of cells in the matrix, because the number of cells determines the degrees of freedom. When the matrix is exceptionally large, as this one was, the degrees of freedom are high, and the criterion number calculated from them was correspondingly low. Combined with the fact that the data were strongly sequenced, this meant that many transitions would exceed the criterion, so the more interesting transitions in the data set (those that were most unexpected) might be obscured. To see the transitions that are interesting in terms of the story they may tell about a violent incident, it was felt that the selection criteria needed to be more stringent than this criterion number. The number of cells in the matrix was therefore reduced by collapsing every code that appeared 10 times or fewer into a single new code, ‘XX’. Any code pair that contained one or two of these infrequent codes was removed from the transition matrix and re-entered with ‘XX’ in place of each infrequent code. This reduced the number of codes to 53, and the matrix to 53 × 53 = 2,809 cells. The criterion number was then recalculated as 1.86. Because this analysis relies on standardised normal residuals, cells containing zero are not a problem: empty cells would affect the overall p-value, but that value is not of interest here.
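The recoding step can be sketched in Python as below, under the assumption that code frequencies are counted over all incidents and that recoding happens before transitions are tallied (which has the same effect as removing affected pairs and re-entering them as ‘XX’). The function names are hypothetical, and the degrees of freedom use the standard (rows - 1) x (columns - 1) formula for a Chi-Square test of independence.

```python
from collections import Counter

RARE_LABEL = "XX"


def collapse_rare_codes(sequences, max_count=10):
    """Replace every code occurring max_count times or fewer with 'XX'.

    Returns the recoded sequences and the sorted list of codes that remain.
    Tallying transitions on the recoded sequences puts 'XX' in place of any
    infrequent antecedent or sequitur.
    """
    freq = Counter(code for seq in sequences for code in seq)
    rare = {code for code, n in freq.items() if n <= max_count}
    recoded = [[RARE_LABEL if code in rare else code for code in seq]
               for seq in sequences]
    remaining = sorted({code for seq in recoded for code in seq})
    return recoded, remaining


def matrix_size(n_codes):
    """Cells and Pearson Chi-Square degrees of freedom for an n x n matrix."""
    return n_codes * n_codes, (n_codes - 1) * (n_codes - 1)


print(matrix_size(53))  # the 53 codes of the reduced matrix
```

For the 53 codes that remained, `matrix_size(53)` returns (2809, 2704): 2,809 cells and 2,704 degrees of freedom.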
To make sure the indicators of dependence were strong, transitions were reported only if they met two criteria. First, the standard normal residual had to be 2 or above, this being the criterion value (1.86) rounded to a whole number. Second, the observed frequency had to be 5 or more. The second rule stopped the criterion value from being misleading: in some cases the standard normal residual was high, but it was based on only one observed case. With a larger data set, these pairs of codes with high standard normal residuals but low observed and expected frequencies might have shown more robust results, but to prevent the results from being misleading they were not included in further discussion. A minimum of 5 observed values was chosen because it was large enough to remove isolated events that are likely to be random noise, but small enough that many transitions remained to be analysed, allowing richer sequences of events to be discovered. An
testing. These assumptions recommend that, for results to be valid, expected values of 5 or more are needed in at least 80% of the cells (Bryant & Satorra, 2012).
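The two reporting rules, and the check that at least 80% of cells have expected values of 5 or more, can be sketched in Python as follows. The thresholds come from the text; the function names are hypothetical, and the residuals are assumed to be standardized residuals, (O - E)/sqrt(E).

```python
import math


def residual_table(observed):
    """Standardized residuals (O - E) / sqrt(E) and expected counts E.

    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    residuals = [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
    return residuals, expected


def select_transitions(observed, residuals, codes, min_residual=2.0, min_observed=5):
    """Pairs meeting both reporting rules: residual >= 2 and observed >= 5."""
    return [
        (antecedent, sequitur)
        for i, antecedent in enumerate(codes)
        for j, sequitur in enumerate(codes)
        if residuals[i][j] >= min_residual and observed[i][j] >= min_observed
    ]


def share_expected_at_least(expected, threshold=5.0):
    """Proportion of cells whose expected count is at least `threshold`."""
    cells = [e for row in expected for e in row]
    return sum(e >= threshold for e in cells) / len(cells)


codes = ["A", "B", "C"]
observed = [[5, 7, 2], [4, 11, 3], [6, 0, 67]]  # Figure 6.1 counts
residuals, expected = residual_table(observed)
print(select_transitions(observed, residuals, codes))
print(f"{share_expected_at_least(expected):.0%} of cells have E >= 5")
```

Applied to the Figure 6.1 matrix, four pairs pass both rules (A to A, A to B, B to B and C to C), while only 5 of the 9 cells (about 56%) have an expected count of 5 or more, so that small example would fall short of the 80% guideline.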