3.4 Data Transformation
3.4.1 From Interaction Records to Interaction Data
A number of different unit acts have been put forward in the literature on group communication (Poole et al., 2000: 144), but they can be broadly mapped onto two categories: a unit act is either a ‘natural’ or an ‘artificial’ unit. A ‘natural’ unit is a unit “…whose bounds are set in the phenomenon itself, such as the speakers turn, or a quarter’s performance in a firm, or a meeting” (Poole et al., 2000: 144), and may range from the speaking turn (i.e. one unit starts when a participant starts speaking and ends when another participant starts speaking) to the thought unit (i.e. units are delineated by meaning).
An ‘artificial’ unit is a unit set by the researcher and is useful when real time is the key metric of the analysis, since it is easier to delineate. Examples of ‘artificial’ units range from a period of a few seconds of discussion to the whole theme of a discussion (Folger et al., 1984).
In this research both types of unit were used. ‘Natural’ units, in the form of ‘thought units’ (Sillars, 1986: 6-7), acted as the classificatory units for all coding schemes except one, the GWRCS, which used a 30-second artificial unit (as illustrated in Poole & Roth, 1989a).
While ‘natural’ units such as the speaking turn, or ‘artificial’ units such as 30-second time-delineated units, are relatively easy to identify and distinguish, the same does not apply to the thought unit, since it requires the researcher (the ‘unitiser’) to
assign some sort of meaning to the text read (McGrath & Altermatt, 2003, p.532). Thus, to ensure the reproducibility and accuracy of the results (through consistent coding classifications), it is important to have a certain degree of confidence that the unitising method was applied consistently (Folger et al., 1984). This is termed unitising reliability (Guetzkow, 1950) and is calculated using Guetzkow’s U, which essentially measures the degree of disagreement between two coders. It is calculated by the equation U = (O1 − O2) / (O1 + O2), where O1 is the number of units identified by coder one, O2 is the number of units identified by coder two, and U × 100 is the percent disagreement. The proportion of agreement is then simply 1 − U. Still, such a calculation offers only a measure of agreement based on pure unit counts and not on a unit-by-unit basis, which would be optimal (Folger et al., 1984). For coding schemes that are meant to be used for sequential analysis, a unit-by-unit measure of agreement is required (Franco & Rouwette, 2011; Folger et al., 1984). This can be achieved by applying Guetzkow’s U with the number of agreements or disagreements counted on a unit-by-unit basis, which requires a methodological extension of how agreement is to be judged. Folger et al. (1984) therefore introduce two different types of unit, the ‘actual’ and the ‘objective’ unit, with the ‘objective’ unit being smaller than the ‘actual’ one. This distinction allows the actual units to be compared against a benchmark (i.e. the objective unit). The coders then count the number of agreements on a unit-by-unit basis and Guetzkow’s U is calculated.
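The count-based form of the calculation can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names and the example counts are hypothetical, and taking the absolute value of the difference is an assumption made here so that the result does not depend on which coder is labelled "one".

```python
def guetzkow_u(units_coder_1: int, units_coder_2: int) -> float:
    """Guetzkow's U computed from the two coders' total unit counts."""
    total = units_coder_1 + units_coder_2
    if total == 0:
        raise ValueError("at least one coder must have identified a unit")
    # Assumption: absolute difference, so the order of the coders is irrelevant.
    return abs(units_coder_1 - units_coder_2) / total


def percent_agreement(u: float) -> float:
    """Percent agreement implied by a given U, i.e. (1 - U) * 100."""
    return (1.0 - u) * 100.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the study).
    u = guetzkow_u(105, 95)
    print(f"U = {u:.2f}, agreement = {percent_agreement(u):.0f}%")
```

Because this version relies only on total counts, two coders could place their boundaries in different positions and still obtain a low U, which is why the unit-by-unit comparison described above is needed for sequential analysis.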
In this research the ‘objective’ unit has been defined as a number of words, namely two words. The rationale for this choice is that the objective unit had to contain at
least one word fewer than any meaningful expression would contain. Any meaningful expression requires at least three words to be classified as a ‘thought unit’. One of the simplest meaningful utterances is the question “Are you OK?” (and the corresponding answer “I am fine”), which fulfils the most basic requirement of a meaningful utterance, namely Subject-Verb-Object and its permutations (Verb-Subject-Object, Object-Verb-Subject, etc.) (Tomlin, 1986: 22).
The unitising instructions, indicating how the text was unitised, can be seen in Appendix 4.
Due to the large volume of data generated via micro-coding (i.e. coding each and every thought unit), a sampling technique was employed to test the reliability of the unitising process (Poole et al., 2000: 165). Specifically, three excerpts of about 45 minutes each were randomly chosen and received preliminary unitisation by the researcher (i.e. myself), generating about 500 units each. The raw, un-unitised 45-minute sample transcripts were then given, along with the instructions, to a second coder to be unitised. The samples were drawn from the beginning, middle and end of the raw text corpus across the three cases (i.e. one from the beginning of case A, one from the middle of case C and one from the end of case B). This was done to ensure that any interaction distortions that could bias the unitisation process (e.g. rapid exchange of messages, a high rate of interruptions, abnormally long single-person utterances) would be taken into account and a more objective metric would be calculated. The percentages were then averaged to produce a single U metric.
The averaged Guetzkow’s U over 1500 units was 4%, which translates to an agreement of 96% (100% − 4% = 96%), an acceptable percentage for establishing that the transcribed data have been unitised in a reliable manner (Folger et al., 1984, p.121).
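The averaging step can be written out as below, assuming the single metric is the unweighted mean of the three per-sample values. The symbols U_A, U_B and U_C are introduced here purely for illustration; the individual per-sample values are not reported in this section.

```latex
\bar{U} = \frac{U_A + U_B + U_C}{3} = 0.04,
\qquad
\text{agreement} = (1 - \bar{U}) \times 100\% = 96\%
```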
No reliability metric was calculated for the 30-second units, since unitisation was a process of counting universally objective time units (i.e. seconds).
For Case A the unitised transcript yielded 2834 usable thought units and 483 30-second units over a net interaction period of 4 hours, 1 minute and 30 seconds.
The per stage thought units and the percentage they accounted for in terms of the net interaction time can be seen for case A in the following table (Table 3.3).
Table 3.3 Per Stage thought units for Case A
The table thus reads as follows: stage 1 for Case A yielded 18 usable thought units, accounting for 0.64% of the net interaction time (always excluding typing); stage 2 yielded 1203 thought units, accounting for 42.45% of the net interaction time; and so on.
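The two percentages quoted above coincide with each stage's share of the 2834 usable thought units for Case A. The short check below reproduces them. Whether the table's percentage column is defined in exactly this way is an assumption, since only the two quoted rows are available here, and the names used are hypothetical.

```python
# Total usable thought units reported for Case A.
TOTAL_THOUGHT_UNITS_CASE_A = 2834

# Only the two stages quoted in the text; the remaining rows are not reproduced.
stage_units = {"stage 1": 18, "stage 2": 1203}


def share_percent(units: int, total: int) -> float:
    """Share of the total accounted for by `units`, as a percentage to 2 d.p."""
    return round(100 * units / total, 2)


if __name__ == "__main__":
    for stage, units in stage_units.items():
        share = share_percent(units, TOTAL_THOUGHT_UNITS_CASE_A)
        print(f"{stage}: {units} units, {share}% of all thought units")
```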
For Case B the unitised transcript yielded 1930 usable thought units and 279 30-second units over a net interaction period of 2 hours, 19 minutes and 30 seconds.
The per stage thought units and the percentage they accounted for in terms of the net interaction time can be seen for case B in the following table (Table 3.4).
Table 3.4 Per Stage thought units for Case B
For Case C the unitised transcript yielded 5427 usable thought units and 751 30-second units over a net interaction period of 6 hours, 15 minutes and 30 seconds.
The per stage thought units and the percentage they accounted for in terms of the net interaction time can be seen for case C in the following table (Table 3.5).
Table 3.5 Per Stage thought units for Case C
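The 30-second unit counts reported for the three cases can be cross-checked against the net interaction periods by dividing each period, in seconds, by 30. The sketch below does this using only the figures given in the text; the variable and function names are hypothetical.

```python
UNIT_SECONDS = 30  # length of the artificial unit

# Net interaction periods as (hours, minutes, seconds), as reported per case.
net_interaction = {"A": (4, 1, 30), "B": (2, 19, 30), "C": (6, 15, 30)}

# 30-second unit counts as reported per case.
reported_units = {"A": 483, "B": 279, "C": 751}


def thirty_second_units(hours: int, minutes: int, seconds: int) -> int:
    """Number of whole 30-second units contained in the given duration."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds // UNIT_SECONDS


if __name__ == "__main__":
    for case, duration in net_interaction.items():
        computed = thirty_second_units(*duration)
        status = "matches" if computed == reported_units[case] else "differs from"
        print(f"Case {case}: {computed} units ({status} the reported count)")
```

For all three cases the computed count equals the reported one, which is consistent with the 30-second units being derived directly from the net interaction time.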