4.11.3. Methodological issues
Sampling
Time sampling has been used in the literature to assess nonverbal behaviours or linguistic aspects of text, such as number of utterances, lexical diversity and vocabulary use (e.g. Friedlander et al., 1972; Marvin et al., 1994). To our knowledge, this was the first study to use time sampling to record the content of speech qualitatively. Because the design and methods were developed to address the aims of this particular study, it was important to consider the methodological issues arising from them. As seen in the introduction of this chapter, previous studies have used qualitative observational methodology to record narratives; however, time sampling methods have mainly been used to record activity. The fact that the majority of recent studies in rehabilitation have used the term “behavioural mapping” instead of time sampling may be indicative of how this methodology has been used. A possible explanation may be that time sampling is associated with the use of predefined coding schemes at the point of initial data collection in order to obtain strictly quantitative data. This notion has been challenged by authors such as Croll (1986) and Bentzen (2009), who underlined the ability of time sampling methodology to combine several recording techniques, such as predefined categories and narrative descriptions, at the same time.
Kerlinger & Lee (1999) recommended that researchers develop a coding scheme to fit the needs of the particular study. Consequently, the adequacy of a coding scheme can only be judged with reference to its purposes (Croll, 1986). For the purposes of this study, a coding scheme was needed that would allow a detailed account of the group’s content to be obtained while requiring less interpretation from the observers. Despite being developed in order to decrease subjectivity, coding schemes with predetermined categories have been criticised as high-inference observation systems, because observers have to make immediate decisions about how to code behaviour (Croll, 1986). Furthermore, the recording schedule developed for this study needed to be quick, efficient and practical to use, and to ensure that recording would not interfere with observation. As the recording interval was short, the observer would not have been able to look up a large number of categories in the manual while observation was proceeding without undermining the accuracy of the coding.
It was suggested that by recording observations qualitatively, a rich understanding of the group content was gained while the amount of inference and the burden placed on the observer were minimised. Collecting data at a detailed level also allowed more abstract categorisation at the stage of analysis. According to Bakeman & Gottman (1997), behaviour categories can always be lumped together during data analysis if necessary, but categories grouped together by the coding scheme cannot be split out later.
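The point made by Bakeman & Gottman (1997) can be illustrated with a minimal sketch. The detailed codes and the mapping below are hypothetical examples and are not the coding scheme of this study; only the “cognitive skills” label is taken from the text. Detailed codes recorded at data collection can be collapsed into broader categories at analysis, whereas the reverse step is not possible once only the broad labels have been stored.

```python
from collections import Counter

# Hypothetical detailed codes, one per sampled interval (illustrative only).
observations = [
    "walking practice",
    "medication",
    "memory strategies",
    "walking practice",
    "diet",
]

# Hypothetical mapping from detailed codes to broader categories,
# defined at the analysis stage rather than during observation.
LUMPING = {
    "walking practice": "mobility",
    "medication": "health management",
    "diet": "health management",
    "memory strategies": "cognitive skills",
}


def lump(codes, mapping):
    """Replace each detailed code with its broader category."""
    return [mapping[code] for code in codes]


fine_counts = Counter(observations)
broad_counts = Counter(lump(observations, LUMPING))

print(fine_counts)
print(broad_counts)
```

Had only the broad labels been recorded, “medication” and “diet” could no longer be told apart, which is the asymmetry the paragraph above describes.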
Minimising bias
It was suggested that making a clear distinction between the inferential and descriptive phases of analysis might reduce the danger of analytical bias (George, 2009).
Combining quantitative and qualitative steps in content analysis is not uncommon in the literature. According to Bos & Tarnai (1999), the distinction is far from clear-cut, as there is always a qualitative step at the beginning and end of every scientific procedure. At the beginning, the researchers have to formulate the object of investigation, identify concepts and categories and establish the analytic tools. This stage can be followed by either a qualitative or a quantitative analysis (Bos & Tarnai, 1999). Following a literature review, White & Marsh (2006) concluded that content analysis is a flexible method, as many of the reviewed studies were not “purist” but used mixed methods approaches that combined elements of qualitative and quantitative analysis in order to answer the research questions.
It was also suggested that the use of time sampling methodology during data collection may have contributed to minimising some of the bias introduced by content analysis. The potential for selection bias is present in both qualitative and quantitative content analysis. Selectivity leads to consideration of some documents or parts of texts but not others (Waitzkin, 1990). Consequently, certain elements of observations may be emphasised over others that disconfirm the assumptions or expectations of the researchers. In this study, it is suggested that selection bias was reduced as a result of the time sampling strategy used at the point of data collection. This is because all the sampled observations were included in the analysis, and each observation recorded during the pre-specified interval was assigned a code and categorised. The segmentation procedure, during which the units of analysis are identified, may also be a source of bias.
Variation in the length of the unit of analysis may result in overlapping units that are assigned different codes by independent coders. If these codes are treated as mutually exclusive, as in the case of quantitative content analysis, this may result in a serious methodological problem defined as “unit boundary overlap” (Strijbos et al., 2006).
This may have important implications for the reliability of the study because, in content analysis, unitising reliability (consistency in identifying the units to be categorised) is a precondition for interpretive reliability (consistency in assigning units to categories) (Waltz, 1991). In this study this bias was minimised, as the boundaries of the text to be coded were specified before analysis as a result of the time sampling. Furthermore, due to the short duration of the recording interval, the recorded text was a sentence or a compound sentence.
According to Strijbos et al. (2006) using a small unit such as a sentence may reduce the ambiguity of coding and consequently decrease the unit boundary overlap. This is because the sentences or parts of compound sentences are more likely to contain a single concept (Strijbos et al., 2006). This was also shown in the present study where it was found that the unit of analysis coincided with the unit of meaning.
Inter-coder agreement
The computation of inter-coder agreement gave acceptable results for all the content categories. It indicated that implementation of the coding process was not significantly different between the coders and, consequently, it could be argued that the coding scheme showed resistance to subjectivity and interpretative bias. It was acknowledged, however, that both per cent agreement and Cohen’s kappa are susceptible to factors such as the number of observations and the number of categories (Rourke et al., 2000). As the number of categories decreases, the probability of per cent agreement by chance increases (Kolbe & Burnett, 1991). In contrast, Cohen’s kappa tends to be stricter in the case of fewer categories (Strijbos, 2006). This discrepancy was evident in the case of the “cognitive skills” category which, being coded only once by both raters, appeared to have the highest per cent agreement and at the same time the lowest kappa coefficient.
One of the strengths of the current study was that, rather than providing the overall average reliability, reliability levels were computed for each of the content categories. It has been suggested that the overall reliability approach can yield misleading results (Kolbe & Burnett, 1991; Lombard et al., 2002). As Kolbe & Burnett (1991) noted, “While agreement may be high in the aggregate, low rating on individual variables may be hidden by pooled results” (p. 249). Per cent agreement may also be inflated by adding very low frequency categories. The reason is that, when reliabilities are calculated including these categories, the agreements on these categories compensate for disagreements on other categories (Kolbe & Burnett, 1991).
In terms of determining what constitutes an acceptable level of reliability, the present study adopted the recommendations of Landis & Koch (1977) which, however, are not universally accepted. The exact level of reliability that has to be achieved is not clearly established in the literature (Rourke et al., 2001). Neuendorf (2002), after reviewing different “rules of thumb”, suggested that coefficients of .80 or greater would be acceptable in most situations whereas coefficients of .90 or greater would be acceptable to all. In this study the vast majority of the categories exhibited coefficients higher than .90, which suggested substantial agreement irrespective of the criteria adopted. Because of the very good rates of agreement, the few cases of disagreement were not discussed further between the coders.
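The divergence between per cent agreement and Cohen’s kappa for a very low frequency category can be illustrated with a minimal sketch. The data below are hypothetical and are not the data of this study: two coders each assign one of three made-up codes (“A”, “B”, “rare”) to 100 units, and each coder uses “rare” once, but on different units. Each category is treated as a present/absent decision so that both statistics can be reported per category rather than as an overall average.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of units on which the two coders gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of each coder's marginal proportions.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / n ** 2
    if p_e == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def per_category(a, b):
    """Per cent agreement and kappa for each category, coded present/absent."""
    results = {}
    for cat in sorted(set(a) | set(b)):
        in_a = [x == cat for x in a]
        in_b = [x == cat for x in b]
        results[cat] = (percent_agreement(in_a, in_b), cohen_kappa(in_a, in_b))
    return results


# Hypothetical codes for 100 units; the coders differ only on the last two.
coder_a = ["A"] * 50 + ["B"] * 48 + ["rare", "B"]
coder_b = ["A"] * 50 + ["B"] * 48 + ["B", "rare"]

results = per_category(coder_a, coder_b)
for cat, (agreement, kappa) in results.items():
    print(f"{cat}: agreement={agreement:.2f}, kappa={kappa:.2f}")
```

In this constructed example the “rare” category reaches a per cent agreement of .98 because both coders mark it absent on almost every unit, yet its kappa is close to zero because the coders never agree on when it is present. An overall average would conceal this.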
Validity
In this study the categories were exhaustive and mutually exclusive, meaning that all relevant concepts were represented in the coding scheme and each unit could be assigned to only one category, which may provide an indication of good content validity (Neuendorf, 2002). Incorporating a qualitative study in which categories were developed inductively from the manifest content of the analysed text may also have contributed to achieving content validity (Rourke & Anderson, 2004). An attempt was also made to provide thorough information about inter-rater agreement, training procedures for coders and examples of the coding scheme. As Rourke & Anderson (2004) suggested, this is another important step towards establishing validity.
Empirical evidence for validity can also be gathered, mainly through examination of group differences and through the use of alternative methods of data collection to corroborate the results of content analysis (Rourke & Anderson, 2004). This study showed that the developed coding scheme was sensitive to differences between the rehabilitation programmes, providing further evidence for its validity.
4.11.4. Limitations and future directions