Chapter 3: Theoretical Basis

4.5 Data Analysis and Coding

Having identified that data would be gathered by interview, a method which generates a substantial amount of transcribed text, it was necessary to select a method of analysis, which almost inevitably for qualitative work involves coding the data.

4.5.1 Coding Methods

The analysis of data by assignment of one or more codes is an extremely popular choice for qualitative analysis (Bernard 2000, p.443) and can be used either merely to organise and reduce data or to question it, revealing new concepts (Schreier 2012, p.38). The codes applied can be determined in advance from other sources (possibly working hypotheses drawn from other research) and used deductively for confirmation (Bernard 2000, p.444) or, as far as possible, generated objectively from the data, as with Grounded Theory (Glaser and Strauss, 1967).

In reality it is accepted as impossible to exclude completely all pre-existing attitudes and theories (Gibbs 2007, pp.42–46); however, the lack of a formal requirement for pre-existing codes can be useful where there is no accepted applicable theory (Orlikowski, 1993). Since little work had been published at the point of either design or analysis (Burley et al.’s work, for example, was not published until 2014 and in any case takes a very different theoretical approach), an exploratory approach was appropriate. Whilst there is a large range of possible coding techniques (see Saldaña (2009) for a concise discussion of the field), perhaps the best-known and most widely used are Grounded Theory and the various forms of Content Analysis.

Grounded Theory discovery is a reaction to what its developers saw as the contemporary prevailing thesis of sociological method. Rather than rigorously verifying hypotheses theorised elsewhere, the analysis of data can itself generate theory grounded in that data (Glaser and Strauss 1967, p.1). The induction versus deduction directionality (from the data comes theory) is held by this school to be paramount, to avoid exampling, or the confirmation of a theory which was actually previously held with a conveniently selected empirical datum (Glaser and Strauss 1967, pp.5–6). The authors therefore argue that it is not simply a method for organising data (pp.132–133) or justifying hypotheses which originated elsewhere. The attempts to verify such hypotheses pollute the process of grounding theory in the data and then constantly comparing new data to that theory.

Qualitative Content Analysis (QCA) is a systematic but flexible approach for exposing meaning from within text (Schreier 2012, pp.1–19). There is in some cases a strong emphasis on validity, ideally through double coding (coding being performed by more than one researcher) or re-coding after an interval. It differs from standard re-coding methods in that its intention is to reduce the volume of data by examining it from one distinct angle after the creation of a coding frame, and to summarise the data through the coding. To achieve this, codes are mutually exclusive: the target text is split into sections and each section is assigned one code only, in contrast to the multiple codes and highly reflexive coding processes used in other techniques such as Grounded Theory coding (Schreier 2012, pp.37–57).

Content Analysis can be used in a positivist epistemology to verify a hypothesis, but this is more usually associated with the quantitative form (which is therefore not considered in detail here). QCA is also compatible with an interpretive ontology and an anti-positivist epistemology, requiring the reader to interpret the text to understand the viewpoint of the source (Graneheim and Lundman, 2004), although it is also deployed alongside quantisation and subsequent statistical analysis in mixed-methods studies (Sandelowski et al., 2009) and can be used by those taking a realist ontology (Schreier 2012, p.47). As discussed above, this study takes a more interpretative approach which does not well support positivist or realist concepts; the flavour of QCA considered here is therefore its more usual role of interpreting, summarising and describing data. In this way the movements of actors over a long period should be visible.

4.5.2 Selection

Both forms of textual coding described above would be valid choices and are well-accepted for analysing fieldwork. Grounded Theory emphasises systematic abstraction and generation of theory from the text; QCA reduces and describes the text but places less emphasis on the systematisation of theory generation. Although Grounded Theory is associated with a well-respected method of coding (and that is not challenged here), the suggestion that the subjectivity of human analysis can be effectively mitigated by rigorous and systematic means is not compatible with the interpretative position of this study. Similarly, in a single-researcher study it would be difficult to achieve such reproducibility and validity, even if it were wished, due to the necessarily consistent bias of the researcher. Moreover, the preparation of an ANT account is inescapably subjective, given that it is the re-telling by the researcher of what they consider to be the pivotal events from stories heard from the subjects. No undue emphasis is therefore placed here on achieving objectivity and reliability through coding processes. Of greater importance was the construction of a frame to bring order and coherence to as many nuanced codes as could be practically supported without losing the ability to identify some common concepts or frequent assertions in the text.

Those techniques which aim principally to apply codes in an entirely reproducible way, if necessary at the cost of capturing nuance, such that the truth therein can be captured free of interference from subjectivity, are not useful for this purpose. ANT itself is a sensitising guide for the researcher and has been suggested as a way to generate theory well-grounded in the text (Whittle and Spicer, 2008). It is perhaps possible to become too fixated on technique at the expense of achieving an interesting and broad account for debate. In constructing a coding approach, the fundamentals of Schreier’s approach to coding and analysis were seen as marginally more compatible with the study’s ontology due to the higher flexibility and lower emphasis on systematisation (although it too emphasises validity more than is considered useful here). The necessity to code the entire text was seen to be beneficial in ensuring that the researcher was forced to consider the full data set, rather than only those parts which appeared significant during the analysis, so that nothing not previously identified as salient would be accidentally missed. As Saldaña (2009, p.15) warns, this is a particular risk for less experienced researchers.

In this approach there is no theoretical limit on the number of levels or sub-categories allowed; however, it is suggested that human coders are not practically able to cope with more than around forty such units in total. This limit may of course reflect an emphasis on validity, and thus on achieving high inter-coder kappa values, rather than on capturing the widest possible range of concepts. Given that this study is interested in multiple related but not completely atomic concepts, attempting to fit all data into such a narrow frame for exploratory work was considered impractical. In Schreier’s (2012) text it is proposed that categories might represent dimensions, where the data is examined in terms of each category. Sub-categories must be mutually exclusive (data matches only one category) and exhaustive (data can be accurately coded by a category). Such a multi-dimensional frame is not seen as useful here (since data answering a discrete professionalisation section is unlikely to be usefully also coded into a history of security section); a more straightforward hierarchical but single-dimension frame is therefore preferred.
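
By way of illustration only, the inter-coder kappa referred to above can be computed as follows. This is a minimal sketch rather than part of the method adopted in this study: it assumes Cohen's kappa for two coders who each assign exactly one code to the same sequence of units, and the function name `cohens_kappa` and the code labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one code per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of units")
    n = len(coder_a)
    # Observed agreement: share of units given the same code by both coders.
    p_o = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Chance agreement: derived from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six units of coding (not data from the study).
a = ["history", "history", "profession", "profession", "history", "profession"]
b = ["history", "profession", "profession", "profession", "history", "profession"]
kappa = cohens_kappa(a, b)
```

For the hypothetical labels shown, the coders agree on five of six units and chance agreement is 0.5, giving a kappa of about 0.67.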

With respect to frame construction, Schreier’s “hybrid” model was preferred, in which an existing understanding of the subject area is used to create a conceptual frame (to seed the analysis and to show whether any of the underlying assumptions are not found in the data), which is then extended where the conceptual model is found wanting. Since the research questions identified discrete themes, these were felt to be useful in providing an entry-point for coding.

Some departure from Schreier’s model was desired, however, since in her (2012) model a coding frame is first established from around one tenth of the data, finalised and then applied to the rest of the material. That is explicitly rejected here, since it seems to imply that the coding is so general or the data so homogeneous that a sample of the data contains everything which can be usefully learned concerning the frame. In this study it was determined that the frame should be developed as each additional text was added and analysed, with codes condensed where near-duplication or undesirable proliferation occurred, thus allowing for being “surprised” by the data.

This brings the method closer to the General Inductive Approach (Thomas, 2006).
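
The incremental development of the frame described above can be sketched as a simple data structure. The following is an assumption-laden illustration, not a description of any CAQDAS package or of the tooling actually used: the class `CodingFrame`, its method names, the code labels and the placeholder passages are all hypothetical.

```python
from collections import defaultdict


class CodingFrame:
    """A single-dimension frame that grows as each transcript is coded."""

    def __init__(self, seed_codes):
        # Seed codes stand in for the concept-driven part of a hybrid frame.
        self.segments = defaultdict(list)
        for code in seed_codes:
            self.segments[code]  # register the code with no segments yet

    def code_segment(self, code, interview_id, text):
        # A previously unseen code is simply added (data-driven extension).
        self.segments[code].append((interview_id, text))

    def merge(self, duplicate, target):
        # Condense a near-duplicate code into an existing one.
        if duplicate == target:
            return
        self.segments[target].extend(self.segments.pop(duplicate, []))

    def unused_codes(self):
        # Seed assumptions for which no supporting text has been coded.
        return sorted(c for c, segs in self.segments.items() if not segs)


# Hypothetical codes and placeholder passages, for illustration only.
frame = CodingFrame(["certification", "regulation"])
frame.code_segment("certification", "interview_01", "placeholder passage A")
frame.code_segment("certificates", "interview_02", "placeholder passage B")
frame.merge("certificates", "certification")
```

Because the frame is never frozen after an initial sample, a code first encountered in a late interview is handled in exactly the same way as one encountered in the first, while `unused_codes` shows which seeded assumptions the data has not yet supported.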

Given the volume of data expected to be analysed and coded, it was considered most efficient, as is now common, to make use of computer aids for coding and analysis, commonly termed Computer Aided Qualitative Data Analysis Software, or CAQDAS.

4.5.3 Units of Analysis

Coding is performed on three types of segment: units of analysis, units of coding and units of context. Choosing a unit of analysis in this case is straightforward, since an interview represents an easily-identified, single, internally related text. Units of coding (or “units of meaning”) represent the block of text which is atomic (in other words, which is assigned to a particular subcategory without further dissection), and which varies between dimensions according to the information required. A unit of coding comprises those sections of text which are “related to each other through their content and context” (Graneheim and Lundman, 2004). Whilst considerable debate can be had on the topic, this study is aligned with the assertion that “Social interaction does not occur in neat, isolated units” (Glesne, 2006 cited in Saldaña 2009, p.16), and suggests that to analyse in more regulated units is to deny the possibility of multiply-nuanced short passages of text in favour of achieving higher rates of reproducible but narrow coding. With a variable-length unit of coding, the unit of context (that part of the text needed to understand the unit of coding) clearly becomes similarly variable, and this is discussed in the report of the work as performed.
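
The relationship between the three types of segment can be made concrete with a short sketch. It is illustrative only and assumes that segments are recorded as character offsets into a transcript; the class `UnitOfCoding`, the helper `extract` and the example sentence are hypothetical and are not drawn from the interviews.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitOfCoding:
    interview_id: str   # the unit of analysis this passage belongs to
    start: int          # character offsets of the coded passage
    end: int
    code: str
    context_start: int  # unit of context: the wider span needed to read it
    context_end: int

    def __post_init__(self):
        # Both spans are variable in length, but context must enclose coding.
        if not (self.context_start <= self.start < self.end <= self.context_end):
            raise ValueError("the unit of context must enclose the unit of coding")


def extract(transcript, unit):
    """Return (coded passage, surrounding context) for one unit."""
    return (transcript[unit.start:unit.end],
            transcript[unit.context_start:unit.context_end])


# Invented sentence standing in for a transcript.
transcript = "I moved across from networking. It was informal then."
passage = "It was informal then."
start = transcript.index(passage)
unit = UnitOfCoding("interview_01", start, start + len(passage),
                    "history", 0, len(transcript))
coded, context = extract(transcript, unit)
```

Here the coded passage is only intelligible alongside the preceding sentence, so the unit of context is set to the whole of the invented transcript.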

In document To What Extent Has Information Security Professionalism Achieved Recognition? (Page 104-108)