
Chapter 3 | Research Design, Methodology and Ethics

3.3. Data collection and Analysis

Once uploaded into NVivo, the text-based data sources – the acquis and interview transcripts – were ready for the main empirical analysis. This raised a methodological problem. Text-based sources and interviews tend to be treated as different types of sources in methodological discussions (Bryman, 2008; Berg, 2004; Hycner, 1985). Because interviews involve direct contact with subjects, data extraction involves the identification of more sociological, linguistic units of meaning from “between the lines” of the participants’ statements (Hycner, 1985, pp. 280–294). By contrast, purely text-based sources can be subjected to a content analysis in which the occurrences of words and phrases are counted and inferences are drawn from the resulting quantitative data.

Neither approach suited the aims of the thesis on its own. The thesis sought to engage in a fact-finding exercise prior to the important stage of explaining policy development processes (King et al., 1994, p. 5). On the one hand, therefore, the aim was not to carry out a purely linguistic or sociological analysis. On the other hand, a standard, literature-based content analysis would not have yielded the data necessary to analyse policy development.

In order to analyse both source types consistently, a conceptual content analysis was developed for this thesis. This was achieved by combining methodological techniques developed by Hycner and Berg. Similar ideas, rather than individual words, were identified and counted. Themes, concepts and units of meaning of relevance to the research question were identified and coded (Hycner, 1985, p. 284). This was a similar exercise to a content analysis, but focussed on more abstract concepts rather than individual words. The “counts” of textual elements which characterise content analysis provided a tool for identifying specific units of meaning from which this researcher could learn about participants’ views of social phenomena (Berg, 2004, pp. 241–242). Because this “Hycner-Berg” technique could be applied effectively to both textual primary literature sources, such as the acquis communautaire, and the transcripts of elite interviews, it proved invaluable to this research.

An additional benefit of employing a conceptual rather than a standard, metric content analysis became apparent because different words and phrases were used to describe the same elements and concepts over the course of the 28 years between 1985 and 2013. An example of this is the term “cyber security”. This term was used in Union acquis from 2002 onwards. In the preceding years, terms such as “online security”, “network and information security” and “online” or “internet safety” were used interchangeably to refer to the risks, threats, concerns and issues which comprise cyber security in its most recent iteration. A standard content analysis, in which the occurrences of specific words are counted, would therefore miss occurrences of the same concept wherever different words were used to describe it.
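The difference between the two kinds of count can be illustrated with a minimal sketch. It is not the procedure used in the thesis, where units of meaning were identified by reading and coded in NVivo; it only approximates the idea mechanically. The mapping `CONCEPT_TERMS`, the function `count_concepts` and the sample sentences are hypothetical and were invented for illustration; only the terms themselves are taken from the paragraph above.

```python
import re
from collections import Counter

# Hypothetical mapping from one concept to the terms that express it.
# The terms are those named in the text; the mapping is an assumption.
CONCEPT_TERMS = {
    "cyber security": [
        "cyber security",
        "online security",
        "network and information security",
        "internet safety",
        "online safety",
    ],
}


def count_concepts(text, concept_terms=CONCEPT_TERMS):
    """Count occurrences of each concept, whichever term expresses it."""
    lowered = text.lower()
    counts = Counter()
    for concept, terms in concept_terms.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[concept] += len(re.findall(pattern, lowered))
    return counts


# Invented sample sentences, not quotations from the acquis.
sample = (
    "The 1999 text refers to online security and internet safety. "
    "The 2013 text refers to cyber security."
)

# A word-based count sees only the literal phrase.
word_count = sample.lower().count("cyber security")
# A concept-based count also captures the earlier terminology.
concept_count = count_concepts(sample)["cyber security"]
```

In this invented sample the literal phrase appears once, while the concept is expressed three times. A keyword list of this kind would still miss latent content, which is why the coding in the thesis relied on the researcher's reading rather than on automated matching.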

3.3.1. Generating Data: Coding the acquis and interview transcripts

The first step in the data analysis process was to generate a coding schedule to be applied to all text-based sources – the primary literature and interview transcripts. This coding schedule would comprise key concepts to be sought in all text sources and reflect the aims of the research. Institutional drivers, actor participation and non-institutional elements would be identified and coded.

Because the research sought to identify the institutional drivers behind a specific policy sector – cyber security – a control schedule was generated by conducting a content analysis of the EU’s Cyber Security Strategy (EUCSS). In order to understand the development of the EU’s policy choices, it was first necessary to identify which processes and concepts were most relevant to that policy. Because the EUCSS represented the sum total of the EU’s policy choices and the end-point of its development process, it contained the elements which could be sought in preceding policy documents that would explain the development process. This was achieved by conducting an open coding of the EUCSS.

This action yielded a schedule of 43 discrete codes, labelled “nodes” in NVivo software. Some of these nodes referred to similar concepts, but involved separate entities. For example, co-operation between EU Member States and co-operation between EU agencies were similar but coded separately. These discrete nodes were collated into what Hycner (1985, p. 287) labelled “clusters”. From 43 separate nodes, 16 clusters were distilled. Some clusters contained only single nodes. Others, such as “facilitation”, contained as many as seven. The purpose was to derive, as closely as possible, collective units of meaning referring to what Berg (2004, p. 239) described as the unit’s essence or telos. They facilitated the identification of latent content. This is data inferred from the words used. It contrasts with manifest content, where information is specifically expressed (Berg, 2004, p. 242). These thematic clusters would be sought in the complete library of text sources. The complete NVivo node list – i.e. the digital control coding schedule – is available at Appendix 11.
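The collation of nodes into clusters can be pictured as a simple aggregation. The sketch below is illustrative only and does not reproduce the NVivo schedule in Appendix 11. The two co-operation node names echo the example given above, but the cluster label “co-operation”, the standalone node and all of the counts are hypothetical.

```python
from collections import Counter

# Hypothetical node-to-cluster assignment. The two co-operation node names
# echo the example in the text; the cluster labels are invented.
NODE_TO_CLUSTER = {
    "co-operation between EU Member States": "co-operation",
    "co-operation between EU agencies": "co-operation",
    "example standalone node": "example standalone cluster",
}


def cluster_counts(node_counts, node_to_cluster=NODE_TO_CLUSTER):
    """Sum node-level reference counts into cluster-level counts."""
    totals = Counter()
    for node, n in node_counts.items():
        totals[node_to_cluster[node]] += n
    return totals


# Invented counts of coded references per node.
node_counts = {
    "co-operation between EU Member States": 4,
    "co-operation between EU agencies": 2,
    "example standalone node": 1,
}
totals = cluster_counts(node_counts)
```

Here two similar nodes are counted together under one cluster, while a cluster containing a single node keeps its own count, mirroring the two cases described in the paragraph above.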

Coding the data sources according to the control schedule involved reading acquis and transcripts to identify units of meaning in the conceptual content analysis. This reading led to the identification of a number of further ideational and institutional elements found in the acquis and interviews, but which were not set out in the EUCSS. Due to the prevalence and recurrence of these elements, two further supplementary coding schedules were initiated: one for acquis, the other for interviews. All acquis and interview transcripts were thereafter coded three times, first with the EUCSS control and then with the two non-EUCSS coding schedules. This exercise ensured the capture of as much relevant data as possible relating to the research question. The non-EUCSS schedules are provided at Appendix 12.

As with the control schedule derived from the EUCSS, the units of meaning derived from the acquis and the interviews were arranged in thematic clusters. It should be acknowledged at this point that, because terminology was used inconsistently in the sources, a degree of the researcher’s own judgement was employed to determine whether or not two or more units of meaning were synonymous. This is a potential limitation in the data collection process. According to Hycner (1985, p. 288), the researcher’s own presuppositions may generate a bias in the resulting data. In the case of this thesis, this potential bias could be minimised. This was achieved by developing clusters and synonyms derived from specific sources in the texts and verbatim statements in the interviews, rather than by the researcher projecting an inference or interpretation of what was meant in, for example, an interview.

Two additional activities were employed to enhance the reliability and validity of data. The first was further eliminating redundancies once the coding was completed (Hycner, 1985, p. 286). While care was taken throughout the data collection process to avoid duplication of nodes or synonymous concepts, similar units of meaning were inadvertently identified and coded separately. To ensure the transparency, reliability and validity of the data collected, a “clean-up” of the NVivo nodes (the CAQDAS codes and units of meaning) was undertaken prior to examination and analysis of the results. This clean-up clustered together or corrected synonymous concepts to ensure as little aberrant duplication as possible. The final data set was as free from duplication as it was possible to achieve.

To further enhance data reliability and validity, triangulation was also employed. Triangulation is a corroborative technique which involves the use of several methods or sources at once “so that the biases of any one method might be cancelled out by those of the others” (Seale, 1999, pp. 472–473). Tarrow (2010, p. 108) argues that triangulation is a useful tool for the corroboration of findings derived from both qualitative and quantitative data collection techniques. Findings from the primary literature – such as the EU’s preference for facilitating co-operation – were also identified independently in certain of the elite interviews. Such triangulation exercises increased the reliability of the findings by reducing the reliance on one particular type of data source.
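In its simplest form, the corroboration step amounts to asking which findings appear in more than one type of source. The following sketch makes that logic explicit under stated assumptions: the function `triangulate` and the placeholder findings A and B are hypothetical, and only “facilitating co-operation” is taken from the text. It is an illustration of the principle, not a record of how the comparison was carried out in the thesis.

```python
def triangulate(findings_by_source):
    """Split findings into those found in two or more source types and the rest."""
    all_findings = set().union(*findings_by_source.values())
    corroborated = {
        finding
        for finding in all_findings
        if sum(finding in found for found in findings_by_source.values()) >= 2
    }
    return corroborated, all_findings - corroborated


# "facilitating co-operation" comes from the text; the other entries are
# invented placeholders.
findings = {
    "acquis": {"facilitating co-operation", "hypothetical finding A"},
    "interviews": {"facilitating co-operation", "hypothetical finding B"},
}
corroborated, single_source = triangulate(findings)
```

Findings that fall into `single_source` are not thereby invalid; they simply rest on one type of data source and so carry less corroborative weight.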

Diagram 3-3: Data generation process (coding)

[Diagram: control paper (EUCSS, 2013) processed in NVivo, 43 control codes identified; documents manually processed to identify extra codes (concepts), 32 codes identified from acquis and 5 codes identified from interviews; number of occurrences of each code identified; removal of redundancies; three key variables identified from number of occurrences, including competences and actors.]

Following the conceptual content analysis exercise undertaken to extract data from the acquis and interview transcripts, those data were tabulated according to the most frequently occurring elements. What was being sought were the actors most frequently involved in the policy-making process, the institution of greatest influence in this sector and the most frequently occurring non-institutional elements. To effect an HI analysis these three details were required to be identified over the entire course of the timescape. Specifying actors, institutions and elements in this manner would concentrate the analysis on the interaction of those three aspects over time, a key component of an HI analysis. These data tables are presented in the empirical chapters of this thesis.
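The tabulation described above can be sketched as a frequency count grouped by period and category. Everything in the example is assumed for illustration: the records are invented rather than drawn from the thesis data, the names “Actor X”, “Actor Y” and “Institution Z” are placeholders, and the ten-year period length is an arbitrary choice; only the 1985 start year comes from the text.

```python
from collections import Counter, defaultdict

# Invented coded references: (year, category, element). Not thesis data.
coded = [
    (1992, "actor", "Actor X"),
    (1992, "actor", "Actor X"),
    (1992, "actor", "Actor Y"),
    (2008, "actor", "Actor Y"),
    (2008, "institution", "Institution Z"),
]


def most_frequent(coded_refs, period_length=10, start=1985):
    """Return the most frequent element for each (period start, category)."""
    tallies = defaultdict(Counter)
    for year, category, element in coded_refs:
        # Assign the year to the period in which it falls, counted from start.
        period = start + ((year - start) // period_length) * period_length
        tallies[(period, category)][element] += 1
    return {key: counts.most_common(1)[0] for key, counts in tallies.items()}


table = most_frequent(coded)
```

Grouping by period as well as by category keeps the time dimension visible, which matters for an analysis concerned with how actors, institutions and elements interact over time.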

The quantitative exercise outlined above was vital in preparing data for a more qualitative analysis. As Goldstone (1991, pp. 50–62) states:

To identify the process, one must perform the difficult cognitive feat of figuring out which aspects of the initial conditions observed, in conjunction with which simple principles of the many that may be at work, would have combined to generate the observed sequence of events. (emphasis in original)

Any institutional analysis of policy development must first identify which institutions and actors are relevant in the sector under examination. This identifies who and what are involved and of importance to the policy development process. Once this has been achieved, how and why they are involved can be investigated. To achieve this, a qualitative narrative inquiry approach was employed.

In document Cyber security in the European Union: an historical institutionalist analysis of a 21st century security concern (Page 76-81)