Chapter 3 Methodology
3.4 Data analysis strategy and procedures
3.4.3 Data analysis procedures
Having clarified some concepts relevant to thematic analysis, I followed Braun and Clarke’s (2006) six thematic analysis phases to analyse my data (see Table 3.1).
Table 3.1: Phases of Thematic Analysis (Braun & Clarke, 2006, p. 87)
Phase | Description of the process
1. Familiarizing yourself with your data | Transcribing data (if necessary), reading and re-reading the data, noting down initial ideas.
2. Generating initial codes | Coding interesting features of the data in a systematic fashion across the entire data set, collating data relevant to each code.
3. Searching for themes | Collating codes into potential themes, gathering all data relevant to each potential theme.
4. Reviewing themes | Checking if the themes work in relation to the coded extracts (Level 1) and the entire data set (Level 2); generating a thematic ‘map’ of the analysis.
5. Defining and naming themes | Ongoing analysis to refine the specifics of each theme, and the overall story the analysis tells, generating clear definitions and names for each theme.
6. Producing the report | The final opportunity for analysis. Selection of vivid, compelling extract examples, final analysis of selected extracts, relating back of the analysis to the research question and literature, producing a scholarly report of the analysis.
The key point in phase 1 of data analysis is to “immerse yourself in the data to the extent that you are familiar with the depth and breadth of the content” (Braun & Clarke, 2006, p. 87) by transcribing the data, reading and re-reading them, and noting down initial ideas. After the data had been transcribed and checked, I printed out each transcription with a cover sheet that gave an overview of the interview. The cover sheet consisted of three parts. Part one was the interviewee's profile, comprising biographical data and an identity code that I assigned to protect confidentiality. Part two was a summary of the interviewee’s answers to every question in the interview. Part three contained the notes that I took in Chinese after an interview, where I felt this was necessary, recording my general impression of the interview and my own reflections. My purpose in doing this was partly to make it quick to match each transcription with its interviewee, and partly to familiarize myself with the data.
Another way to immerse myself in the data was to read and re-read the conversations one by one on paper. As I read, I underlined important words and phrases in pencil and summarised these extracts in the margin in either Chinese or English. I also listened to the audio data in the evening when I was too tired to read and in the morning when I had just woken up. In this way, I felt that I became very familiar with the data. Although this process was very time-consuming and at times challenging, it enabled me to match the content in my mind with the interviewees’ personal experiences and their context. This would be important in the later stages of data analysis, when I needed to understand how each interviewee’s reality was socially constructed.
Having familiarized myself with the whole data set, I started to code my data by writing notes in the margins. At the beginning of phase 2, I attempted to treat each data item equally and to code as many potential themes as possible. I was trying to treat the data as a single unit and to look for themes across it, but this turned out to be inappropriate. In the course of the initial coding, I found that the data were very rich and detailed. The wide range of emerging ideas and categories seemed too much for one project. In addition, because of the range and depth of the data, I occasionally got lost in them.
I therefore returned to the research questions to see if the data collected were rich enough to answer those questions, and then tried to seek answers corresponding to each research question among the data items. This second “top down” (theory-driven) method enabled me to focus on one question or theme at a time, and hence to relate the coding process to the research aims and questions. For example, in Chapter 2, I identified the first research question I needed to explore as concerning the participants’ perceptions of cultural differences, according to Osland and Bird’s (2005-2006) model. Therefore, I began by putting all the data extracts about the participants’ understanding of and comments on cultural differences in a Word file. This mixed coding approach helped keep the research open to new directions and interpretations, while at the same time keeping the research aims in view. As a result, the initial codes derived from both bottom-up and top-down coding were divided into two parts. Those relevant to the research questions were grouped together, and the others were temporarily put in an “others” group for further analysis later.
After the initial codes had been constructed, I started to group the data extracts and codes in an Excel file, but soon found that this was impractical. The process of coding was iterative and full of un-coding and re-coding, un-grouping and re-grouping, which was very inconvenient to do in an Excel file. I was therefore advised to learn NVivo software and to use it to assist my data analysis. The time spent learning this software was worthwhile, since it shortened the process of data analysis. From then on, the data analysis was carried out using NVivo 9. First, I copied all the transcriptions stored in Word files into my new NVivo project as internal documents. Second, each initial code became a free “node” in NVivo. In other words, each free node had one code as its heading and held the corresponding extracts from the text. For example, all extracts about the participants’ perceptions of cultural differences were stored in the node “perceptions of cultural differences”. At this stage, the free nodes were unorganised and captured only general themes.
In phase 3, I started to establish a node structure based on the research questions. For example, all the codes relevant to the first question (HCS’s perceptions of cultural differences) were grouped and revised again and again. Meanwhile, I went back to the literature and looked for differences and similarities. For example, in searching for themes relevant to HCS’s perceptions of cultural differences, I found that the participants constructed cultural differences of three types: personality, communication styles, and cultural values. Research on personality belongs to the field of psychology, and so I had to read the literature relevant to personality. Once the subthemes had been established, I created three nodes under the node “perceptions of cultural differences”: differences in personality, differences in communication styles, and differences in cultural values. In the same way, I created two nodes under the node “differences in personality”, two nodes under the node “differences in communication styles”, and four nodes under the node “differences in cultural values”. As I did this, the hierarchical structure of the nodes was gradually created and the relationships between them became apparent. The names of the nodes were gradually developed into the themes and subthemes of this study (see Appendix 10 for a sample of the node structure for the first question).
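The node hierarchy described above can be pictured as a simple tree. The following Python fragment is only an illustrative sketch and is not part of NVivo or of the analysis itself: the class Node, its methods and the function build_tree are assumptions introduced here, and the lowest-level nodes are given placeholder names because their actual names are not listed in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical stand-in for a node: a heading plus its sub-nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a sub-node under this node and return it."""
        child = Node(name)
        self.children.append(child)
        return child

    def count_nodes(self) -> int:
        """Count this node and every node beneath it."""
        return 1 + sum(child.count_nodes() for child in self.children)

    def outline(self, depth: int = 0) -> List[str]:
        """Return the hierarchy as indented lines, one per node."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


def build_tree() -> Node:
    """Build the structure described in the text for the first question."""
    root = Node("perceptions of cultural differences")
    # Sub-node counts (2, 2 and 4) come from the text; names are placeholders.
    subthemes = [
        ("differences in personality", 2),
        ("differences in communication styles", 2),
        ("differences in cultural values", 4),
    ]
    for name, n_subnodes in subthemes:
        subtheme = root.add_child(name)
        for i in range(1, n_subnodes + 1):
            subtheme.add_child(f"placeholder sub-node {i}")
    return root


tree = build_tree()
print("\n".join(tree.outline()))
```

Printing the outline shows the parent node, its three subthemes and their eight sub-nodes as an indented list, which mirrors how the relationships between nodes became visible as the structure was built up.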
Phases 4, 5 and 6 of the data analysis were intertwined with each other. In effect, after collating all the codes into potential themes, I started to write up the findings chapter by chapter, discussing the preliminary findings with my supervisor and then refining them again and again. Meanwhile, I presented the preliminary findings at two international conferences, in November 2012 and April 2013. From these presentations I received valuable suggestions about the theoretical framework, the ways I analysed the data and the themes I had categorized. For example, at the second conference, some experts questioned the framework I had adopted and suggested that I should make clear my reasons for adopting it.
The whole process of data analysis turned out to be complex. Because of my scientific academic background (my first degree was in applied mathematics and my master’s degree was in management science and engineering), in the early stages of data analysis I ignored the diversity and complexity of the data and tried to fit them into existing categories from the literature. Fortunately, the problem was pointed out by my supervisor, who pulled me back from a quantitative to a qualitative approach to analysing my data. The process of data analysis was also a process of building knowledge and research skills. It happened gradually, but eventually the phenomena and the issues in question became clear through the processes described above, which led to the development of the next two chapters (findings and discussion) of this thesis.