Segmentation and Coding of Interview Data

5. Semi-Structured Interviews with Subject Matter Experts

5.2 Segmentation and Coding of Interview Data

After the interviews were transcribed, they were analyzed for conceptual content and the organization of concepts, and to answer the research questions of the dissertation. The researcher read through the printed transcriptions several times and took notes to record how each of the SMEs responded to the interview questions, to notice patterns and themes, and to relate themes across the different interviews and questions. The themes from the post-interview notes were compared with the notes taken during the interviews to ensure that no themes or important concepts captured as impressions during the interviews were missed.

The text documents containing the interview transcripts were segmented and coded in two separate ways for analysis. First, the text was segmented by sentences and analyzed to elicit concepts from the interviews. Second, a copy of the interview data was segmented by ideas, which ranged in length from a portion of one sentence to several sentences, and coded according to which question group they addressed.

5.2.1 Sentence-Level Concept Analysis. To analyze the document at the sentence level, the transcriptions were divided up into individual sentence-sized segments. The rule that guided segmentation was: “segments should be sentences, and are described by punctuation in the text file (like periods or question marks) that normally indicate the end of a sentence.” To segment discourse that was difficult to segment using that rule, the rule “a segment should represent a single idea” was used as a secondary way to demarcate segments from the text. Afterward, all of the segments were reviewed by the researcher to ensure they represented legitimate sentences and phrases.
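The primary segmentation rule above can be illustrated with a minimal sketch. This is not the procedure used in the study, which the text does not describe at the code level; the function name `segment_sentences`, the regular expression, and the sample transcript text are all assumptions made for illustration. The sketch splits only on end-of-sentence punctuation followed by whitespace, so abbreviations and run-on speech would be segmented poorly, which is one reason a secondary rule and a manual review would still be needed.

```python
import re

# Assumed pattern: whitespace that directly follows ., ?, or ! ends a segment.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def segment_sentences(text):
    """Split text into sentence-sized segments at end-of-sentence punctuation."""
    parts = _SENTENCE_END.split(text.strip())
    # Drop empty pieces that arise from blank input or stray whitespace.
    return [part.strip() for part in parts if part.strip()]


# Invented sample text, not taken from the interview transcripts.
transcript = "I start with the strings. Why? Because they give quick hints about purpose."
segments = segment_sentences(transcript)
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

Each printed line corresponds to one candidate segment that a reviewer could then accept, merge, or split by hand.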

Once the transcriptions were segmented, each of the segments was coded by the researcher to annotate conceptual content. Each segment within each of the documents was annotated with one or more tags to represent the concepts discussed within that segment. For instance, a single line of text would read something like: “PROTECTIONS, BREAK, APPROACH, INTUITION, (the sentence text)” if the concepts “protection,” “break,” “approach,” and “intuition” were talked about in the sentence. During coding, the segments were kept in their original ordering to ensure referring expressions within the segments maintained their original contexts in the SME’s responses to interview questions.

Once all of the sentences were coded, the researcher wrote an automated script to extract the concepts from the text documents, to count their frequencies of occurrence, and to determine the co-occurrences in which one concept appeared with another concept throughout all of the text documents. The script is included as Attachment F.
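The kind of processing described above can be sketched as follows. This is not the script from Attachment F, which is not reproduced here; it is a hypothetical illustration under stated assumptions: each coded line has the form "TAG, TAG, ..., sentence text", concept tags are the leading comma-separated fields written entirely in upper-case letters, and co-occurrence means two distinct concepts tagged on the same segment. The function names and the second and third sample lines are invented.

```python
from collections import Counter
from itertools import combinations


def parse_coded_line(line):
    """Split a coded line into (tags, sentence text).

    Assumption: leading fields made up only of upper-case letters are tags.
    A sentence that begins with a lone upper-case word followed by a comma
    would be misread as a tag under this rule.
    """
    fields = line.split(",")
    tags = []
    for field in fields:
        candidate = field.strip()
        if candidate.isalpha() and candidate.isupper():
            tags.append(candidate)
        else:
            break
    text = ",".join(fields[len(tags):]).strip()
    return tags, text


def count_concepts(lines):
    """Return (concept frequencies, pairwise co-occurrence counts)."""
    frequencies = Counter()
    cooccurrences = Counter()
    for line in lines:
        tags, _ = parse_coded_line(line)
        unique = sorted(set(tags))  # count each concept once per segment
        frequencies.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return frequencies, cooccurrences


coded_lines = [
    "PROTECTIONS, BREAK, APPROACH, INTUITION, (the sentence text)",
    "APPROACH, INTUITION, (another sentence)",  # invented example line
    "BREAK, (a third sentence)",                # invented example line
]
frequencies, cooccurrences = count_concepts(coded_lines)
for concept, count in frequencies.most_common():
    print(concept, count)
for (first, second), count in cooccurrences.most_common():
    print(first, second, count)
```

Sorting the tags within each segment keeps every pair in a single canonical order, so a pair is counted under one key regardless of the order in which the tags were written.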

5.2.2 Idea-Level Concept Analysis. After the sentence-based segmentation, copies of the original, un-coded documents were reviewed again and the text was segmented based on the following rule alone: “segments should represent a single idea.” This rule divided the segments much differently, with smaller ideas taking only a part of a sentence and larger or more complex ideas taking sometimes several sentences to express. In cases where the SME participant used storytelling to elaborate on an idea, the segments relating to the annotation of the idea were longer.

Once the text was broken up into idea-sized segments, the segments were coded based on their relation to one of the question groups from the questionnaire. The following codes represented the different groups of questions in the interview questionnaire:

• APPROACH - Statements related to the approach taken to solve a reverse engineering task.

• CUES - Statements related to using information cues in the course of a task.

• DECISIONS - Statements related to decisions in a task and how they are made.

• DOMAIN - Statements referring to the organization of the reverse engineering problem domain.

• GOALS - Statements relating to the underlying goals used in performing a task.

• KNOWLEDGE - Statements related to concepts a reverse engineer needs to know.

• SKILLS - Statements related to abilities or procedures a reverse engineer needs to have.

• TACIT - Statements referring to knowledge that has become proceduralized or automatized with experience.

• TOOLS - Statements relating to reverse engineering tools.

During coding, the answers were abstracted by reviewing each of the original documents several times and annotating a more abstract and concise description of the SME’s response. Each annotation had a marking to identify the SME and a code that described the category to which the response applied. For instance, a long sentence describing the goal of knowing the purpose for performing reverse engineering would be coded: “SME1, GOAL, Find the purpose.” Another sentence describing a necessary piece of knowledge for a reverse engineer might be coded: “SME4, KNOWLEDGE, Manual function name resolution.” The annotations were written to be self-contained, so that they would not rely on a reading of the text or on referring expressions within the text, but would instead represent abstracted and captured answers to the research questions.

Once the annotations were created and coded, they were combined into a single file for analysis. This file provides a concise list of the interview questions and the SMEs’ answers to the questions. Each of the SMEs was later asked to verify the organization of the goals, knowledge requirements, and tool needs and to provide comments on the structure of the overall responses. The abstracted interview responses are found in Appendix G. The representation of a goal-directed task analysis [68] of reverse engineering as constructed from the interviews is found in Appendix H.
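The annotation format quoted above ("SME, CODE, description") lends itself to simple tabulation. The following is a minimal sketch, assuming one annotation per line with the SME identifier and the code as the first two comma-separated fields; it is not the tooling used in the study, and the function names `parse_annotation` and `group_by_code` are hypothetical. The two sample annotations are the ones quoted in the text.

```python
from collections import defaultdict


def parse_annotation(line):
    """Split 'SME, CODE, description' into three parts.

    Only the first two commas are treated as separators, so the
    description itself may contain commas.
    """
    sme, code, description = (part.strip() for part in line.split(",", 2))
    return sme, code, description


def group_by_code(lines):
    """Collect (SME, description) pairs under each category code."""
    grouped = defaultdict(list)
    for line in lines:
        sme, code, description = parse_annotation(line)
        grouped[code].append((sme, description))
    return dict(grouped)


annotations = [
    "SME1, GOAL, Find the purpose.",
    "SME4, KNOWLEDGE, Manual function name resolution.",
]
combined = group_by_code(annotations)
for code in sorted(combined):
    for sme, description in combined[code]:
        print(f"{code}\t{sme}\t{description}")
```

Grouping by code places all of the responses for one question group side by side, which is the kind of view a combined analysis file would offer.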

In document Understanding How Reverse Engineers Make Sense of Programs from Assembly Language Representations (Page 127-130)