3.2 Interaction with an Ideal Speech Display
3.2.2 Execution
Participants
I recruited twenty people familiar with sending and receiving email, browsing the Web, and managing files to participate in our study. Seven participants (four male, three female) had a visual impairment while thirteen participants (ten male, three female) did not. Four participants with visual impairments (one male, three female) had been blind since childhood while the remaining three developed the impairment later in life. All but one of the participants with visual impairments relied on a screen reader in their daily computer use. The exception was a user who had enough residual vision to interact with the computer by sitting very close to the display or with the help of a magnifying lens. Sessions with the first three participants (two males with vision, one female without) were deemed pilots. These sessions were used to debug the test procedure and give me practice in the role of the auditory display. The results from the pilot sessions were not considered in the analysis below.
Setting
All twenty study sessions, each one hour long, were conducted over a two-month period in the spring of 2005. Sixteen of the twenty sessions were performed in a dedicated lab space at UNC, where auditory and visual distractions were nearly nonexistent. The remaining four sessions were conducted at the Library for the Blind in Raleigh, NC, to accommodate participants who could not travel to UNC. A room at the library was dedicated to our use and helped keep distractions to an acceptable minimum.
In both environments, a computer was on hand to assist me in the role of the auditory display. Applications representative of those needed to complete the required tasks, copies of the emails assigning tasks, and mock-ups of the necessary XYZ company Web pages were pre-installed on the computer. The participant and principal investigator were seated such that the participant could not read the computer screen. Participants were allowed to see me to reinforce the idea that they were interacting with an unconstrained auditory display. To avoid communicating with the participant via backchannels, I consciously avoided looking at the participant, making gestures, or changing my facial expression.
Analysis
After running all of the study sessions, I transcribed each of the audio recordings into plain text files. Each transcription included the communication between the participant and me for the thirty-minute scenario and follow-up discussion. In addition, the transcriptions indicated any pauses longer than three seconds during the scenario and any laughter or other non-speech vocalizations. Once the transcriptions were complete, I reviewed them for accuracy while listening to the original audio recordings.
I used Weft QDA (http://www.pressure.to/qda/) to code segments of each transcription according to patterns of behavior seen across participants. Twenty-one codes resulted from this first pass. I next analyzed the initial code batch to group codes into more general categories and to eliminate codes not related to my current research. Seven themes emerged from this analysis. Finally, I coded each transcription again with labels representing these seven categories. Table 3.3 gives the initial and final codes.
Initial code(s)                              Final code
Memory aid                                   Memory
Memory problem                               Reference problem
Autohelp, Perfect understanding              Prompting
Visual thinking                              Visual thinking
Summary                                      Summaries
Filtering, Speed, Ordering                   Searching
Alert                                        Alerts
Batch operation                              Batch operations
Mobility, Pause, Like, Dislike, Confusion,   Uncoded
  Channel saturation, Novice experience,
  Voice input, Task switch
Table 3.3: Initial and final codes for the ideal display study.
Validation
To validate my codings, I used a triangulation technique. I first recruited one former and one current computer science graduate student to aid me. Each person was familiar with my research but not knowledgeable about the details of this study. I gave each validator a unique simple random sample of fifty segments drawn out of the total population of coded segments (520) and a random sample of uncoded segments. I also provided the validators with a list of the seven final codes, their definitions, and a few fabricated conversation segments exemplifying each. I then asked the validators to label their samples with zero, one, or more of the codes per segment.
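The sampling step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the study: the function name `draw_validator_samples`, the fixed seed, and the choice to identify segments by integer index are all assumptions. The text does not say whether the two validators' samples were allowed to share segments, so the sketch draws each sample independently. The sample of uncoded segments is omitted because its size is not stated.

```python
import random


def draw_validator_samples(n_segments, sample_size, n_validators, seed=0):
    """Draw one simple random sample of segment indices per validator.

    Each sample is drawn without replacement from range(n_segments).
    Samples for different validators are drawn independently, so they
    may share some segments (an assumption; the study does not say).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [
        sorted(rng.sample(range(n_segments), sample_size))
        for _ in range(n_validators)
    ]


# 520 coded segments, fifty per validator, two validators (figures from the text).
samples = draw_validator_samples(n_segments=520, sample_size=50, n_validators=2)
```

Because `random.sample` draws without replacement, no segment appears twice within a single validator's sample.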
I next compared the validator codings with my original codings. For each segment in a sample, I counted the number of matches between the validator codes and my codes. I then summed the number of matches across the sample and divided by the total number of codes I had assigned to the segments in that sample to produce a scoring metric:
score = overlap / mine
Using this metric, I found that 63% of the codes from the first validator and 80% of the codes from the second validator matched my eight-category coding. These scores give me some confidence that the themes I identified do indeed exist in the transcripts.
It is important to note two shortcomings of this metric. First, it provides no information about cases where the validators assigned more codes than I did. I purposely chose to exclude this information from the metric so that it would only validate my codings, not suggest additional codings per segment. Second, the metric is negatively affected by differences in the interpretation of each theme. For instance, I used the memory code to label segments in which participants used the auditory display as a memory aid. The first validator, on the other hand, consistently used the prompt code in these cases, resulting in a 6% reduction of the first score. This reduction, and possibly others, is artificial with respect to indicating theme discordance. In this case, both the validator and I noted a theme present across a number of segments, but labeled it differently.
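As a minimal sketch of the scoring metric and its two shortcomings, the following Python computes score = overlap / mine for a single sample, under the assumption that each segment's codes are stored as a set of labels. The function name `match_score` and the example data are hypothetical and are not taken from the study.

```python
def match_score(my_codes, validator_codes):
    """Return overlap / mine for one validator sample.

    my_codes and validator_codes are equal-length lists holding one set
    of code labels per segment. Overlap counts the codes on which both
    coders agree for a segment; mine counts all codes I assigned.
    """
    if len(my_codes) != len(validator_codes):
        raise ValueError("both codings must cover the same segments")
    overlap = sum(len(m & v) for m, v in zip(my_codes, validator_codes))
    mine = sum(len(m) for m in my_codes)
    if mine == 0:
        raise ValueError("the sample contains no codes to validate")
    return overlap / mine


# Hypothetical four-segment sample.
mine = [{"memory"}, {"searching", "summaries"}, {"alerts"}, set()]
validator = [
    {"prompting"},                         # same theme noticed, labeled differently
    {"searching", "summaries", "alerts"},  # extra "alerts" code is ignored
    {"alerts"},
    {"memory"},                            # code on a segment I left uncoded is ignored
]
score = match_score(mine, validator)  # 3 matches / 4 of my codes = 0.75
```

In this example the validator's additional codes on the second and fourth segments leave the score unchanged, while the differently labeled first segment lowers it even though both coders noticed something there.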