
3.2 Interaction with an Ideal Speech Display

3.2.2 Execution

Participants

I recruited twenty people familiar with sending and receiving email, browsing the Web, and managing files to participate in the study. Seven participants (four male, three female) had a visual impairment, while thirteen participants (ten male, three female) did not. Four of the participants with visual impairments (one male, three female) had been blind since childhood; the remaining three developed their impairments later in life. All but one of the participants with visual impairments relied on a screen reader in their daily computer use. The exception had enough residual vision to use the computer by sitting very close to the display or with the help of a magnifying lens. Sessions with the first three participants (two sighted males and one female with a visual impairment) were treated as pilots. These sessions were used to debug the test procedure and to give me practice in the role of the auditory display. Results from the pilot sessions are excluded from the analysis below.

Setting

All twenty one-hour study sessions were conducted over a two-month period in the spring of 2005. Sixteen of the twenty sessions took place in a dedicated lab space at UNC where auditory and visual distractions were nearly nonexistent. The remaining four sessions were conducted at the Library for the Blind in Raleigh, NC, to accommodate participants who could not travel to UNC. A room at the library was dedicated to our use, which helped keep distractions to an acceptable minimum.

In both environments, a computer was on hand to assist me in the role of the auditory display. Applications representative of those needed to complete the required tasks, copies of the emails assigning tasks, and mock-ups of the necessary XYZ company Web pages were pre-installed on the computer. The participant and the principal investigator were seated so that the participant could not read the computer screen. Participants were allowed to see me, to reinforce the idea that they were interacting with an unconstrained auditory display. To avoid communicating with the participant through backchannels, I consciously avoided looking at the participant, making gestures, or changing my facial expression.

Analysis

After running all of the study sessions, I transcribed each audio recording into a plain text file. Each transcription included all communication between the participant and me during the thirty-minute scenario and the follow-up discussion. The transcriptions also marked any pauses longer than three seconds during the scenario, as well as any laughter or other non-speech vocalizations. Once the transcriptions were complete, I reviewed each one for accuracy while listening to the original audio recording.

I used Weft QDA (http://www.pressure.to/qda/) to code segments of each transcription according to patterns of behavior seen across participants. This first pass produced twenty-one codes. I next analyzed this initial set of codes to group them into more general categories and to eliminate codes unrelated to my current research. Seven themes emerged from this analysis. Finally, I coded each transcription again with labels representing these seven categories. Table 3.3 gives the initial and final codes.

Initial code            Final code
Memory aid              Memory
Memory problem
Reference problem
Autohelp                Prompting
Perfect understanding
Visual thinking         Visual thinking
Summary                 Summaries
Filtering
Speed                   Searching
Ordering
Alert                   Alerts
Batch operation         Batch operations
Mobility                Uncoded
Pause
Like
Dislike
Confusion
Channel saturation
Novice experience
Voice input
Task switch

Table 3.3: Initial and final codes for the ideal display study.
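Conceptually, the second coding pass collapses each initial code into a more general theme or drops it. The Python sketch below illustrates that step as a dictionary lookup. It is not the procedure used in the study, where each transcription was re-coded by hand in Weft QDA. The `recode` helper is hypothetical, and the mapping includes only the rows of Table 3.3 in which an initial code and a final code appear side by side. `None` stands for an initial code that was left uncoded.

```python
from typing import Optional

# Illustrative subset of Table 3.3 (initial code -> final code).
# None marks an initial code that was not carried into a final theme.
FINAL_CODE: dict[str, Optional[str]] = {
    "Memory aid": "Memory",
    "Autohelp": "Prompting",
    "Visual thinking": "Visual thinking",
    "Summary": "Summaries",
    "Speed": "Searching",
    "Alert": "Alerts",
    "Batch operation": "Batch operations",
    "Mobility": None,
}


def recode(initial_codes: set[str]) -> set[str]:
    """Map a segment's initial codes to final themes, dropping uncoded ones."""
    themes: set[str] = set()
    for code in initial_codes:
        final = FINAL_CODE.get(code)  # codes missing from the map are treated as uncoded
        if final is not None:
            themes.add(final)
    return themes


segment = {"Memory aid", "Speed", "Mobility"}
print(sorted(recode(segment)))  # ['Memory', 'Searching']
```

Several initial codes can share a theme, so the result is a set. A segment therefore carries each final code at most once, however many of its initial codes fold into that code.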

Validation

To validate my codings, I used a triangulation technique. I first recruited one former and one current computer science graduate student to aid me. Each was familiar with my research but not with the details of this study. I gave each validator a unique simple random sample of fifty segments drawn from the 520 coded segments, together with a random sample of uncoded segments. I also provided the validators with a list of the seven final codes, their definitions, and a few fabricated conversation segments exemplifying each. I then asked the validators to label each segment in their samples with zero, one, or more of the codes.
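Drawing the validator samples is a sampling-without-replacement step. The following minimal Python sketch assumes that segments are identified by opaque IDs. Several details are assumptions added for illustration: the function name, the number of uncoded segments per validator (`n_uncoded`), the size of the uncoded pool, and the choice to make the samples disjoint and shuffled. The study itself reports only that each validator received fifty coded segments drawn from 520.

```python
import random


def draw_validator_samples(coded_ids, uncoded_ids, n_validators=2,
                           n_coded=50, n_uncoded=10, seed=None):
    """Return one shuffled list of segment IDs per validator.

    Hypothetical helper. Coded and uncoded segments are each drawn without
    replacement across all validators, so no segment appears in two samples.
    n_uncoded is an assumed value; the study does not report it.
    """
    rng = random.Random(seed)
    coded_pool = rng.sample(coded_ids, n_validators * n_coded)
    uncoded_pool = rng.sample(uncoded_ids, n_validators * n_uncoded)
    samples = []
    for v in range(n_validators):
        batch = (coded_pool[v * n_coded:(v + 1) * n_coded]
                 + uncoded_pool[v * n_uncoded:(v + 1) * n_uncoded])
        # Assumed: mix coded and uncoded segments so their origin is not apparent.
        rng.shuffle(batch)
        samples.append(batch)
    return samples


# Placeholder IDs: 520 coded segments and an assumed pool of 200 uncoded ones.
coded = [f"coded-{i}" for i in range(520)]
uncoded = [f"uncoded-{i}" for i in range(200)]
first, second = draw_validator_samples(coded, uncoded, seed=2005)
print(len(first), len(second), set(first).isdisjoint(second))  # 60 60 True
```

Drawing one pool and slicing it is a simple way to guarantee that each validator's sample is unique. Two independent draws would also be simple random samples, but they could overlap.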

I next compared the validator codings with my original codings. For each segment in a sample, I counted the number of matches between the validator's codes and my codes.

I summed the matches over all segments in a sample and divided by the total number of codes I had assigned to those segments to produce a scoring metric:

$$\text{score} = \frac{\text{overlap}}{\text{mine}}$$
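The pooled score can be computed directly from per-segment code sets. In this minimal Python sketch, `match_score` is a hypothetical name. Each segment's codes are assumed to be stored as a set of final-code strings, and the three segments are toy data, not data from the study.

```python
def match_score(mine, validator):
    """Pooled score = overlap / mine for one validator's sample.

    mine[i] and validator[i] are the sets of codes that I and the validator
    assigned to segment i. Segments I left uncoded add nothing to the
    denominator.
    """
    overlap = sum(len(m & v) for m, v in zip(mine, validator))
    total_mine = sum(len(m) for m in mine)
    return overlap / total_mine if total_mine else 0.0


# Toy sample: two coded segments and one segment I left uncoded.
mine = [{"Memory"}, {"Searching", "Alerts"}, set()]
validator = [{"Memory"}, {"Searching"}, {"Prompting"}]
print(round(match_score(mine, validator), 2))  # 0.67
```

Summing before dividing weights every code equally, which matches the description of summing matches across the sample. Averaging a per-segment ratio would instead weight every segment equally. Returning 0.0 when I assigned no codes at all is an arbitrary choice made for the sketch.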

Using this metric, I found that the first validator's codes matched 63% of my codes and the second validator's codes matched 80%, relative to my eight-category coding. These scores give me some confidence that the themes I identified do indeed exist in the transcripts.

It is important to note two shortcomings of this metric. First, it provides no information about cases where the validators assigned more codes than I did. I purposely excluded this information from the metric so that it would only validate my codings, not suggest additional codings per segment. Second, the metric is negatively affected by differences in the interpretation of each theme. For instance, I used the memory code to label segments in which participants used the auditory display as a memory aid. The first validator, however, consistently used the prompting code in these cases, resulting in a 6% reduction in the first score. This reduction, and possibly others, is artificial with respect to indicating theme discordance: in this case, both the validator and I noted a theme present across a number of segments but labeled it differently.
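A small example shows both shortcomings. The sketch below applies the same pooled overlap / mine ratio, redefined here so the block stands alone, to hypothetical data. Adding extra validator codes leaves the score unchanged, while labeling the same theme with a different code lowers it.

```python
def match_score(mine, validator):
    # Pooled overlap / mine across all segments of one sample.
    overlap = sum(len(m & v) for m, v in zip(mine, validator))
    total_mine = sum(len(m) for m in mine)
    return overlap / total_mine if total_mine else 0.0


mine = [{"Memory"}, {"Searching"}, {"Alerts"}, {"Summaries"}]

# Shortcoming 1: codes the validator adds beyond mine are invisible to the score.
with_extras = [{"Memory", "Prompting"}, {"Searching", "Alerts"}, {"Alerts"}, {"Summaries"}]

# Shortcoming 2: the same theme under a different label counts as a miss.
relabeled = [{"Prompting"}, {"Searching"}, {"Alerts"}, {"Summaries"}]

print(match_score(mine, with_extras))  # 1.0
print(match_score(mine, relabeled))    # 0.75
```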

In document Clique : perceptually based, task oriented auditory display for GUI applications (Page 112-115)