Listeners.
Twenty-four paid listeners (15 female) were recruited. Listeners were aged between 19 and 35 years (M=23.50, SD=4.9). All listeners reported normal hearing, normal or corrected eyesight, a formal education to undergraduate level or above, and a good understanding of the English language, and all provided written informed consent. Twenty-one of the listeners self-reported as right-handed. Listeners were assigned to experimental groups in a pseudorandom manner, aside from the gender split, where 5 females were in each group. Each group completed the same task but was differentiated by the number of training days undertaken (2, 4 or 10).
Materials.
Stimuli were designed using The vOICe (Meijer, 1992) and Adobe Audition 3 - see ‘Stimulus Design’ below. The script was run in Matlab and Psychtoolbox (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997) on a Windows PC with a Creative Labs Soundblaster Titanium ASIO soundcard to ensure low latency. All auditory signals were transmitted to the listener through Sennheiser HD555 over-ear headphones. The blindfold used was the Mindfold (Mindfold Inc., Tucson, AZ).
Stimulus Design.
A plain white triangle (apex upwards) on a black background was sonified using The vOICe’s image sonification feature. Prior to sonification, the device scan rate was set at ‘x8’ to reduce the temporal length of the stimulus to 125 milliseconds. The soundscape was then trimmed to remove the portions representing the black areas at each side of the triangle base, resulting in an auditory stimulus of 90ms. Adobe Audition 3 was used to apply a 10ms cosine ramp (fade in and fade out) to the stimulus onset and offset. This was done to avoid distortion or spectral splatter at the start and end of the soundscape, thus providing a clear signal.
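The onset and offset ramping described above can be illustrated with a minimal sketch. The ramps in the study were applied in Adobe Audition 3; the Python below is only an approximation of a 10ms raised-cosine fade, and the sample rate, the function name `apply_cosine_ramp` and the 1kHz placeholder tone are assumptions made for illustration rather than details taken from the study.

```python
import numpy as np

def apply_cosine_ramp(signal, sample_rate, ramp_ms=10.0):
    """Apply a raised-cosine fade-in and fade-out of ramp_ms to a 1-D signal."""
    out = np.asarray(signal, dtype=float).copy()
    n_ramp = int(round(sample_rate * ramp_ms / 1000.0))
    if n_ramp == 0:
        return out
    if 2 * n_ramp > len(out):
        raise ValueError("signal is shorter than the two ramps combined")
    # Gain rising smoothly from 0 towards 1 over n_ramp samples.
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    out[:n_ramp] *= ramp          # fade in at stimulus onset
    out[-n_ramp:] *= ramp[::-1]   # fade out at stimulus offset
    return out

SAMPLE_RATE = 44100  # assumed value; the sample rate is not stated in the text
t = np.arange(int(round(SAMPLE_RATE * 0.090))) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000.0 * t)  # placeholder tone, not the actual soundscape
ramped = apply_cosine_ramp(tone, SAMPLE_RATE)
```

Only the first and last 10ms of the signal are scaled; the central portion is left untouched, so the 90ms duration is preserved.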
Frequency was measured as a range because the experimental aim was to create a complex signal composed of a range of frequencies (each of the 4096 pixels has its own frequency, amplitude and temporal feature). For the standard stimulus the fundamental frequency was centered at 1 kilohertz (kHz), with a temporal duration of 90ms and an amplitude of -85dB. The alternate ‘test’ stimuli were created by manipulating the standard stimulus in either Adobe Audition 3 (frequency) or The vOICe (interval). The frequency range was increased using a 0.60 ratio that raised the frequency range to one centered at 4kHz whilst retaining the 90ms temporal duration. The alternate duration was generated using the same visual stimulus but sonified using The vOICe at a 250ms scan rate. After the trim and ramp were applied, the resultant stimulus was at 1kHz with a temporal duration of 220ms. For the stereo condition the frequency and duration values were the same as the standard (90ms, 1kHz) but the signal was conveyed through both headphones binaurally.
Figure 2.1 shows how The vOICe sonifies visual images in real time, converting visual features (brightness and spatial position) to auditory features (amplitude, frequency, time and stereo panning). Each of the 4096 pixels in the recorded greyscale image is subjected to 3 conversion principles. Visual brightness is coded to auditory amplitude, with brighter pixels eliciting louder tones. Spatial position uses two principles to code for vertical and horizontal localisation. On the y-axis, pixel position corresponds to frequency, with higher frequencies representing pixels higher up in the recorded image. A one second left-to-right time scan across the image provides a temporal cue to position on the x-axis, with pixels to the left of the image being heard earlier in the time scan. If used in stereo mode, a left-to-right pan across the stereo field provides, in conjunction with the time scan, a more accurate and complex coding feature for horizontal localisation, with left orientated pixels being heard in the left headphone. To give the final ‘soundscape’, all pixel sounds in a column are played concurrently (64 pure tones imposed over each other), with these 174 columns, or raster lines, then played sequentially over the duration of the time scan. The resulting ‘soundscape’ is a complex signal comprising a large number of frequencies and amplitudes, played back to the user either monaurally or binaurally via headphones.
Figure 2.1: Conversion of image to sound using The vOICe algorithm. White pixels in the image are
represented by a sound with black pixels silent. The elevation of each pixel is coded to frequency with pixels higher in the image having a higher frequency sine wave. All pixels in a vertical raster line are played simultaneously with a 1 second left-to-right horizontal scan across the image resulting in the soundscape for the image. (Image created by author)
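The three conversion principles (brightness to amplitude, elevation to frequency, horizontal position to time) can be sketched in a few lines. This is an illustrative approximation and not The vOICe's actual implementation: the 64 x 64 grid is inferred from the 4096 pixels and 64 tones per column mentioned in the text, while the 500Hz to 5kHz exponential frequency spacing, the sample rate, and the names `sonify` and `triangle_image` are assumptions. Only the monaural case is shown.

```python
import numpy as np

def sonify(image, scan_s=1.0, sample_rate=44100, f_lo=500.0, f_hi=5000.0):
    """Convert a greyscale image (rows x columns, values 0..1) to a mono soundscape."""
    image = np.asarray(image, dtype=float)
    n_rows, n_cols = image.shape
    # Elevation -> frequency: row 0 is the top of the image, so it gets the highest tone.
    # The exponential spacing between f_lo and f_hi is an assumed mapping.
    freqs = f_lo * (f_hi / f_lo) ** (np.arange(n_rows)[::-1] / (n_rows - 1))
    n_total = int(round(scan_s * sample_rate))
    t = np.arange(n_total) / sample_rate
    # Horizontal position -> time: the column that is sounding at each sample.
    col_idx = np.minimum((np.arange(n_total) * n_cols) // n_total, n_cols - 1)
    out = np.zeros(n_total)
    for r in range(n_rows):
        amp = image[r, col_idx]  # brightness -> amplitude, following the scan
        if amp.any():
            out += amp * np.sin(2 * np.pi * freqs[r] * t)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

def triangle_image(size=64):
    """White triangle (apex upwards) on a black background."""
    img = np.zeros((size, size))
    cols = np.arange(size)
    for r in range(size):
        half_width = (r + 1) / 2.0  # the triangle widens towards the bottom row
        img[r, np.abs(cols - (size - 1) / 2.0) <= half_width] = 1.0
    return img

soundscape = sonify(triangle_image())  # one-second left-to-right scan
```

A shorter `scan_s` value shortens the soundscape in the same way that a faster scan rate does in the text, although the exact correspondence with The vOICe's settings is not claimed here.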
Procedure
Listeners were assigned a work station, the procedure explained to them both verbally and via an information sheet, and written consent obtained. The blindfold and headphones were then put on and each listener guided to the ‘1’ and ‘2’ keys on the number pad on the PC
keyboard. Listeners were then instructed to press the spacebar twice to start the first block of 60 trials. This double press of the spacebar was used to start all blocks in the condition (9 on training days and 5 for test days).
Figure 2.2 displays a sample trial for the standard condition. For each trial the listeners were presented with a pair of tones, separated by 970ms, in the left headphone. One of these tones was the ‘reference’ tone (t), which was temporally consistent throughout all trials in the particular condition. The ‘comparison’ tone (t + Δt) varied in duration dependent on previous answers and the 3 up/1 down psychophysical staircase procedure.
Three correct consecutive responses reduced the Δt by 1 unit whilst one incorrect response increased the Δt by one unit. The trial where the direction changed – from decrease to
increase or vice versa – was classed as a reversal. For the first three reversals the unit change was 5ms with a 1ms change for subsequent reversals in each block.
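The adaptive rule described above can be expressed as a short sketch. It follows the rule as stated in the text (three consecutive correct responses decrease Δt, one incorrect response increases it), but the starting Δt, the 1ms floor, the class name `Staircase` and the choice to keep the 5ms step up to and including the third reversal are assumptions, since the text does not specify them.

```python
class Staircase:
    """Adaptive staircase on the duration difference (delta, in ms)."""

    def __init__(self, start_delta_ms, big_step_ms=5.0, small_step_ms=1.0,
                 n_big_reversals=3, min_delta_ms=1.0):
        self.delta = start_delta_ms          # assumed starting value
        self.big_step = big_step_ms
        self.small_step = small_step_ms
        self.n_big_reversals = n_big_reversals
        self.min_delta = min_delta_ms        # assumed floor, not given in the text
        self.correct_run = 0
        self.last_direction = None           # -1 = decreasing, +1 = increasing
        self.reversal_deltas = []            # delta on each reversal trial

    def update(self, correct):
        """Register one response and adjust delta if the rule is triggered."""
        if correct:
            self.correct_run += 1
            if self.correct_run < 3:
                return                       # no change until three in a row
            self.correct_run = 0
            direction = -1
        else:
            self.correct_run = 0
            direction = +1
        # Step size depends on how many reversals occurred before this trial.
        if len(self.reversal_deltas) < self.n_big_reversals:
            step = self.big_step
        else:
            step = self.small_step
        if self.last_direction is not None and direction != self.last_direction:
            self.reversal_deltas.append(self.delta)
        self.last_direction = direction
        self.delta = max(self.min_delta, self.delta + direction * step)

# Worked example with a fixed response sequence (True = correct).
stair = Staircase(start_delta_ms=50.0)
responses = [True, True, True, False] * 3
for answer in responses:
    stair.update(answer)
```

In this example the first change of direction occurs on the fourth trial, so that trial is recorded as the first reversal.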
While Figure 2.2 illustrates a trial in the standard condition, the other conditions could be represented in the same way. For example, in the alternate duration condition the reference cut-off point is at the same point on the downslope of the triangle because the signal duration was set using a slower scan speed. The triangle retains its proportionality to the background. The alternate frequency condition kept the same scan speed as the standard and, with the temporal cuts being made to the auditory waveform post-sonification, only the spectrogram would differ (the triangle image does not show specific frequency, just duration).
Listeners were required to indicate using the number keys whether the reference tone was presented first or second in the pairing. After the keystroke was made, feedback was provided by a ‘pure tone’ in the right headphone for an incorrect answer followed by the onset of the next trial. Correct responses resulted in the next trial starting with no prior auditory feedback.
After a 60 trial block was completed, the next block was initiated by the listener by a double depression of the space bar. This allowed the listener to take a short break at their own discretion. ‘Official’ breaks were also offered between the 5th and 6th blocks on a training day. During this intermission, the listeners were allowed to remove the headphones but not the blindfold. On the test days short breaks were taken between the conditions whilst the next condition’s script was loaded into Matlab, and an official break was offered after the first two conditions (10 blocks). The average duration per block was four minutes.
The pre-test consisted of 5 blocks of each of the 4 conditions: standard, alternate duration, alternate frequency and stereo (1200 trials in total). The order of presentation of the conditions was varied amongst groups but was kept consistent within each group across the pre- and post-tests. The standard condition was presented first for all groups in the pre-test phase.
Calculation of thresholds.
Thresholds were obtained by first removing the first 3 or 4 reversals in each block to ensure an even number of reversals. If this resulted in there being fewer than 6 reversals in the block then the block was disregarded. For the accepted blocks the Δt for each of the reversals was noted and averaged across the block to give a block threshold. Provided that there were at least 3 (pre- and post-test) or 6 (training) thresholds, mean scores were calculated for individual listeners and experimental groups for each session. Weber fractions were computed by dividing the total Δt by t and then entered for analysis.
Figure 2.2: Representation of a sample trial. Listeners are presented with a reference soundscape followed by a 970ms inter-stimulus gap. They are then presented with a comparison tone and required to indicate whether the reference tone was presented 1st or 2nd. In the standard condition the reference tone is always of the same duration with reference and comparison tones presented in a random order. Feedback is given after response. The duration of the reference tone is stable with the comparison tone adapted on a 3 up/1 down staircase procedure. The left hand column of the figure shows the image that was sonified, with the right hand column showing the spectrogram for the resultant soundscape.
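The threshold and Weber fraction calculation described in this section can be sketched as follows. The function names and the example reversal values are hypothetical, and reading ‘total Δt divided by t’ as the mean threshold divided by the reference duration is an interpretation rather than something the text confirms.

```python
from statistics import mean

def block_threshold(reversal_deltas, min_reversals=6):
    """Mean delta over the retained reversals of one block, or None if rejected."""
    n = len(reversal_deltas)
    # Drop the first 3 reversals, or 4 if dropping 3 would leave an odd number.
    n_drop = 3 if (n - 3) % 2 == 0 else 4
    kept = reversal_deltas[n_drop:]
    if len(kept) < min_reversals:
        return None  # block disregarded: too few reversals remain
    return mean(kept)

def session_threshold(blocks, min_blocks):
    """Mean of the accepted block thresholds, or None if too few blocks qualify.

    min_blocks is 3 for pre- and post-test sessions and 6 for training sessions.
    """
    thresholds = [th for th in map(block_threshold, blocks) if th is not None]
    if len(thresholds) < min_blocks:
        return None
    return mean(thresholds)

def weber_fraction(threshold_ms, reference_ms):
    """Threshold expressed as a proportion of the reference duration t."""
    return threshold_ms / reference_ms

# Hypothetical reversal values (ms) for three blocks of one test session.
blocks = [
    [40.0, 45.0, 40.0, 20.0, 24.0, 20.0, 24.0, 20.0, 24.0, 20.0, 24.0],
    [40.0, 45.0, 40.0, 45.0, 16.0, 20.0, 16.0, 20.0, 16.0, 20.0],
    [40.0, 45.0, 40.0, 12.0, 16.0, 12.0, 16.0, 12.0, 16.0],
]
threshold = session_threshold(blocks, min_blocks=3)
fraction = weber_fraction(threshold, reference_ms=90.0)
```

With an odd number of reversals the first three are dropped and with an even number the first four, so the retained count is always even.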