
5.2.1 Stimulus attenuated according to contrast sensitivity inhomogeneity

a) Stimulus image b) Attenuation surface c) Attenuated stimulus

Figure 5.1: Multiplication of the “Battenberg” stimulus in panel a) (see Section 3.4.6) by the witch’s hat attenuation surface in panel b) (see Section 4.9.4) gives the attenuated stimulus in panel c). The contrast of the image decreases from the centre outward at the same rate as the decline in contrast sensitivity in human vision.

In all of the models featured here, the contrast of the stimulus image is first adjusted to reflect the variation in sensitivity across the visual field (see Figure 5.1). A witch’s hat attenuation surface derived from empirical measurements (see Section 4.9.4) is generated for the appropriate observer, fixation location, and spatial frequency. This is a 2D matrix the same size as the stimulus image, in which the value at each location is the gain of the input stage of the visual system at that location relative to the gain at fixation (giving a gain of unity at fixation, and less than unity elsewhere). This attenuation surface (A) is multiplied by the stimulus image (S) to provide the attenuated image (Satt), which is used as the input for the next stage of the model:

$$S_{\mathrm{att}}[x, y] = S[x, y] \cdot A[x, y]. \qquad (5.1)$$
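Equation 5.1 is a pointwise product, which can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used for the models here. The function names are hypothetical, and `toy_attenuation_surface` is only a stand-in: it assumes sensitivity falls linearly in dB with distance from fixation, whereas the real surface comes from the empirical measurements described in Section 4.9.4.

```python
import numpy as np


def attenuate(stimulus, attenuation):
    """Equation 5.1: multiply stimulus S by attenuation surface A, pixel by pixel."""
    stimulus = np.asarray(stimulus, dtype=float)
    attenuation = np.asarray(attenuation, dtype=float)
    if stimulus.shape != attenuation.shape:
        raise ValueError("S and A must have the same shape")
    return stimulus * attenuation


def toy_attenuation_surface(size, fixation, slope_db_per_pixel=0.05):
    """Hypothetical stand-in for the witch's hat surface (assumed shape and slope).

    Gain is 1 at fixation and declines linearly in dB with distance from it.
    `fixation` is (x, y) in pixels.
    """
    y, x = np.mgrid[0:size, 0:size]
    distance = np.hypot(x - fixation[0], y - fixation[1])
    return 10.0 ** (-slope_db_per_pixel * distance / 20.0)


# Usage: a uniform stimulus takes on the shape of the attenuation surface.
A = toy_attenuation_surface(65, fixation=(32, 32))
S = np.ones((65, 65))
S_att = attenuate(S, A)
```

Because the gain is defined relative to fixation, the surface has its maximum of one at the fixation location and the attenuated image never exceeds the original in magnitude.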

5.2.2 Spatial filtering by log-Gabor patches

The next stage is the spatial filtering of the image in analogy to the process performed by the visual system up to the simple cell stage in V1 (see Section 2.3.1). Where stimuli were presented with a single target orientation and spatial frequency (as is the case for all of the contrast detection studies presented here) this is modelled as occurring within a single orientation- and frequency-tuned channel (see Section 2.3.2). This is implemented by convolving the stimulus images with a log-Gabor patch.

a) Sin log-Gabor filtered b) Cos log-Gabor filtered c) Complex

Figure 5.2: The output of filtering the attenuated stimulus (Figure 5.1c) with a sine-phase log-Gabor (a) and a cosine-phase log-Gabor (b). The simulated complex cell response, calculated by taking the Pythagorean sum of the sine and cosine responses, is shown in c). Inset in a) and b) are the sine-phase (Lsin) and cosine-phase (Lcos) log-Gabors used to perform the filtering.

The bandwidths of the log-Gabors used here (±25° orientation, 1.6 octaves spatial frequency) are consistent with the bandwidths found in simple cells (Meese & Summers, 2007; Meese, 2010). The output of this stage is an image where the intensity of each pixel reflects the activity of a model simple cell (also referred to as a “detector”) at that location (see Figure 5.2). The spatial frequency, orientation, and phase tuning are defined by the properties of the log-Gabor filter element.

The responses of the sine-phase log-Gabor filter elements (Ssin) are calculated by convolving the attenuated stimulus image (Satt) with a sine-phase log-Gabor (Lsin)

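$$S_{\mathrm{sin}} = S_{\mathrm{att}} \ast L_{\mathrm{sin}}, \qquad (5.2)$$

the cosine-phase responses (Scos) are calculated using a cosine-phase log-Gabor (Lcos)

$$S_{\mathrm{cos}} = S_{\mathrm{att}} \ast L_{\mathrm{cos}}, \qquad (5.3)$$

and the complex response (Scomplex) is calculated from the Pythagorean sum of the sine and cosine responses

$$S_{\mathrm{complex}}[x, y] = \sqrt{S_{\mathrm{sin}}[x, y]^{2} + S_{\mathrm{cos}}[x, y]^{2}}. \qquad (5.4)$$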

For the summation modelling presented here there is little to no difference in predictions made by models with sine phase, cosine phase, or complex responses from the filtering stage.

To perform the convolution, the Fourier transform of the attenuated image is multiplied by the Fourier transform of a log-Gabor patch. The output of this process is then converted back to the spatial domain to give the filtered image.

Summation within the simulated receptive fields of this stage bypasses any subsequent nonlinearities. This within-filter summation causes models to behave as if they were linear for stimuli which are smaller than the filter elements, and increases the predicted summation for stimuli of a similar size to the filter element.
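The frequency-domain filtering described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used for the models: the function names are hypothetical, the 1.6 octave bandwidth is assumed to be a full width at half height, and the ±25° bandwidth is assumed to be a half width at half height. The filter is defined directly in the Fourier domain with a single orientation lobe, so that the real and imaginary parts of the filtered image are the cosine-phase and sine-phase responses and its magnitude is the complex response of Equation 5.4. Multiplying discrete Fourier transforms gives a circular convolution, so the image edges wrap around.

```python
import numpy as np


def log_gabor_transfer(shape, f0, theta0=0.0, bw_octaves=1.6, bw_orient_deg=25.0):
    """One-sided log-Gabor transfer function (f0 in cycles per pixel)."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0  # placeholder to avoid log(0); DC is zeroed below
    angle = np.arctan2(fy, fx)

    # Radial part: Gaussian on a log-frequency axis (assumed full width at half height).
    sigma_r = bw_octaves * np.log(2) / (2 * np.sqrt(2 * np.log(2)))
    radial = np.exp(-np.log(radius / f0) ** 2 / (2 * sigma_r ** 2))
    radial[0, 0] = 0.0  # a log-Gabor has no DC response

    # Angular part: Gaussian on wrapped angular distance (assumed half width at half height).
    sigma_t = np.deg2rad(bw_orient_deg) / np.sqrt(2 * np.log(2))
    dtheta = np.arctan2(np.sin(angle - theta0), np.cos(angle - theta0))
    angular = np.exp(-dtheta ** 2 / (2 * sigma_t ** 2))
    return radial * angular


def filter_with_log_gabor(s_att, f0, theta0=0.0):
    """Return (S_sin, S_cos, S_complex) for the attenuated image s_att."""
    transfer = log_gabor_transfer(s_att.shape, f0, theta0)
    response = np.fft.ifft2(np.fft.fft2(s_att) * transfer)
    s_cos = response.real  # even-symmetric (cosine-phase) filter output
    s_sin = response.imag  # odd-symmetric (sine-phase) filter output
    s_complex = np.abs(response)  # Pythagorean sum of the two (Equation 5.4)
    return s_sin, s_cos, s_complex


# Usage: a grating at the preferred frequency and orientation.
n = 64
grating = np.tile(np.cos(2 * np.pi * (8 / n) * np.arange(n)), (n, 1))
s_sin, s_cos, s_complex = filter_with_log_gabor(grating, f0=8 / n)
```

For a full-field grating matched to the filter, the sine and cosine outputs oscillate in quadrature while the complex response is uniform across the image, which is the phase invariance the Pythagorean sum is meant to provide.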

5.2.3 Rectification and nonlinear transduction of filter outputs

a) Rectified filter output b) Squared filter output

Figure 5.3: The rectified output from filtering with a sine-phase log-Gabor patch (Figure 5.2) is shown in panel a). The image in panel b) shows the effect of squaring the value at each pixel (representing the nonlinear transduction of filter outputs).

The filter outputs are rectified (Srect) by taking the absolute value of each pixel (see Figure 5.3a)

$$S_{\mathrm{rect}}[x, y] = \left| S_{\mathrm{sin}}[x, y] \right|. \qquad (5.5)$$

This represents the unsigned magnitude of the filter outputs.

The pixel values representing the filter outputs may then undergo nonlinear transduction (Strans) by raising them to a power m (see Figure 5.3b)

$$S_{\mathrm{trans}}[x, y] = \left( S_{\mathrm{rect}}[x, y] \right)^{m}. \qquad (5.6)$$
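Equations 5.5 and 5.6 amount to two elementwise operations, sketched below with hypothetical function names. The default exponent of 2 is chosen only to match the squaring shown in Figure 5.3b; m is a model parameter and other values are possible.

```python
import numpy as np


def rectify(s_filtered):
    """Equation 5.5: full-wave rectification (absolute value of each pixel)."""
    return np.abs(np.asarray(s_filtered, dtype=float))


def transduce(s_rect, m=2.0):
    """Equation 5.6: raise each rectified pixel value to the power m."""
    return np.power(s_rect, m)


# Usage: positive and negative filter outputs of equal size give equal responses.
s_sin = np.array([[-2.0, 0.5], [0.0, 2.0]])
s_trans = transduce(rectify(s_sin), m=2.0)
```

Setting m = 1 leaves the rectified output unchanged, which corresponds to a linear transducer.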

5.2.4 Pixelwise additive Gaussian noise

The output of each filter element is perturbed by internal noise. This is modelled as independent additive Gaussian noise with a mean of zero and a constant variance. The assumption of Gaussian noise is made in accordance with the Central Limit Theorem (Peterson et al., 1954; Tyler & Chen, 2000). In stochastic models (see Figure 5.4a), this is simulated and added to the transduced stimulus image to give the noisy filter outputs (Snoisy)

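$$S_{\mathrm{noisy}}[x, y] = S_{\mathrm{trans}}[x, y] + \mathcal{N}(\mu, \sigma^{2}), \qquad (5.7)$$

where N(µ, σ²) is a sample from Gaussian noise with mean µ and standard deviation σ. For analytic models (see Figure 5.4b), the noise is represented in the calculations by a separate matrix (G) containing the standard deviations of the noise for the output of each filter element

$$G = \begin{bmatrix}
\sigma_{1,1} & \sigma_{1,2} & \sigma_{1,3} & \cdots & \sigma_{1,n} \\
\sigma_{2,1} & \sigma_{2,2} & \cdots & \cdots & \vdots \\
\sigma_{3,1} & \cdots & \sigma_{3,3} & \cdots & \vdots \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{n,1} & \cdots & \cdots & \cdots & \sigma_{n,n}
\end{bmatrix}. \qquad (5.8)$$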

a) Stochastic noise b) Analytic representation of noise

Figure 5.4: Noise is represented differently in the two types of model. In stochastic models (a) independent Gaussian noise is added to the pixel value at each location in the filtered stimulus image. In analytic models (b) the noise is represented as a separate matrix containing the standard deviations of the noise for each pixel in the filtered stimulus.

The only model architectures considered here are those where the dominant source of noise comes after the nonlinear transduction stage. According to Birdsall’s theorem, dominant noise placed before transduction linearises the transducer. This makes the behaviour of such a system equivalent to that of a system with a linear transducer (though see Appendix A).
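The two representations of noise can be contrasted in a short sketch. The function names and the default of σ = 1 are assumptions made for illustration. In both cases the noise is applied after transduction, as in the architectures considered here.

```python
import numpy as np


def add_stochastic_noise(s_trans, sigma=1.0, rng=None):
    """Equation 5.7: add an independent zero-mean Gaussian sample to every pixel."""
    rng = np.random.default_rng() if rng is None else rng
    return s_trans + rng.normal(loc=0.0, scale=sigma, size=s_trans.shape)


def analytic_noise_matrix(shape, sigma=1.0):
    """Matrix G: the noise standard deviation at each pixel (constant variance)."""
    return np.full(shape, float(sigma))


# Usage: the stochastic model draws samples, the analytic model only stores sigma.
rng = np.random.default_rng(seed=0)
s_trans = np.zeros((200, 200))
s_noisy = add_stochastic_noise(s_trans, sigma=1.0, rng=rng)
G = analytic_noise_matrix(s_trans.shape, sigma=1.0)
```

The stochastic version must be re-run for every simulated interval, whereas the analytic version carries the standard deviations forward so that the noise at the decision stage can be calculated directly.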

5.2.5 Template matching

a) No template b) Ideal template c) Stimulus extent

Figure 5.5: Different template strategies are shown here, demonstrated using the stochastic model. Panel a) shows the noisy stimulus image with no template applied. Panel b) shows the image multiplied by a template which is matched to the stimulus exactly (an “ideal” template). Panel c) shows the image multiplied by a template that is matched to the stimulus extent (without the “Battenberg” modulation). In both b) and c) the weighting of the templates declines with eccentricity in proportion to the expected signal-to-noise ratio resulting from the attenuation surface.

An observer behaving with knowledge of the expected stimulus could choose to improve the signal-to-noise ratio at the decision stage by combining a weighted input from each detector, according to a template. The “ideal” template would be matched exactly to the stimulus (see Figure 5.5b). The output of this stage (Stemp) is obtained by multiplying the noisy stimulus by the attenuated, filtered, and transduced stimulus (Strans) in the stochastic model

$$S_{\mathrm{temp}}[x, y] = S_{\mathrm{noisy}}[x, y] \cdot S_{\mathrm{trans}}[x, y], \qquad (5.9)$$

whilst in the analytic model the template is applied both to the signal matrix and to the standard deviations of the pixelwise noise

$$S_{\mathrm{temp}}[x, y] = S_{\mathrm{trans}}[x, y] \cdot S_{\mathrm{trans}}[x, y], \qquad (5.10)$$

$$G_{\mathrm{temp}}[x, y] = G[x, y] \cdot S_{\mathrm{trans}}[x, y]. \qquad (5.11)$$

Models with a template matched to the stimulus envelope make similar predictions to those made by a model with the ideal matched template. These matched and envelope templates will both include the attenuation introduced by the visual field inhomogeneity in contrast sensitivity. Input from more eccentric locations is weighted to have less of an effect on the decision than the input from the fovea, in proportion to the expected signal-to-noise ratio. Templates which do not feature this variation in weighting over their surface due to the attenuation are referred to as “flat” in this thesis. Where the template is matched to the extent of the stimulus envelope (e.g. is a continuous square or circle, even when the stimulus contains holes), the noise remains in the areas within the stimulus that do not feature signal (Figure 5.5c).
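Template application in the two model types can be sketched as below. The function names are hypothetical. Passing Strans as the template gives the “ideal” case of Equations 5.9 to 5.11; passing a different weighting matrix would give an envelope or flat template.

```python
import numpy as np


def apply_template_stochastic(s_noisy, template):
    """Equation 5.9: weight each noisy pixel by the template."""
    return s_noisy * template


def apply_template_analytic(s_trans, g, template):
    """Equations 5.10 and 5.11: weight the signal and the noise SDs by the template."""
    return s_trans * template, g * template


# Usage: an ideal template is the expected transduced stimulus itself.
s_trans = np.array([[0.0, 1.0], [2.0, 0.0]])
g = np.ones_like(s_trans)
s_temp, g_temp = apply_template_analytic(s_trans, g, template=s_trans)
```

In this toy example the two pixels that carry no signal receive a weight of zero, so their noise is removed along with them. An envelope template would instead keep a nonzero weight at such locations, which is why the noise remains in the holes of the stimulus in Figure 5.5c.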

5.2.6 Spatial summation and calculation of the detection threshold

The signals from each location in the stimulus image are combined, either through a linear sum of the pixel values or a max operation over the image. For stochastic models these values are then provided to the decision mechanism. In a simulation of a 2IFC experiment the output from one interval will then be compared against the output from another interval in order to choose the one most likely to contain the stimulus. Repeating this many times with signals of different strengths allows an experiment to be simulated, the results of which give the model prediction (threshold contrast is found by fitting psychometric functions to the simulated data). For analytic models the predictions are derived by calculating the signal-to-noise ratio at the decision stage (see Section 5.3).
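The stochastic decision process can be sketched as a simple Monte Carlo loop. This is a simplified illustration under several assumptions: the function names are hypothetical, no template is applied, the null interval is taken to contain noise only, and the psychometric function fitting used to obtain thresholds is omitted.

```python
import numpy as np


def pool(image, rule="sum"):
    """Combine pixel values by a linear sum or a max operation."""
    if rule == "sum":
        return image.sum()
    if rule == "max":
        return image.max()
    raise ValueError("rule must be 'sum' or 'max'")


def simulate_2ifc(s_trans, sigma=1.0, rule="sum", n_trials=2000, rng=None):
    """Proportion of trials on which the target interval gives the larger output."""
    rng = np.random.default_rng() if rng is None else rng
    n_correct = 0
    for _ in range(n_trials):
        # Target interval: transduced signal plus fresh pixelwise noise.
        target = pool(s_trans + rng.normal(0.0, sigma, s_trans.shape), rule)
        # Null interval: pixelwise noise only (assumed to contain no signal).
        null = pool(rng.normal(0.0, sigma, s_trans.shape), rule)
        n_correct += int(target > null)
    return n_correct / n_trials


# Usage: proportion correct for a weak uniform signal over an 8 x 8 patch.
rng = np.random.default_rng(seed=1)
p_correct = simulate_2ifc(np.full((8, 8), 0.1), rule="sum", rng=rng)
```

Running this over a range of signal strengths gives the simulated psychometric data to which a function would then be fitted. With no signal the two intervals are statistically identical, so performance should sit at the 50% guessing rate.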