Model stages - Pattern integration in the normal and abnormal human visual system

5.2.1 Stimulus attenuated according to contrast sensitivity inhomogeneity

a) Stimulus image b) Attenuation surface c) Attenuated stimulus

Figure 5.1: Multiplication of the “Battenberg” stimulus in panel a) (see Section 3.4.6) by the witch’s hat attenuation surface in panel b) (see Section 4.9.4) gives the attenuated stimulus in panel c). The contrast of the image decreases from the centre outward at the same rate as the decline in contrast sensitivity in human vision.

In all of the models featured here, the contrast of the stimulus image is first adjusted to reflect the variation in sensitivity across the visual field (see Figure 5.1). A witch’s hat attenuation surface derived from empirical measurements (see Section 4.9.4) is generated for the appropriate observer, fixation location, and spatial frequency. This is a 2D matrix the same size as the stimulus image, with the value at each location being the gain of the input stage of the visual system at that location relative to that at fixation (giving a gain of unity at fixation, and below that elsewhere). This attenuation surface (A) is multiplied by the stimulus image (S) to provide the attenuated image (S_att), which is used as the input for the next stage of the model

S_att[x, y] = S[x, y]· A[x, y]. (5.1)
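Equation 5.1 is a pointwise product, which can be sketched in a few lines of NumPy. This is a minimal illustration only: the function name `attenuate` and the linear fall-off used for `A` are assumptions made for the example, and the fall-off is a toy stand-in for the empirically derived witch’s hat surface of Section 4.9.4.

```python
import numpy as np

def attenuate(stimulus, attenuation):
    """Equation 5.1: pointwise product of stimulus and attenuation surface."""
    stimulus = np.asarray(stimulus, dtype=float)
    attenuation = np.asarray(attenuation, dtype=float)
    if stimulus.shape != attenuation.shape:
        raise ValueError("attenuation surface must match the stimulus size")
    return stimulus * attenuation

# Pixel coordinates relative to fixation (assumed to be the image centre).
n = 65
y, x = np.mgrid[:n, :n] - n // 2

# Toy attenuation surface (assumption): gain is 1 at fixation and falls
# linearly with eccentricity. The real surface comes from measured data.
A = np.clip(1.0 - 0.01 * np.hypot(x, y), 0.0, 1.0)

# Toy stimulus: a vertical sine grating with a period of 8 pixels.
S = np.sin(2.0 * np.pi * x / 8.0)

S_att = attenuate(S, A)
```

Because the gain never exceeds unity, the attenuated contrast at every pixel is no larger than the original contrast.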

5.2.2 Spatial filtering by log-Gabor patches

The next stage is the spatial filtering of the image in analogy to the process performed by the visual system up to the simple cell stage in V1 (see Section 2.3.1). Where stimuli were presented with a single target orientation and spatial frequency (as is the case for all of the contrast detection studies presented here) this is modelled as occurring within a single orientation- and frequency-tuned channel (see Section 2.3.2). This is implemented by convolving the stimulus images with a log-Gabor patch.

The bandwidths of the log-Gabors used here (±25° orientation, 1.6 octaves spatial frequency) are consistent with the bandwidths found in simple cells (Meese & Summers, 2007; Meese, 2010). The output of this stage is an image where the intensity of each pixel reflects the activity of a model simple cell (also referred to as a “detector”) at that location (see Figure 5.2). The spatial frequency, orientation, and phase tuning are defined by the properties of the log-Gabor filter element.

a) Sin log-Gabor filtered b) Cos log-Gabor filtered c) Complex

Figure 5.2: The output of filtering the attenuated stimulus (Figure 5.1c) with a sine-phase log-Gabor (a) and a cosine-phase log-Gabor (b). The simulated complex cell response calculated by taking the Pythagorean sum of the sine and cosine responses is shown in c). Inset in a) and b) are the sine-phase (L_sin) and cosine-phase (L_cos) log-Gabors used to perform the filtering.

The responses of the sine-phase log-Gabor filter elements (S_sin) are calculated by convolving the attenuated stimulus image (S_att) with a sine-phase log-Gabor (L_sin)

S_sin = S_att ∗ L_sin, (5.2)

the cosine-phase responses (S_cos) are calculated using a cosine-phase log-Gabor (L_cos)

S_cos = S_att ∗ L_cos, (5.3)

and the complex response (S_complex) is calculated from the Pythagorean sum of the sine and cosine responses

S_complex[x, y] = √(S_sin[x, y]² + S_cos[x, y]²). (5.4)

For the summation modelling presented here there is little to no difference in predictions made by models with sine phase, cosine phase, or complex responses from the filtering stage. To perform the convolution, the Fourier transform of the attenuated image is multiplied by the Fourier transform of a log-Gabor patch. The output of this process is then converted back to the spatial domain to give the filtered image. Summation within the simulated receptive fields of this stage bypasses any subsequent nonlinearities. This within-filter summation causes models to behave as if they were linear for stimuli which are smaller than the filter elements, and increases the predicted summation for stimuli of a similar size to the filter element.
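The Fourier-domain filtering described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the models: the log-Gabor is built directly in the frequency domain and is one-sided in orientation, so the real and imaginary parts of the inverse transform give the cosine-phase and sine-phase outputs. The function names, the reading of 1.6 octaves as a full width at half height, and the reading of ±25° as a half width at half height are all assumptions. Multiplication of FFTs implements circular convolution, so the image edges wrap around.

```python
import numpy as np

def log_gabor_transfer(shape, f0, theta0, bw_octaves=1.6, half_width_deg=25.0):
    """Frequency-domain log-Gabor, one-sided in orientation (assumed form)."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]  # vertical frequency, cycles/pixel
    fx = np.fft.fftfreq(cols)[None, :]  # horizontal frequency, cycles/pixel
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                  # avoid log(0); DC is zeroed below
    angle = np.arctan2(fy, fx)

    # Radial part: Gaussian on a log2 frequency axis (full width at half height).
    sigma_oct = bw_octaves / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radial = np.exp(-(np.log2(radius / f0) ** 2) / (2.0 * sigma_oct ** 2))
    radial[0, 0] = 0.0                  # no response to the mean luminance

    # Angular part: Gaussian in orientation (half width at half height).
    sigma_theta = np.deg2rad(half_width_deg) / np.sqrt(2.0 * np.log(2.0))
    dtheta = np.angle(np.exp(1j * (angle - theta0)))  # wrapped to [-pi, pi]
    angular = np.exp(-(dtheta ** 2) / (2.0 * sigma_theta ** 2))
    return radial * angular

def filter_quadrature(image, transfer):
    """Return sine-phase, cosine-phase, and complex (energy) responses."""
    response = np.fft.ifft2(np.fft.fft2(image) * transfer)
    s_cos = response.real               # even-symmetric filter output
    s_sin = response.imag               # odd-symmetric filter output
    s_complex = np.sqrt(s_sin ** 2 + s_cos ** 2)  # Pythagorean sum, Eq. 5.4
    return s_sin, s_cos, s_complex

# Toy input: a full-field vertical grating at the filter's preferred frequency.
n = 64
f0 = 8 / n
x = np.arange(n)[None, :] * np.ones((n, 1))
grating = np.cos(2.0 * np.pi * f0 * x)

H = log_gabor_transfer(grating.shape, f0=f0, theta0=0.0)
s_sin, s_cos, s_complex = filter_quadrature(grating, H)
```

For this matched full-field grating the complex response is flat across the image, whereas the sine-phase and cosine-phase responses oscillate with the carrier.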

5.2.3 Rectification and nonlinear transduction of filter outputs

a) Rectified filter output b) Squared filter output

Figure 5.3: The rectified output from filtering with a sine-phase log-Gabor patch (Figure 5.2) is shown in panel a). The image in panel b) shows the effect of squaring the value at each pixel (representing the nonlinear transduction of filter outputs).

The filter outputs are rectified (S_rect) by taking the absolute value of each pixel (see Figure 5.3a)

S_rect[x, y] = |S_sin[x, y]|. (5.5)

This represents the unsigned magnitude of the filter outputs.

The pixel values representing the filter outputs may then undergo nonlinear transduction (S_trans) by raising them to a power m (see Figure 5.3b)

S_trans[x, y] = (S_rect[x, y])^m. (5.6)
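Equations 5.5 and 5.6 translate directly into elementwise array operations. The sketch below uses hypothetical function names and a small hand-written array in place of a real filter output, with m = 2 to match the squaring shown in Figure 5.3b.

```python
import numpy as np

def rectify(s_filtered):
    """Equation 5.5: full-wave rectification (absolute value per pixel)."""
    return np.abs(s_filtered)

def transduce(s_rect, m=2.0):
    """Equation 5.6: raise each rectified pixel value to the power m."""
    return np.power(s_rect, m)

# Toy filter output standing in for S_sin (values chosen for illustration).
s_sin = np.array([[-0.5, 0.0],
                  [0.25, -1.0]])

s_rect = rectify(s_sin)
s_trans = transduce(s_rect, m=2.0)
```

Setting m = 1 leaves the rectified output unchanged, which corresponds to a linear transducer.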

5.2.4 Pixelwise additive Gaussian noise

The output of each filter element is perturbed by internal noise. This is modelled as independent additive Gaussian noise with a mean of zero and a constant variance. The assumption of Gaussian noise is made in accordance with the Central Limit Theorem (Peterson et al., 1954; Tyler & Chen, 2000). In stochastic models (see Figure 5.4a), this is simulated and added to the transduced stimulus image to give the noisy filter outputs (S_noisy)

S_noisy[x, y] = S_trans[x, y] + N(µ, σ²), (5.7)

where N(µ, σ²) is a sample from Gaussian noise with mean µ and standard deviation σ. For analytic models (see Figure 5.4b), the noise is represented in the calculations by a separate matrix (G) containing the standard deviations of the noise for the output of each filter element

G =
| σ1,1  σ1,2  σ1,3  ...  σ1,n |
| σ2,1  σ2,2  ...   ...  ...  |
| σ3,1  ...   σ3,3  ...  ...  |
| ...   ...   ...   ...  ...  |
| σn,1  ...   ...   ...  σn,n |
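The two noise representations can be contrasted in a short sketch. The function names, the seed, and the value of σ are assumptions chosen for illustration. The stochastic version draws one independent sample per pixel, while the analytic version only records the standard deviation at each pixel.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only to make the example repeatable

def add_noise(s_trans, sigma=1.0, rng=rng):
    """Stochastic model (Eq. 5.7): add an independent zero-mean Gaussian
    sample with standard deviation sigma to every pixel."""
    return s_trans + rng.normal(loc=0.0, scale=sigma, size=s_trans.shape)

def noise_sd_matrix(shape, sigma=1.0):
    """Analytic model: matrix G holding the noise standard deviation
    for each pixel (constant here, as the noise variance is constant)."""
    return np.full(shape, sigma, dtype=float)

# Blank transduced image, so the noisy output contains noise alone.
s_trans = np.zeros((200, 200))

s_noisy = add_noise(s_trans, sigma=2.0)
G = noise_sd_matrix(s_trans.shape, sigma=2.0)
```

With a blank input, the sample standard deviation of `s_noisy` should be close to the constant stored in `G`, which is the sense in which the two representations describe the same noise.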

a) Stochastic noise b) Analytic representation of noise

Figure 5.4: Noise is represented differently in the two types of model. In stochastic models (a) independent Gaussian noise is added to the pixel value at each location in the filtered stimulus image. In analytic models (b) the noise is represented as a separate matrix containing the standard deviations of the noise for each pixel in the filtered stimulus.

The only model architectures considered here are those where the dominant source of noise comes after the nonlinear transduction stage. According to Birdsall’s theorem, dominant noise placed before transduction linearises the transducer. This makes the behaviour of such a system equivalent to that of a system with a linear transducer (though see Appendix A).

5.2.5 Template matching

a) No template b) Ideal template c) Stimulus extent

Figure 5.5: Different template strategies are shown here, demonstrated using the stochastic model. Panel a) shows the noisy stimulus image with no template applied. Panel b) shows the image multiplied by a template which is matched to the stimulus exactly (an “ideal” template). Panel c) shows the image multiplied by a template that is matched to the stimulus extent (without the “Battenberg” modulation). In both b) and c) the weighting of the templates declines with eccentricity in proportion to the expected signal to noise ratio resulting from the attenuation surface.

An observer behaving with knowledge of the expected stimulus could choose to improve the signal-to-noise ratio at the decision stage by combining a weighted input from each detector, according to a template. The “ideal” template would be matched exactly to the stimulus (see Figure 5.5b). The output of this stage (S_temp) is obtained by multiplying the noisy stimulus by the attenuated, filtered, and transduced stimulus (S_trans) in the stochastic model

S_temp[x, y] = S_noisy[x, y] · S_trans[x, y], (5.9)

whilst in the analytic model the template is applied both to the signal matrix and to the standard deviations of the pixelwise noise

S_temp[x, y] = S_trans[x, y] · S_trans[x, y], (5.10)

G_temp[x, y] = G[x, y] · S_trans[x, y]. (5.11)
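Equations 5.9 to 5.11 are again pointwise products. The sketch below applies an ideal template (the transduced stimulus itself) in both model types. The function names and the toy 2 × 2 arrays are assumptions made for the example.

```python
import numpy as np

def apply_template_stochastic(s_noisy, template):
    """Equation 5.9: weight each noisy pixel by the template."""
    return s_noisy * template

def apply_template_analytic(s_trans, G, template):
    """Equations 5.10 and 5.11: weight the signal and the noise
    standard deviations by the same template."""
    return s_trans * template, G * template

# Toy transduced stimulus: signal in the left column only.
s_trans = np.array([[1.0, 0.0],
                    [0.5, 0.0]])
G = np.ones_like(s_trans)   # unit noise standard deviation at every pixel
template = s_trans          # ideal template, matched exactly to the stimulus

# Stochastic model: template applied to one noisy sample of the image.
rng = np.random.default_rng(1)
s_noisy = s_trans + rng.normal(0.0, 1.0, s_trans.shape)
s_temp_stochastic = apply_template_stochastic(s_noisy, template)

# Analytic model: template applied to signal and noise matrices.
s_temp, G_temp = apply_template_analytic(s_trans, G, template)
```

Pixels where the template is zero contribute neither signal nor noise to later stages, which is how the ideal template excludes noise from regions without signal.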

Models with a template matched to the stimulus envelope make similar predictions to those made by a model with the ideal matched template. These matched and envelope templates will both include the attenuation introduced by the visual field inhomogeneity in contrast sensitivity. Input from more eccentric locations is weighted to have less of an effect on the decision than the input from the fovea, in proportion to the expected signal to noise ratio. Templates which do not feature this variation in weighting over their surface due to the attenuation are referred to as “flat” in this thesis. Where the template is matched to the extent of the stimulus envelope (e.g. is a continuous square or circle, even when the stimulus contains holes), the noise remains in the areas within the stimulus that do not feature signal (Figure 5.5c).

5.2.6 Spatial summation and calculation of the detection threshold

The signals from each location in the stimulus image are combined, either through a linear sum of the pixel values or a max operation over the image. For stochastic models these values are then provided to the decision mechanism. In a simulation of a 2IFC experiment the output from one interval will then be compared against the output from another interval in order to choose the one most likely to contain the stimulus. Repeating this many times with signals of different strengths allows an experiment to be simulated, the results of which give the model prediction (threshold contrast is found by fitting psychometric functions to the simulated data). For analytic models the predictions are derived by calculating the signal-to-noise ratio at the decision stage (see Section 5.3).
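The sketch below illustrates the final stage for a toy stimulus. It is not the derivation of Section 5.3, which is not reproduced here. It assumes a linear sum over pixels with independent Gaussian noise, for which the standard signal detection results are d′ = ΣS_temp / √(ΣG_temp²) and a 2IFC proportion correct of Φ(d′/√2). The function names, the stimulus, and the trial count are all hypothetical.

```python
import numpy as np
from math import erf

def analytic_dprime(s_temp, g_temp):
    """d' for a linear sum over pixels with independent Gaussian noise
    (assumed combination rule; see lead-in)."""
    return s_temp.sum() / np.sqrt((g_temp ** 2).sum())

def simulate_2ifc(s_trans, template, sigma, n_trials, rng, pool=np.sum):
    """Proportion of trials on which the target interval gives the larger
    pooled response. `pool` may be np.sum (linear sum) or np.max."""
    shape = (n_trials,) + s_trans.shape
    target = pool((s_trans + rng.normal(0.0, sigma, shape)) * template, axis=(1, 2))
    null = pool(rng.normal(0.0, sigma, shape) * template, axis=(1, 2))
    return np.mean(target > null)

# Toy transduced stimulus: uniform weak signal over an 8 x 8 patch.
s_trans = np.full((8, 8), 0.1)
template = s_trans            # ideal template
sigma = 1.0

# Analytic prediction using the templated signal and noise matrices.
s_temp = s_trans * template
g_temp = np.full(s_trans.shape, sigma) * template
d_prime = analytic_dprime(s_temp, g_temp)
pc_analytic = 0.5 * (1.0 + erf(d_prime / 2.0))  # Phi(d' / sqrt(2))

# Stochastic prediction from simulated 2IFC trials at the same contrast.
rng = np.random.default_rng(2)
pc_simulated = simulate_2ifc(s_trans, template, sigma, n_trials=20000, rng=rng)
```

Repeating the simulation over a range of signal strengths and fitting a psychometric function to the resulting proportions correct would give the threshold estimate described in the text. Passing `pool=np.max` switches the combination rule to the max operation, for which the analytic formula above no longer applies.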
