CHAPTER 1: General introduction
1.4 Auditory filter shape
The width of the AF is the crucial measure of frequency selectivity, and it depends on the filter's shape. One benefit of describing filter shape, beyond the insight it gives into the workings of the auditory system, is that fewer detection thresholds need to be measured to obtain a bandwidth estimate. In addition, how closely the data match the stereotypical shape gives an impression of how reliable they are.
As mentioned previously, in his 1940 paper Fletcher simplified AF shape to a rectangle, but also found that regardless of shape, the bandwidth of the AF increased with signal frequency (SF) (Fletcher, 1953). He used a full spectrum broadband masker to mask pure tone signals of different frequencies. As the frequency of the signal increased, so did the detection threshold. The masker was the same throughout, so the difference must be caused by an increase in the masking energy passing through the AF, which implies that AFs are wider at higher frequencies. This is at least partly due to the decrease in cochlear gain with increasing frequency, described in 1.2 The basilar membrane and its non-linearities.
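Fletcher's reasoning can be made concrete with a short sketch. It assumes a rectangular AF and a flat masker spectrum, so that the signal power at threshold equals a constant K times the noise power passing through the filter. The function name, the default K = 1, and the threshold values in the example are illustrative assumptions, not data from any of the studies cited here.

```python
def rectangular_bandwidth_hz(threshold_db, noise_spectrum_level_db, k=1.0):
    """Bandwidth of a rectangular filter implied by a masked threshold.

    Assumes P_signal = k * N0 * bandwidth, where N0 is the noise power
    per Hz. In decibels the difference between the threshold and the
    noise spectrum level is therefore 10*log10(k * bandwidth).
    k = 1 is an assumption made for illustration.
    """
    ratio_db = threshold_db - noise_spectrum_level_db
    return 10.0 ** (ratio_db / 10.0) / k


if __name__ == "__main__":
    # Hypothetical thresholds in the same masker (40 dB/Hz spectrum level).
    for label, threshold in (("low SF", 58.0), ("high SF", 65.0)):
        bw = rectangular_bandwidth_hz(threshold, 40.0)
        print(f"{label}: threshold {threshold} dB -> bandwidth {bw:.0f} Hz")
```

With the masker fixed, a higher threshold can only come from a larger bandwidth in this model, which is the inference drawn above.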
Figure 1.4. Examples of a roex(p,r) and a gammatone function. Gammatone parameters: a = 1, n = 4, b = 700, f = 5 kHz. The order (n) determines the steepness of the sides; the bandwidth (b) determines how wide it is. Roex parameters: p = 30, r = 5e-4, CF = 10 kHz. p determines how steep the slopes are; r determines the position of the floor. The slopes appear straight due to the logarithmic scale on the 'Gain' axis.
Schafer et al. (1950) first investigated the shape of the filter using band-widening and showed that the AF was not rectangular, since there was no discontinuity in the detection threshold function. They dubbed the new AF shape a 'universal-resonance filter' with gradually sloping sides. This was corroborated by a notched noise study (Webster et al., 1952) and later by a pure tone masking study (Small Jr, 1959). Swets et al. (1962) used band-widening to compare a number of potential filter shapes, namely rectangular, universal-resonance and Gaussian, and showed that the critical bandwidth depended greatly on the assumed filter shape. Investigation into the precise AF shape was therefore critical. Patterson (1974) and Margolis and Small (1975) used low pass and high pass noise to investigate the shape, but detailed investigation did not begin in earnest until notched noise became established. Patterson (1976) used notched noise and showed that the Gaussian function was a good approximation to the AF close to the centre of the filter, but that its tails dropped off too fast. Patterson and Nimmo Smith (1980) showed that a rounded exponential function would make a much better fit, and in Patterson et al. (1982) the roex(p,r) function was first suggested as the AF shape. This function is
$$W(g) = (1 - r)(1 + pg)\,e^{-pg} + r \qquad (1.2)$$
where W is the filter weighting function, g is the deviation from the centre of the filter, p is the parameter defining the slopes of the function, and r defines the floor of the filter (see Figure 1.4). At the same time the symmetry of the filter was investigated by using asymmetrically spaced notch widths. It was found that the AF was slightly broader on the lower edge than the upper. This led to a variant of the roex(p,r) with two p parameters, one defining the slope of each tail, allowing for an asymmetric AF. However, for moderate noise levels the AF was found to be only very slightly asymmetric on a linear frequency scale, such that an approximation of symmetry is entirely reasonable (Patterson, 1974; Patterson and Nimmo Smith, 1980; Patterson, 1986). More refinement led to the roex(p,w,t) and roex(p,w,p,t) functions in which the upper and lower edges, and lower edge only, respectively, of the roex function have two slope variables (Glasberg et al., 1984b; Glasberg and Moore, 2000). This creates 'skirts', allowing shallower slopes further away from the centre of the filter and steeper slopes more centrally, although it has been argued that these additional filter functions are 'too flexible' to be of much use (Rosen et al., 1998).
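As an illustration, the symmetric roex(p,r) weighting can be evaluated in a few lines. This is a minimal sketch that assumes the usual form W(g) = (1 - r)(1 + pg)exp(-pg) + r, with g taken as the deviation from the centre frequency divided by the centre frequency, and with W treated as a power weighting (hence 10 log10 for gain in dB). The function names are hypothetical; the parameter values are those quoted for Figure 1.4.

```python
import math


def roex_pr(g, p, r):
    """Power weighting W(g) of a symmetric roex(p,r) filter.

    g is the normalised deviation from the centre frequency
    (an assumption: |f - CF| / CF); the filter is symmetric in g.
    """
    g = abs(g)
    return (1.0 - r) * (1.0 + p * g) * math.exp(-p * g) + r


def roex_gain_db(f_hz, cf_hz, p, r):
    """Filter gain in dB at frequency f_hz for centre frequency cf_hz."""
    g = abs(f_hz - cf_hz) / cf_hz
    return 10.0 * math.log10(roex_pr(g, p, r))


def roex_erb_hz(cf_hz, p):
    """Equivalent rectangular bandwidth when the floor is ignored (r = 0).

    Each side of (1 + pg)exp(-pg) integrates to 2/p, giving 4 * CF / p.
    """
    return 4.0 * cf_hz / p


if __name__ == "__main__":
    cf, p, r = 10_000.0, 30.0, 5e-4
    for f in (10_000.0, 11_000.0, 12_000.0, 15_000.0, 20_000.0):
        print(f"{f:8.0f} Hz  {roex_gain_db(f, cf, p, r):7.2f} dB")
    print(f"ERB (r ignored): {roex_erb_hz(cf, p):.0f} Hz")
```

Far from the centre the gain settles at 10 log10(r), which is the floor that r controls, while p alone sets the bandwidth near the tip.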
These skirts reflect the theory of an active and a passive component to cochlear mechanics, described above in 1.2 The basilar membrane and its non-linearities. In this view the passive component can be thought of as a broad, low gain filter with shallow slopes, and the active component as a narrow, high gain filter with steep slopes. The active component would dominate at frequencies close to the signal, and the passive component at frequencies further away. This would lead to an AF with steep slopes near the centre, and shallower slopes further away. Glasberg et al. (1999) did away with the AF shape completely and attempted to fit a two filter model directly to notched noise data; such a model is known as a 'cascade filter' model. This, however, did not fit such data as well as a roex(p,r) based fitting procedure (Glasberg and Moore, 2000).
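The tip-and-skirt idea can be sketched as a weighted sum of a steep and a shallow rounded exponential. The sketch assumes the commonly quoted roex(p,w,t) form, W(g) = (1 - w)(1 + pg)exp(-pg) + w(1 + tg)exp(-tg); it is not the cascade model of Glasberg et al. (1999), and the parameter values are chosen purely for illustration.

```python
import math


def roex_pwt(g, p, w, t):
    """Weighted sum of a steep 'tip' (slope p) and a shallow 'skirt' (slope t).

    w sets the relative weight of the skirt; g is the normalised
    deviation from the centre frequency. Assumed form, for illustration.
    """
    g = abs(g)
    tip = (1.0 - w) * (1.0 + p * g) * math.exp(-p * g)
    skirt = w * (1.0 + t * g) * math.exp(-t * g)
    return tip + skirt


def drop_db(g_near, g_far, p, w, t):
    """Attenuation in dB between two deviations g_near < g_far."""
    return 10.0 * math.log10(roex_pwt(g_near, p, w, t) / roex_pwt(g_far, p, w, t))


if __name__ == "__main__":
    p, w, t = 30.0, 0.002, 8.0  # hypothetical values
    print(f"drop from g=0.1 to 0.2: {drop_db(0.1, 0.2, p, w, t):.1f} dB")
    print(f"drop from g=0.6 to 0.7: {drop_db(0.6, 0.7, p, w, t):.1f} dB")
```

For the same step in g, the attenuation is much larger near the centre, where the steep term dominates, than further out, where only the shallow term remains.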
The roex is not the only modern function to describe AF shape. In Holdsworth et al. (1988) it was suggested that a gammatone function could be used to simulate AF shape. Such a function is simply the product of a sinusoid and a gamma distribution, and has the following formula in the time domain,
$$h(t) = a\,t^{\,n-1}\,e^{-2\pi b t}\cos(2\pi f t + \phi) \qquad (1.3)$$
where f is the centre frequency (in Hz), φ is the phase of the carrier (in radians), a is the amplitude, b is the filter bandwidth (in Hz), n is the filter order, and t is time (in seconds) (see Figure 1.4). This function has the advantage of having fewer parameters than the roex, and it has a fixed slight asymmetry determined by its order. It also has a simple implementation in the time domain, which makes it well suited to filter bank models that require quick and effective filtering of a signal in the time domain. The disadvantage, however, is that because of its fixed shape and fewer parameters (which are an advantage for finding stable fits to data), it does not always fit notched noise data as well as the roex. The roex, gammatone functions, cascade models and all their variants are used today to describe frequency selectivity in different implementations, since each has its own advantages and disadvantages. A useful description of these can be found in Lyon et al. (2010).
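The time-domain simplicity of the gammatone can be seen in a minimal sketch of its impulse response, assuming the standard form a t^(n-1) exp(-2πbt) cos(2πft + φ) for t ≥ 0. The default parameter values are those quoted for Figure 1.4; the function names and the 44.1 kHz sampling rate are assumptions made for illustration.

```python
import math


def gammatone_envelope(t, a=1.0, n=4, b=700.0):
    """Gamma-shaped envelope a * t^(n-1) * exp(-2*pi*b*t), zero for t < 0."""
    if t < 0.0:
        return 0.0
    return a * t ** (n - 1) * math.exp(-2.0 * math.pi * b * t)


def gammatone(t, a=1.0, n=4, b=700.0, f=5000.0, phi=0.0):
    """Gammatone impulse response: the envelope multiplied by a cosine carrier."""
    return gammatone_envelope(t, a, n, b) * math.cos(2.0 * math.pi * f * t + phi)


def envelope_peak_time(n, b):
    """Time at which the envelope is largest: (n - 1) / (2*pi*b)."""
    return (n - 1) / (2.0 * math.pi * b)


if __name__ == "__main__":
    fs = 44_100.0  # hypothetical sampling rate in Hz
    impulse_response = [gammatone(i / fs) for i in range(int(0.005 * fs))]
    print(f"{len(impulse_response)} samples")
    print(f"envelope peaks at {envelope_peak_time(4, 700.0) * 1e3:.3f} ms")
```

A filter bank model would convolve the input signal with one such impulse response per channel, each with its own f and b.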
Another universal feature of AF shape is level dependence. The highly non-linear nature of the active filter described above means that the level of the fixed stimulus has an effect on the final AF shape. Several studies that calculated filter shape at various fixed levels of signal and masker noted that, as level increases, AFs get broader (Patterson, 1986; Moore, 1987; Glasberg and Moore, 1990; Rosen et al., 1992; Rosen and Baker, 1994; Baker et al., 1998; Baker and Rosen, 2006). This broadening is a result of a shallower slope on the lower edge of the AF, so as level increases asymmetry also
change in efficiency of the detector following the filter (Lutfi and Patterson, 1984). Since the AF is affected by level, it is very important, when comparing bandwidth measures across studies, to make sure that they use the same psychophysical experiment with the same stimulus parameters.
Developments in AF shape were driven mainly by improvements in the notched noise method, which was refined over many years, exploring the effects of stimulus duration, frequency, level, temporal arrangements and many more (Moore and Glasberg, 1981; Moore et al., 1984; Moore et al., 1987). This led to the classic (and much referenced) measurement of frequency selectivity in humans, using simultaneous masking and fixed masker level, presented in Glasberg and Moore (1990). Two key features of masking experiments (that do not only apply to notched noise masking) concern the temporal arrangement of the stimuli, and which stimulus should be kept fixed in level. These have been a matter for debate and are described in the next two sub-sections.