Down-sampling - Introduction to Digital Signal Processing

Down-sampling is the opposite of up-sampling and is much like the process of digitizing an analog (continuous-time, continuous-amplitude) signal. As with sampling, the signal that results from down-sampling is a limited representation of the original. In other words, down-sampling reduces the total number of samples, much as digitizing captures an analog signal with a limited number of samples.

Fig. 6.2. 2-fold down-sampling.

The formal definition of M-fold down-sampling is shown in Eq. (6.3), where M is the integer down-sampling factor, x is the input, and n is the sample index.

$$y[n] = x[n \cdot M] \qquad (6.3)$$

For example, when M = 2 we get the results shown in Fig. 6.2.
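Eq. (6.3) amounts to keeping every M-th sample of the input. Here is a minimal sketch in Python (standard library only, chosen so it runs without MATLAB's Signal Processing Toolbox). The function name `downsample` is our own and only mirrors the index pattern y[n] = x[n · M]:

```python
def downsample(x, M):
    """M-fold down-sampling, y[n] = x[n * M] as in Eq. (6.3)."""
    if not isinstance(M, int) or M < 1:
        raise ValueError("M must be a positive integer")
    # A slice with step M picks x[0], x[M], x[2M], ...
    return x[::M]


x = list(range(10))
print(downsample(x, 2))  # [0, 2, 4, 6, 8]
print(downsample(x, 3))  # [0, 3, 6, 9]
```

With M = 2 this keeps every other sample. In general, the output has ceil(len(x) / M) samples.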

As we may have guessed from our introduction of aliasing and sampling in Chap. 1, aliasing becomes a problem because down-sampling, like sampling an analog signal, reduces the number of samples used to represent a signal. If the input x is not properly band-limited before down-sampling (that is, restricted to frequencies below half of the new sampling rate, fs/(2M)), aliasing will occur. In Chap. 8 we will observe what happens when a signal that is not band-limited is down-sampled, and discuss what sort of anti-aliasing filters we need.
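The aliasing described above can be seen numerically. The sketch below assumes, purely for illustration, a cosine at 3000 Hz sampled at 8000 Hz and down-sampled with M = 2. The new sampling rate is 4000 Hz, whose Nyquist frequency is 2000 Hz, so the tone is not properly band-limited and folds down to 1000 Hz:

```python
import math

fs = 8000      # original sampling rate in Hz (assumed for illustration)
M = 2          # down-sampling factor
f_in = 3000    # tone above the new Nyquist limit fs / (2 * M) = 2000 Hz

# Cosine at 3000 Hz sampled at 8000 Hz, then down-sampled by keeping every M-th sample
x = [math.cos(2 * math.pi * f_in * n / fs) for n in range(64)]
y = x[::M]

# The new rate is fs / M = 4000 Hz; here the tone folds to |f_in - fs / M| = 1000 Hz
fs_new = fs / M
f_alias = abs(f_in - fs_new)
alias = [math.cos(2 * math.pi * f_alias * n / fs_new) for n in range(len(y))]

# Sample for sample, the down-sampled 3000 Hz tone matches a 1000 Hz tone
max_err = max(abs(a - b) for a, b in zip(y, alias))
print(f_alias, max_err < 1e-9)  # 1000.0 True
```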

In general, it is also worth noting that up-sampling and down-sampling can be useful in musical applications, as modulating the sample rate leads to interesting musical results. If the sampling rate of the sound output device is kept constant while the sampling rate of the signal is modulated, the pitch/frequency characteristics shown in Table 6.1 result. The pitch shifts down or up according to L and M (with fs kept the same for the audio output device), but because the total number of samples increases (L) or decreases (M), the sound characteristic, or timbre, is altered as well.

September 25, 2009 13:32 spi-b673 9in x 6in b673-ch02

68 Introduction to Digital Signal Processing

Table 6.1. Up-sampling and down-sampling effects on audio signals.

                 Total number of samples    Pitch/frequency
Up-sampling      Increases L-fold           Shifts down L-fold
Down-sampling    Decreases M-fold           Shifts up M-fold

It is possible to use down-sampling to mimic pitch-shifting upwards, but the so-called “munchkin effect” becomes a hindrance, unless of course we want the munchkin effect itself (try the MATLAB code below on a human voice sample to hear the munchkin effect when using down-sampling). Conversely, up-sampling can be used to expand signals in time, for example to make a signal twice as long. Here a similar problem arises, as the pitch changes according to the number of samples added to the original signal.
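The down-sampling row of Table 6.1 follows from Eq. (6.3). Keeping every M-th sample of a sinusoid at frequency f0 yields exactly the samples of a sinusoid at M · f0, and playing them back at the unchanged rate fs also shortens the sound by a factor of M. The following small Python check uses fs = 8000 Hz, f0 = 440 Hz and M = 2 as assumed illustration values:

```python
import math

fs = 8000     # playback rate of the output device, kept fixed (assumed value)
f0 = 440.0    # pitch of the original tone in Hz
M = 2         # down-sampling factor
N = fs        # one second of audio

x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(N)]
y = x[::M]    # M-fold down-sampling, y[n] = x[n * M]

# Played back at the unchanged rate fs, the result is M times shorter ...
duration_in = len(x) / fs
duration_out = len(y) / fs

# ... and each output sample equals a tone at M * f0 sampled at fs,
# since sin(2*pi*f0*(n*M)/fs) = sin(2*pi*(M*f0)*n/fs).
shifted = [math.sin(2 * math.pi * (M * f0) * n / fs) for n in range(len(y))]
max_err = max(abs(a - b) for a, b in zip(y, shifted))

print(duration_in, duration_out, max_err < 1e-9)  # 1.0 0.5 True
```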

```matlab
% Load an audio sample into MATLAB (audioread replaces the older wavread)
[x, fs] = audioread('myInstrumentSound.wav');
sound(x, fs)

% try it with up-sampling (upsample/downsample need the Signal Processing Toolbox)
sound(upsample(x, 2), fs)

% or down-sampling, for example
sound(downsample(x, 2), fs)
```

Code Example 6.1. Up-sampling⁴ and down-sampling example
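Code Example 6.1 relies on MATLAB's `upsample` and `downsample`. MATLAB documents `upsample(x, n)` as inserting n - 1 zeros between samples. The hypothetical Python function below imitates that zero insertion so that the L-fold increase in the number of samples from Table 6.1 can be inspected without MATLAB. It is an illustrative stand-in, not MATLAB's implementation:

```python
def upsample(x, L):
    """Zero insertion: follow each sample with L - 1 zeros, giving L * len(x) samples."""
    if not isinstance(L, int) or L < 1:
        raise ValueError("L must be a positive integer")
    y = []
    for sample in x:
        y.append(sample)
        y.extend([0] * (L - 1))
    return y


print(upsample([1, 2, 3, 4], 3))  # [1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0]

# Played back at an unchanged rate fs, L times as many samples last L times as long
fs = 44100          # assumed playback rate
x = [0.0] * fs      # one second of placeholder audio
print(len(x) / fs, len(upsample(x, 2)) / fs)  # 1.0 2.0
```

Zero insertion by itself introduces spectral images above the original band. In practice it is usually followed by a low-pass (interpolation) filter.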

We will learn in Chap. 9 how to change pitch without changing the duration of a signal. In the next section we will show a simple method with which we can time-stretch (lengthen) or time-compress (shorten) a signal using the overlap-and-add (OLA) algorithm without changing its pitch characteristics.

7 Overlap and Add (OLA)

When trying to make a signal longer or shorter through up-sampling or down-sampling, we observed that a change in pitch is an inevitable byproduct, and more often than not an undesirable one. One solution to this problem is a method called overlap and add (OLA). With the OLA algorithm it is possible to time-stretch or time-compress a signal without changing its pitch. When time-stretching, for example, we can make a given signal longer without “additional” signal information. The method is surprisingly simple in its core concept; it is outlined in Fig. 7.1 and described further below.

Fig. 7.1. Hann window (top) and OLA (bottom).

To learn the mechanics of this method, let’s take the first 1800 samples of the piano sound we used before as an example. The top of Fig. 7.1 shows the Hann window we used to window a portion of the piano signal. In previous windowing examples, windows were applied one after another: each new window started at the sample just after the previous window’s right edge. In the bottom of Fig. 7.1, however, the 6 windows are not arranged in this sequential, non-overlapping manner but in an overlapping configuration. To be more precise, in this example we use a Hann window of 512 samples with a 50% window overlap of 256 samples (roughly 11.6 ms and 5.8 ms respectively, assuming a 44.1 kHz sampling rate). Each new window therefore does not start at an integer multiple of the window size, as we have previously seen, but at an integer multiple of the hop size, which is the window size minus the overlap in samples. The hop size in our example is 256 samples. With 90% overlap we would have a hop size of about 51 samples (10% of 512).

Fig. 7.2. Windowed portion of piano sample.

Figure 7.2 shows the windowed portions of the waveform in Fig. 7.1, each 512 samples long as specified. Now, if we take the 6 windowed portions of the signal and put them back together sequentially through addition, but with a different overlap percentage, the result will be either a longer or a shorter signal depending on how much overlap we choose at the synthesis stage. Note that we could also have used a rectangular window, but we opted for the Hann window because its smooth edges help blend (cross-fade) adjacent windowed signals during the overlap-and-add procedure. The first stage of OLA, where we window the signal, is referred to as the analysis part, and the second stage, where we actually overlap and add the windowed signals, is referred to as the synthesis part. The algorithm encompassing both the analysis and synthesis parts is referred to as overlap-and-add.
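The analysis stage described above can be sketched as follows. The hop size is the window length minus the overlap, and frames start at integer multiples of the hop size. This Python sketch assumes a symmetric Hann window (the exact window variant used for Fig. 7.1 is not stated here) and uses a constant placeholder signal in place of the piano sample. The helper names `hann`, `hop_size` and `analysis_frames` are hypothetical:

```python
import math


def hann(N):
    """Symmetric Hann window of length N (N >= 2); it is zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def hop_size(N, overlap):
    """Hop size in samples: window length minus the overlap (overlap as a fraction of N)."""
    return round(N * (1 - overlap))


def analysis_frames(x, N, hop):
    """Return Hann-windowed frames of length N starting at 0, hop, 2*hop, ... and their starts."""
    w = hann(N)
    starts = list(range(0, len(x) - N + 1, hop))  # keep only frames that fit inside x
    frames = [[x[s + n] * w[n] for n in range(N)] for s in starts]
    return frames, starts


N = 512
print(hop_size(N, 0.50))  # 256
print(hop_size(N, 0.90))  # 51

x = [1.0] * 1800  # placeholder for the first 1800 samples of the piano sound
frames, starts = analysis_frames(x, N, hop_size(N, 0.50))
print(len(frames), starts)  # 6 [0, 256, 512, 768, 1024, 1280]
```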

Suppose we use an analysis overlap of 50% (a hop of 256 samples) and a synthesis overlap of 75% (384 overlapping samples, so a hop of 128 samples). The signal after OLA is then shorter: the 6 windows span 5 × 128 + 512 = 1152 samples instead of 5 × 256 + 512 = 1792, about 36% fewer. If we instead use a synthesis overlap of 25% (a hop of 384 samples), we get a time-stretched signal spanning 5 × 384 + 512 = 2432 samples, about 36% longer. For longer signals with many windows, the length ratio approaches the ratio of synthesis hop to analysis hop (here 0.5 and 1.5). What is interesting is that in either case we will not experience any pitch alteration, as there is no up-sampling or down-sampling involved in the OLA algorithm.
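The synthesis stage, and the output lengths worked out above, can be checked with a basic OLA sketch. It windows the input at an analysis hop of 256 samples and overlap-adds the frames at a different synthesis hop. The sine-wave input, the symmetric Hann window and the function name `ola` are assumptions for illustration, not the book's implementation:

```python
import math


def hann(N):
    """Symmetric Hann window of length N (N >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def ola(x, N, analysis_hop, synthesis_hop):
    """Basic OLA time-scaling: window x at analysis_hop, overlap-add at synthesis_hop."""
    w = hann(N)
    # Analysis: Hann-windowed frames starting at multiples of analysis_hop
    starts = range(0, len(x) - N + 1, analysis_hop)
    frames = [[x[s + n] * w[n] for n in range(N)] for s in starts]
    if not frames:
        return []
    # Synthesis: frame k is added in starting at k * synthesis_hop
    y = [0.0] * ((len(frames) - 1) * synthesis_hop + N)
    for k, frame in enumerate(frames):
        offset = k * synthesis_hop
        for n, value in enumerate(frame):
            y[offset + n] += value
    return y


N = 512
x = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(1800)]  # stand-in signal

print(len(ola(x, N, 256, 256)))  # 1792: same spacing as the analysis
print(len(ola(x, N, 256, 128)))  # 1152: 75% synthesis overlap, shorter
print(len(ola(x, N, 256, 384)))  # 2432: 25% synthesis overlap, longer
```

This sketch does not normalize its output. When the synthesis hop differs from the analysis hop, the overlapped Hann windows no longer sum to a roughly constant level and the amplitude ripples. A common remedy is to divide each output sample by the sum of the overlapped window values at that position.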

OLA works because of aspects of psychoacoustics and concepts related to gestalt theory that deal with perceptual grouping tendencies. You may remember from Chap. 1 that we perceive time events differently depending on how fast or slow they occur. When time events such as pulse trains are set at 1 Hz, we are actually able to count them, and we may perceive them as rhythm if different accents are used for each pulse. As the frequency is increased, we lose the ability to count the pulses, and the perception goes from rhythm to sound color, or timbre, up until around 20 Hz or so. Above 20 Hz the pulse train, whose structure has not changed other than an increase in the number of pulses per second, gradually comes to be perceived as pitch. As the frequency is increased further towards the upper limits of our hearing system, the perception of pitch changes to a sense of high frequency. The OLA algorithm thus exploits psychoacoustic characteristics and the ambiguity of perception in the 20 Hz to 30 Hz range, corresponding to periods of roughly 33 ms to 50 ms. By applying small window sizes and short hop sizes, it makes audio signals seem to become shorter or longer with little or (ideally) no change in pitch and timbre.

Source: Introduction to Digital Signal Processing: Computer Musically Speaking, by Tae Hong Park (PDF pages 87-92).