Dithering is a method in which we add a very small amount of pseudo-random noise (an error ε below one bit of resolution, i.e., smaller than one quantization step) to the analog input signal before sampling and quantization, as shown in Eq. (7.3). This may sound unorthodox: adding noise to a signal seems counterintuitive when we want the signal to be as clean as possible.
$x(t)_{\text{dithered}} = x(t)_{\text{original}} + \varepsilon$ (7.3)
Have you ever listened to a CD recording on headphones and tried increasing the volume in the fadeout part of your favorite piece to see if there are any hidden messages or the like, and all you got was a rather annoying buzzing, fuzzy, gritty sound? This is referred to as granulation noise. It is an artifact of sampling and quantization and is often not very evident, because a recording rarely stays within the last 1 or 2 bits for a prolonged period of time. The listener may thus find this artifact not to be a problem at all, as it is seldom heard due to the low signal level. However, to the composer or audio researcher who is analyzing, designing DSP algorithms, editing, or composing a musical work that deals with low-amplitude signals, such problems are as common as finding portions of a sound where clipping occurs (hopefully not unintentionally on a commercial CD though). Dithering alleviates some of the problems associated with quantization described above. To understand how dithering works, let’s look at the quantization process using rounding. Figure 7.2 (top) shows
September 25, 2009 13:32 spi-b673 9in x 6in b673-ch01
34 Introduction to Digital Signal Processing
Fig. 7.2. Cosine wave after quantization (rounding) without dithering.
an exponentially decaying analog cosine wave and its quantized version at 4-bit resolution, with the quantization levels drawn as horizontal grid lines. As we can see in the middle plot, the QSNR decreases rapidly as the signal decays, which means that the ratio between the strength of the original signal and the quantization error becomes smaller and smaller. This is the source of some potential problems. At around 20 to 110 samples, the quantized waveform becomes patterned or regular and takes on some features of a square wave (we shall see an extreme case of this shortly) that were not present in the original analog signal. This is problematic because a square wave introduces harmonic distortion (harmonics that were not present before). Furthermore, aliasing may also occur because of the newly added harmonics: as more odd harmonics are added to the signal, some of them may be higher in frequency than the Nyquist limit (more details on this in Chap. 8).
Suffice it to say for now that a repetitive pattern resembling a square wave emerges, adding unwanted harmonics (especially odd harmonics) that are not part of the original signal. An obvious fix for the above problems would of course be to increase the bit resolution, which gives us the bit depth to represent the nuances in the low-amplitude areas (low QSNR) and adds approximately 6 dB to the dynamic range for every bit added. However, this method is highly uneconomical and does not really address the problem, as we cannot know in advance what signal we are going to be dealing with, how loud it will be, or when it will change to a completely different signal. A solution to some of these problems is dithering. In essence, dithering indirectly increases the bit resolution for low-amplitude signals, especially in the 1-bit range: it improves the 1-bit limit to something lower than that. It is a simple and clever way to help eliminate some of these artifacts by adding a small error signal to the original analog signal before sampling and quantization. Doing so makes the resulting signal noisier, but as the added noise is so small it is for all practical purposes
Fig. 7.4. Exponentially decaying cosine without (top) and with dithering (bottom).
negligible and inaudible. When this modified signal is quantized, the patterned quasi-square waves are transformed into unpatterned ones, so the result no longer shows the characteristics of harmonic distortion. One of the main reasons this works for humans is that we tend to mask out noise and perceive it as background (especially when it is low), as our basilar membrane (part of the cochlea in our ear) has an averaging property that brings out repetitive patterns, even if the pattern of interest is engulfed by a relatively high noise floor. Thus, with dithering, we are literally making a decision to swap harmonic distortion for low-amplitude noise to make the artifacts less noticeable.
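The figure of roughly 6 dB per bit quoted above can be checked with a short calculation. The following is only a sketch under a stated assumption: the dynamic range, written here with the illustrative symbol DR, is taken to be the ratio between the full-scale range of a uniform N-bit quantizer and a single quantization step, that is, 2^N.

```latex
\begin{align*}
\mathrm{DR}(N) &= 20\log_{10}\!\left(2^{N}\right) = N \cdot 20\log_{10} 2 \approx 6.02\,N \ \text{dB} \\
\mathrm{DR}(4) &\approx 24.1\ \text{dB}, \qquad \mathrm{DR}(16) \approx 96.3\ \text{dB}
\end{align*}
```

Under this assumption, each added bit doubles the number of quantization levels and therefore adds about 6.02 dB.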
Dithering is illustrated in Figs. 7.3 and 7.4. The top plot of Fig. 7.3 illustrates the extreme case where the analog cosine literally becomes a square wave, introducing harmonic distortion to the digitized signal via harmonics that were not present in the original analog signal. The dithered version of the cosine in the bottom plot of Fig. 7.3, on the other hand, loses this repetitive square-wave pattern when quantized. Finally, Fig. 7.4 shows an exponentially decaying cosine without and with dithering. Clearly, the dithered quantized version does a better job of eliminating the square-wave patterns. The quantization scheme applied in the plots is also referred to as pulse width modulation (PWM), which refers to the way the pulse width contracts and expands in accordance with the signal.
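The effect of dithering can also be tried out numerically. The following MATLAB sketch is an illustration under stated assumptions, not the procedure used to produce the figures: an already-sampled decaying cosine stands in for the analog signal, quantization is done by rounding to a 4-bit grid, and the dither is uniformly distributed noise spanning one quantization step (one common choice; the text does not specify the noise distribution). All variable names and parameter values are illustrative.

```matlab
% Sketch: 4-bit rounding quantizer applied to a decaying cosine,
% with and without dither. Names and parameter values are assumptions.
fs = 8000;                                   % sampling rate in Hz (assumed)
n  = (0:1999)';                              % sample index
x  = exp(-n/400) .* cos(2*pi*100*n/fs);      % decaying cosine, |x| <= 1

bits  = 4;                                   % bit resolution used in the text
delta = 2 / 2^bits;                          % step size for a [-1, 1] range

% Dither: uniform noise within +/- half a step (assumed distribution)
ditherNoise = (rand(size(x)) - 0.5) * delta;

xq  = delta * round(x / delta);                  % rounding, no dither
xqd = delta * round((x + ditherNoise) / delta);  % rounding, with dither

% Quiet tail: here the signal is smaller than half a quantization step
tail = n > 1200;
fprintf('Undithered tail: %d distinct level(s)\n', numel(unique(xq(tail))));
fprintf('Dithered tail:   %d distinct level(s)\n', numel(unique(xqd(tail))));

% A positive value means the dithered output still follows the cosine
fprintf('Tail correlation with dither: %g\n', sum(x(tail) .* xqd(tail)));

% plot(n, xq, n, xqd + 1)                    % uncomment to compare visually
```

In the tail, the undithered output collapses to a single level (silence) because the cosine never reaches half a step. The dithered output instead toggles between neighboring levels in a way that, on average, still follows the cosine, which is the sense in which dithering pushes resolution below the 1-bit limit. The exact numbers vary slightly from run to run because the noise is random.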
8 Musical Examples
There are a number of interesting musical examples that take advantage of some of the concepts we have discussed in this chapter. One such piece is Kontakte (1958–1960) by composer Karlheinz Stockhausen. In this piece, Stockhausen explores many facets of acoustics (among other things), including “contacts” between different sound families, morphing between a select number of sounds, as well as exploring boundaries between pitch, frequency, and roughness by increasing the frequency of a low-frequency pulse (a pulse you can count) until the pulse goes through gradual changes of rhythm, roughness/timbre, and pitch. Kontakte is also one of the first multi-channel musical works employing a quadraphonic sound projection strategy, where the composer used a loudspeaker mounted on a rotating turntable to record onto four-channel tape via four separate microphones. With this configuration, Stockhausen captured the orbital characteristic of the sound source, which could be experienced by the audience in a quadraphonic sound reinforcement setup.
With the advent of the tape recorder (the precursor to the digital audio workstation), composer Pierre Henry wrote a number of incredible variations solely from recorded sounds of a creaking door and a sigh, called none other than Variations for a Door and a Sigh (1963). In this piece, through magnetic tape manipulations such as time stretching/compressing (playing the tape machine slower or faster) and by extracting and focusing on the inherent musicality of the sounds, exposing their rhythmic, timbral, and melodic features, Henry guides the listener to a sonic experience that is quite unlike any other. This piece is a great example of using found sounds in the style of musique concrète.
Some 20 years later, John Oswald coined the term plunderphonics and wrote an essay entitled Plunderphonics, or Audio Piracy as a Compositional Prerogative (Oswald 1986).
He wrote a collection of pieces such as Dab (using materials from Michael Jackson’s song Bad) that utilize existing recordings and thus touch on sensitive issues of copyright and intellectual property, subsequently leading CBS and Michael Jackson to file complaints with the Canadian Recording Industry Association. What is interesting is not just the complex world of copyright (or copyleft!) and litigation, but the fact that the excerpts from the original track are chopped up into very short segments, often no more than one or two seconds
or even shorter passages. The composer then takes those short samples and sequences them, builds rhythms and patterns, and does much more, essentially making a “totally” new piece while at the same time leaving an unmistakable sonic residue of the original track, challenging the listener to negotiate between context, content, origin, creativity, and novelty.
Another great example in the area of sampling and computer music is Jon Appleton’s Newark Airport Rock (1969). The piece can be regarded as a sampling expedition by the composer, who recorded interviews at the airport asking interviewees to comment on electronic music. In a way, the composer’s job in this sort of project revolves around organizing, juxtaposing, repeating, and perhaps most importantly selecting, from the vast number of samples, those sound excerpts that will produce the desired musical composition. The piece also resembles a short documentary and is clearly narrative in its musical construct.
Computer musicians and computer music researchers often walk the fine line between the worlds of science and art. This is especially true when testing out newly designed synthesis algorithms or using signal processing concepts in ways they were not “meant” to be used. An example of such a piece is The Machine Stops, written by the author in 2000. This piece exploits and “misuses” some of the DSP concepts we covered in this chapter: aliasing, sampling, distortion, and the sine wave. The artifacts of aliasing are used in a musical context, as are distortion, sampling, audio-rate panning (20 Hz and above), and other processes. The starting point of the composition was just a single sine wave, which really is the focus of the piece: what can we do with one sine wave and some “malicious misuses” of a few basic DSP techniques? That is the wonderful thing about musical composition: there is really no incorrect way of composing, although whether the result is interesting, good, appealing, boring, or intriguing is a different matter altogether. From an engineering point of view, however, the results may be regarded as wrong and even devastating, but the composer has the prerogative and extra rope to break rules and standard engineering (and musical!) practices without the need to worry (too much) about the resulting errors, provided it does not hurt anyone!
References and Further Reading
Dodge, C., Jerse, T. A. 1985. Computer Music: Synthesis, Composition, and Performance, New York, Schirmer Books.
Fletcher, H., Munson, W. A. 1933. “Loudness, its Definition, Measurement and Calculation”, Journal of the Acoustical Society of America, 5, 82–108.
Oswald, J. 1986. “Plunderphonics, or Audio Piracy as a Compositional Prerogative”, Musicworks, 34.
Zwicker, E., Fastl, H. 1999. Psychoacoustics: Facts and Models, Springer-Verlag Series in Information Sciences.
September 25, 2009 13:32 spi-b673 9in x 6in b673-ch02
Chapter 2
TIME-DOMAIN SIGNAL PROCESSING I
1 Introduction
In this chapter and the next, we will introduce a number of important concepts and signal processing techniques that belong to the time-domain. The term time-domain generally refers to topics (analysis, synthesis, signal modulation, etc.) that have to do with two-dimensional data types, amplitude and time being the two dimensions. The sine wave we have seen in Chap. 1 is a perfect example where the signal is represented in those two dimensions. The counterpart to the time-domain is referred to as the frequency-domain, which will be formally presented in Chap. 8. Frequency-domain concepts may be a bit more difficult to grasp in the beginning, as they differ on a fundamental level. Time-domain concepts, on the other hand, will probably come to us more naturally, as we are accustomed to hearing sounds and seeing waveforms in the time-domain. We will thus spend a substantial amount of time in the time-domain and get familiar with various DSP concepts before proceeding to frequency-domain related studies. Topics covered in this chapter will include amplitude envelope computation, pitch detection and autocorrelation, overlap and add concepts, as well as a select number of classic sound synthesis algorithms. As usual, the chapter will conclude with musical examples pertinent to the materials discussed.
2 Amplitude Envelope and ADSR
The so-called amplitude envelope is an important yet straightforward concept to grasp, as we have probably encountered it on numerous occasions by looking at audio waveforms. The amplitude envelope of a sound object refers to the amplitude contour, or the general shape of the signal’s amplitude with respect to time. It can be regarded as a zoomed-out view of a waveform without the subtle details. Amplitude envelopes are especially important sonic features for musical instruments. In particular, it is common to talk about the ADSR, which stands for A(ttack), D(ecay), S(ustain), and R(elease), dividing the envelope of a musical instrument tone into 4 basic areas as seen in Fig. 2.1.
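As a rough illustration of the four ADSR regions, the following MATLAB sketch builds a piecewise-linear envelope and applies it to a sine tone. The straight-line segments, the durations, the sustain level, and all variable names are assumptions made for the example; the envelopes of real instruments are generally not linear.

```matlab
% Sketch: piecewise-linear ADSR envelope applied to a sine tone.
% Segment durations and levels are illustrative assumptions.
fs = 44100;                          % sampling rate in Hz
attackDur    = 0.02;                 % segment durations in seconds
decayDur     = 0.10;
sustainDur   = 0.50;
releaseDur   = 0.30;
sustainLevel = 0.6;                  % sustain amplitude (the peak is 1)

nA = round(attackDur  * fs);         % segment lengths in samples
nD = round(decayDur   * fs);
nS = round(sustainDur * fs);
nR = round(releaseDur * fs);

env = [linspace(0, 1, nA), ...                % attack: rise to the peak
       linspace(1, sustainLevel, nD), ...     % decay: fall to sustain level
       sustainLevel * ones(1, nS), ...        % sustain: hold
       linspace(sustainLevel, 0, nR)]';       % release: fade to silence

t = (0:length(env)-1)' / fs;         % time axis in seconds
y = env .* sin(2*pi*261.63*t);       % sine at roughly C4, shaped by env

% sound(y, fs)                       % uncomment to listen
% plot(t, y, t, env)                 % uncomment to view tone and envelope
```

Changing the four durations or the sustain level and listening to the result is a quick way to hear how strongly the envelope alone shapes the perceived character of an otherwise identical tone.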
Figure 2.2 shows a Steinway piano playing a C4 note, sampled at 44.1 kHz and 16 bits. The top plot is the waveform of the piano sample, the middle plot the root mean square (RMS) amplitude envelope (the topic of the next section), and the bottom plot the RMS envelope on a logarithmic (dB) amplitude scale. Notice that there is a pretty sharp attack, a long sustain, and an even longer release in this example. Figure 2.3 shows a snare drum struck with a stick (fortissimo) at the same bit depth and sampling frequency. In the snare drum example, notice how quickly the sound dies away: at 300 milliseconds the snare drum has already reached about −40 dB, whereas the piano at 300 milliseconds is still at approximately
Fig. 2.2. Steinway piano C4, waveform (top), linear envelope (middle), dB envelope (bottom).
Fig. 2.3. Single snare drum hit, waveform (top), linear envelope (middle), dB envelope (bottom).
−18 dB, approaching −40 dB only at around 4 seconds. One of the reasons the amplitude envelope and the ADSR structure are important is that altering the ADSR parameters changes the perception of the sound object accordingly; the degree of change depends on the type and characteristic of the modulation of the envelope. Sometimes even a relatively small change in the amplitude envelope can radically alter the original sonic identity itself, in extreme cases rendering the modulated sound unrecognizable. Try loading a piano chord or single note in MATLAB and playing it without any alteration, then try playing it in reverse so that the last sample is played first and the first sample last (Code Example 2.1). Granted, everything has been reversed and not just the envelope, but the perception of the sound object is drastically changed by a very simple process, highlighted by the time-reversal of the amplitude envelope.
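[x, fs] = audioread('nameOfPianoSample.wav'); % read wave file (audioread replaces the older wavread)
sound(x, fs)                                  % play the original
disp('Press any key to continue')
pause                                         % wait for a key press
xReverse = flipud(x);                         % flip the array up -> down (reverse in time)
sound(xReverse, fs)                           % play the reversed version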
Code Example 2.1. Reversing time-domain audio
3 Wavetable Synthesis
Looking at the ADSR structure of a musical note, an interesting observation can be made. Instruments such as the piano, vibraphone, or the electric bass show certain general behavior in the ADSR regions, especially in the attack and steady-state regions. The attack portion of the waveform is generally the most complex and chaotic, albeit not fully random or “noisy.” The attack also generally embodies a wealth of energy, which gradually dissipates towards the end of the tone. The steady-state part of a musical instrument sound, on the other hand, is usually more stable and displays characteristics of quasi-periodicity. Figure 3.1 shows a piano sample at different points in time, where we can clearly see the waveform becoming smoother and more patterned towards the end of the piano tone.
A type of time-domain synthesis method called wavetable synthesis exploits the aforementioned characteristics. This synthesis method stores
Fig. 3.1. Piano sample at different parts of the overall signal.
only the attack portion of the actual musical instrument waveform and a short segment of the steady-state. Thus, the waveform that is stored in ROM (read-only memory) is an abbreviated version of the entire signal: the entire attack section and a portion of the steady-state. One of the reasons for doing this is very practical: economics. Saving ROM space on a synthesizer allows more sounds to be stored, making synthesizers more powerful and affordable! You will probably think that the attack plus a short segment of the steady-state is too short for practical use in compositions or performance situations, which is indeed true. It is probably fine when playing sharp staccato notes that are shorter than or equal to the saved waveform, but if the user wants to sustain a note for longer than what is stored in ROM, we encounter a problem, as we do not have enough of the waveform stored for the desired playback. The solution to this problem is what wavetable synthesis essentially is.
The steady-state is, for lack of a better word, steady and (quasi-)periodic. This means that it is easy enough to model, as it is clearly patterned. What happens in wavetable synthesis is that a portion of the steady-state is selected via loop start and loop end markers, as illustrated in Fig. 3.2.
Once the end of the loop point is reached (after the attack and a small portion of the steady-state), the wavetable pointer will just keep looping
Fig. 3.2. Loop points in wavetable synthesis.
between the loop start and loop end markers. Through this looping, the ear is tricked into believing that the waveform has not ended but is being sustained. If we furthermore slowly decrease the amplitude while looping (using an envelope), it is also possible to mimic the gradual release characteristic of the original waveform even though we are only using a fraction of the original sound. This particular piano sound is 108,102 samples long in its entirety, and if we were to use wavetable synthesis to synthesize the piano tone we would use only approximately 3500 samples, or 3.2% of the original, thus saving quite a bit of memory! This simple idea of using the attack and a select portion of the steady-state waveform, combined with looping the quasi-periodic part, is in a nutshell how wavetable synthesis works. However, putting an idea into practice is often more difficult than meets the eye, as there are so many different types of signals with a plethora of waveform shapes. In wavetable synthesis, the trick is thus to find a good set of loop points as close to the beginning of the steady-state as possible, while at the same time keeping the loop region as short as we can to save memory. Other tricks to improve the quality of wavetable synthesis include using amplitude envelopes to control the ADSR characteristics, as mentioned before, as well as opening/closing filters to brighten or dampen part of the synthesized waveform.
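The looping mechanism described above can be sketched in a few lines of MATLAB. This is a minimal illustration under stated assumptions rather than a production wavetable engine: the stored table is a synthetic stand-in (a sine with a short noisy attack) instead of a real piano sample, the loop markers are hypothetical and chosen to span a whole number of periods so that the loop is seamless, and the release is a simple exponential fade. All names and values are illustrative.

```matlab
% Sketch: sustain a tone by looping part of its steady state.
% The stored wavetable is a synthetic stand-in, not a real piano sample.
fs = 44100;                          % sampling rate in Hz
f0 = 441;                            % chosen so that one period is 100 samples
period = fs / f0;

% Stored table: a noisy attack followed by a short steady-state stretch
nStored = 3500;                      % roughly the size quoted in the text
n = (0:nStored-1)';
attackNoise = 0.3 * randn(nStored, 1) .* exp(-n/300);   % dies away quickly
wavetable = sin(2*pi*f0*n/fs) + attackNoise;

% Hypothetical loop markers: the last 5 periods of the stored table
loopEnd   = nStored;
loopStart = loopEnd - 5*period + 1;
loopLen   = loopEnd - loopStart + 1;

% Read pointer: run through the table once, then cycle inside the loop
nOut = 2 * fs;                       % two seconds of output
idx  = (1:nOut)';
past = idx > loopEnd;                % samples that lie beyond the stored data
idx(past) = loopStart + mod(idx(past) - loopStart, loopLen);
y = wavetable(idx);

% Release: fade out the looped part with a decaying envelope
env = ones(nOut, 1);
env(past) = exp(-(0:sum(past)-1)' / (0.4 * fs));
y = y .* env;

% sound(0.5 * y, fs)                 % uncomment to listen
fprintf('Stored fraction of a 108102-sample tone: %.1f%%\n', ...
        100 * nStored / 108102);     % prints 3.2%
```

With real recordings the steady state is only quasi-periodic, so loop points usually have to be searched for (for example near zero crossings with matching slope) rather than computed from a known period as is done here.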
4 Windowing, RMS, and Amplitude Envelope
In this section, we will present a method for computing the amplitude envelopes much like the ones shown in Figs. 2.2 and 2.3. We will start with an important concept called windowing. When analyzing a signal, one will most likely not analyze it sample-by-sample, especially when it is a large signal, nor would one analyze it as a whole single unit from beginning to