Let’s go back to what happens at the ADC, when the signal from the mic is first being turned into numbers. We’re measuring the signal 44,100 times per second — but how accurate are those individual measurements?
When you’re measuring how tall your children are, you probably use a yardstick. The yardstick is most likely marked off in 16ths of an inch. (In the backwoods USA, that is. In most of the modern world, it’s a meter stick, not a yardstick, and it’s marked off in millimeters, but we’ll go with the yardstick.) If your yardstick were marked off only in feet, with no marks in between, you’d have to record your children as all being two feet tall, three feet tall, four feet tall, or five feet tall. A child whose actual height was between three feet and four feet would have to be recorded as being either three feet or four feet tall, because your measuring system would provide no information more precise than that.
Being human, you’re a lot smarter than a computer, so if you were using such a stupid yardstick you’d probably record Suzy’s height as “a little more than three feet” or “not quite four feet.” But a computer can’t do that. For a computer, those in-between measurements don’t exist. The computer can only record whole, exact numbers. So it needs to use a yardstick that’s as precise as possible — a yardstick marked off into a lot of tiny increments.
The yardstick for measuring sound is described in terms of the number of bits that can be used to store each sample word. The more bits, the more precise the measurement.
BINARY NUMBERS: In Chapter One, hexadecimal notation was introduced. Hexadecimal is a convenient way to write the values of bytes because it’s not too difficult for humans to read. Inside the digital device, however, each byte consists not of a two-digit hexadecimal number but of eight bits (binary digits). Each bit is either a 1 or a 0. When it’s necessary to write out strings of bits, a space is usually put between bits four and five, like this: 0110 1011.
It turns out that eight bits are just about the minimum you need to represent sound acceptably. With an 8-bit ADC, the sound “yardstick” is marked off with 256 small increments. This is because an 8-bit value is always between 0 and 255. In binary arithmetic, we’d say that a value of zero is 0000 0000, while a value of 255 is 1111 1111.
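The arithmetic behind the yardstick can be checked with a few lines of Python. This is only an illustrative sketch; the function names `level_count` and `as_nibbles` are made up for this example.

```python
def level_count(bits: int) -> int:
    """Number of distinct values an unsigned word of the given width can hold."""
    return 2 ** bits


def as_nibbles(value: int, bits: int = 8) -> str:
    """Write value in binary, with a space between each group of four bits."""
    digits = format(value, f"0{bits}b")
    return " ".join(digits[i:i + 4] for i in range(0, bits, 4))


print(level_count(8))    # 256
print(level_count(16))   # 65536
print(as_nibbles(0))     # 0000 0000
print(as_nibbles(255))   # 1111 1111
print(as_nibbles(0x6B))  # 0110 1011
```

The last line shows the hexadecimal byte 6B written out as the bit string used in the sidebar above.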
First-generation sampling instruments such as the Fairlight CMI, E-mu Emulator, and Ensoniq Mirage recorded and played back sound as streams of 8-bit numbers. Eight-bit sound is noticeably harsh and grainy, because the measurements of the sound pressure level are often slightly inaccurate. When inaccuracy creeps into the system, we perceive it as added noise. The noise can’t be filtered out: Once it’s recorded into the sample, it’s there forever.
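To see where the inaccuracy comes from, here is a minimal sketch that rounds each sample of a sine wave to the nearest mark on the yardstick and reports the largest rounding error. It assumes samples in the range -1.0 to 1.0 and simple rounding; a real ADC is more complicated, and the names `quantize` and `worst_error` are hypothetical.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round x (between -1.0 and 1.0) to the nearest mark, with marks 2 / 2**bits apart."""
    step = 2.0 / (2 ** bits)  # spacing between adjacent marks
    return round(x / step) * step


def worst_error(bits: int, n: int = 1000) -> float:
    """Largest rounding error over one cycle of a sine wave sampled n times."""
    samples = [0.9 * math.sin(2 * math.pi * i / n) for i in range(n)]
    return max(abs(s - quantize(s, bits)) for s in samples)


for bits in (8, 16):
    print(bits, "bits: worst error", worst_error(bits))
```

The error can never exceed half the spacing between marks, so the 16-bit errors come out far smaller than the 8-bit ones. That leftover error is the noise described above.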
Sound is stored on standard music CDs as 16-bit numbers. Sixteen-bit audio has a much cleaner sound (less inherent noise), because the sound waves can be represented much more precisely. The 16-bit “yardstick” is marked off into 65,536 tiny increments. This is enough precision for many musical purposes, and indeed 16-bit recording at 44.1kHz is the industry standard. It’s often referred to as “CD-quality.” Beware, though: Advertisers often apply this term to lower-quality audio in a deliberate, cynical attempt to mislead consumers.
Each time a bit is added to the digital audio data, the number of marks on the yardstick doubles. This cuts the amount of residual noise in the signal in half. In other words, the signal-to-noise ratio (often abbreviated “s/n ratio”) improves by 6dB. As a rule of thumb, the s/n ratio of 8-bit recording can be no better than 48dB (not quite as good as a turntable), while a 16-bit recording can have an s/n ratio of 96dB. In the real world, an ADC may not perform quite this well, so these figures are approximate.
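The 6dB-per-bit rule of thumb follows from expressing the number of marks on the yardstick in decibels. The sketch below is a back-of-the-envelope calculation, not a measurement of any real converter; `ideal_snr_db` is a name invented for this example.

```python
import math


def ideal_snr_db(bits: int) -> float:
    """Rule-of-thumb s/n ceiling: the ratio of 2**bits steps to one step, in dB."""
    return 20 * math.log10(2 ** bits)


print(round(ideal_snr_db(1), 2))   # 6.02 (the gain from each added bit)
print(round(ideal_snr_db(8), 1))   # 48.2
print(round(ideal_snr_db(16), 1))  # 96.3
```

Doubling the number of marks adds 20 × log10(2), or about 6.02dB, which is where the rounded figures of 48dB and 96dB come from.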
But why stop there? If 16-bit sound is good, why not use 24-bit sound, or 32-bit sound, or 64-bit?
Modern digital audio software, running on a fast computer, often uses 24-bit or 32-bit numbers to represent sound waves. But the computer has to work harder to process larger numbers. When the computer is forced to work too hard, one of two things happens: Either the program refuses to add any more audio channels — for instance, a software synth might be unable to play new notes when you try to add them to an existing sonority — or the audio output abruptly fills up with ugly pops, clicks, and stuttering noises. The audio output might even shut down entirely.
JARGON BUSTER: The signal-to-noise ratio (s/n) of an electrical system, which is expressed in dB, is a measurement of the difference between the signal (the stuff we want to listen to) and the background noise that exists in the system. There are many ways of measuring the s/n ratio. Also, there may be more noise in a digital system when a signal is present than when there’s no signal. You can expect a decent piece of music gear to have an s/n ratio above 80dB, unless it’s a turntable. The inherent noise of a stylus on vinyl reduces the s/n ratio to between 50 and 60dB at best.
When the audio engine in your computer stutters or chokes because it can’t spit out enough numbers quickly enough, we say you’re hearing dropouts. Asking a softsynth to play too many notes or a computer-based recorder to use too many effects plug-ins at once is just one possible source of audio dropouts; there are others. On a PC, for instance, your soundcard may be sharing an IRQ (interrupt request) with other devices. To prevent dropouts, you may need to move the soundcard physically to a different slot in the computer. (This operation requires some care, however. If you’re encountering dropouts, don’t just start fooling around in the guts of the machine. If you’re not sure what you’re doing, phone your soundcard manufacturer’s technical support hotline and ask for their help.) Hardware digital synths are usually engineered well enough that you won’t hear dropouts in the audio; this is mainly an issue for computer users.
Each time the manufacturer of a new synth decides to improve the instrument’s audio quality by using a higher sampling rate or bit resolution, the audio software (or the OS in the hardware synth) can accomplish less before it uses up all of the available bandwidth in the CPU. Sooner or later, we reach a point of diminishing returns: Improving the audio quality further by increasing the sampling rate and bit resolution isn’t useful, because the difference to human ears will be very, very subtle, while the degradation in performance caused by the amount of arithmetic the software has to execute in real time becomes overwhelming.
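A rough feel for the extra workload comes from counting the raw bytes that have to be moved every second. The sketch below is only a proxy for CPU load, since real software does far more than shuffle bytes. The 96kHz, 24-bit stream is a hypothetical high-resolution setting chosen for comparison, and `bytes_per_second` is a name made up for this example.

```python
def bytes_per_second(sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Raw data rate of an uncompressed audio stream."""
    return sample_rate_hz * (bits // 8) * channels


cd_stereo = bytes_per_second(44_100, 16, channels=2)
hi_res_stereo = bytes_per_second(96_000, 24, channels=2)  # hypothetical setting

print(cd_stereo)      # 176400
print(hi_res_stereo)  # 576000
```

In this comparison, the higher-resolution stream carries more than three times as much data for every voice or track the software has to handle.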
Figure 2-2. When you do something, such as push a volume fader up to 11, that would require a digital audio signal to go past the maximum dynamic range of a component or module anywhere in the system, the signal clips. When viewed in an audio editor, a clipped signal has a flat top and/or bottom rather than a rounded shape. Clipping adds buzzy, high-pitched partials to the sound.
If the sampling rate is too low, the high frequencies in the sound will get lost. If the bit resolution (also called word length, because each sample is stored as an 8-bit, 16-bit, or 24-bit numerical “word”) is too low, the sound will be noisy. That’s pretty much all you need to know.
Most likely, your digital synthesizer will support at least a 16-bit, 44.1kHz data stream, so if you’re hearing a poor-quality signal, the source of your problems will probably lie elsewhere. Other forms of digital audio nastiness include:
• Clipping. There’s an absolute limit on how large the numbers in a digital audio system can be. (With floating-point math, this isn’t precisely true, but clipping can still become a problem at various points in the signal path.) If your audio software tries to make or use a number that’s too big, the waveform will reach the maximum possible level and then “clip.” In an audio editing program, clipping looks like Figure 2-2. If it’s brief, clipping sounds like a pop or click. If it goes on for more than a few milliseconds, it sounds as if the audio is being mangled with a buzz saw.
• Aliasing. If a digital synth tries to make a sound that contains any overtones higher than the Nyquist frequency, new partials will be introduced. The new partials will not be harmonically related to the fundamental. This phenomenon is called aliasing or foldover. A detailed discussion of aliasing would take several pages and several diagrams. Suffice it to say that if a high-pitched tone sounds bell-like when you don’t expect it to, or if a tone with vibrato has an unexpected up-and-down whooshing quality, you’ve got aliasing. It may help if you choose a waveform that has fewer overtones (such as a triangle wave instead of a sawtooth).
In a computer-based synth, you may also want to check whether the output is set to a 44.1kHz sampling rate. If the synth lets you choose this value, and if it’s currently set to 22.05kHz or 32kHz, increasing the value to 44.1kHz will quite likely reduce or eliminate aliasing.
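Both problems in the list above can be illustrated with a little arithmetic. The sketch below is a simplified model, not how any particular synth works. It assumes a maximum level of 1.0 for the clipping example and uses the standard folding rule for aliasing. The 5kHz sawtooth is a hypothetical test tone, and `hard_clip` and `alias_frequency` are names invented for this example.

```python
import math


def hard_clip(x: float, limit: float = 1.0) -> float:
    """Hold any sample that exceeds the maximum level at that level."""
    return max(-limit, min(limit, x))


def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency actually heard when a partial at freq_hz is sampled."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Clipping: a sine wave 1.5 times too loud, sampled 8 times per cycle.
too_loud = [1.5 * math.sin(2 * math.pi * i / 8) for i in range(8)]
clipped = [hard_clip(s) for s in too_loud]
print([round(s, 2) for s in clipped])  # three samples in a row sit at the limit

# Aliasing: the first six partials of a hypothetical 5kHz sawtooth.
for rate in (22_050, 44_100):
    heard = [alias_frequency(5_000 * k, rate) for k in range(1, 7)]
    print(rate, heard)
```

At 44.1kHz, the partials at 25kHz and 30kHz fold down to 19.1kHz and 14.1kHz, neither of which is a multiple of the 5kHz fundamental. At 22.05kHz, the folding begins with the 15kHz partial, which is why raising the output rate can help.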
Figure 2-3. The phase of a sine wave is measured in degrees. Each cycle of the wave goes through 360 degrees. 360 is the same as zero.