Digital Audio - Pro Engineer School Vol.1

Why digital? Why wasn't analog good enough? The answer starts with the analog tape recorder, which plainly isn't good enough in terms of signal to noise ratio and distortion. Many recording engineers and producers like the sound of analog now, because it is a choice. In the days before digital, analog recording wasn't a choice - it was a necessity. You couldn't get away from the problems. Actually you could. With Dolby A noise reduction, and subsequently Dolby SR, noise performance was vastly improved, to the point where it wasn't a problem at all. And if noise isn't a problem, you can lower the recording level to improve the distortion performance of analog tape. A recording well made with Dolby SR noise reduction can sound very good indeed. Some would say better than 16-bit digital audio, although this is a subjective rather than a scientific judgment. Analog recording also had the problem that quality deteriorated significantly every time a tape was copied, and there were often several generations of copies between the original master and the final product. Digital audio can be copied identically as many times as necessary (although in practice this doesn't always work as well as you might expect; more on this in another module).

In the domestic domain, before CD there was only the vinyl record. Well, there was the compact cassette too, but that never sounded particularly good, even with Dolby B noise reduction. (Some people say that they don't like Dolby B noise reduction. The problem is that they are usually comparing an encoded recording with decoding switched on and off. The extra brightness of the Dolby B encoded, but not decoded, sound compensates for dirty and worn heads, so the correctly decoded version sounds dull in comparison!) People with long memories will know that they used to yearn for a format that wasn't plagued with the clicks, pops and crackles of vinyl. The release of the CD format was eagerly anticipated, and of course the CD has become a great success.

Done properly, digital audio recorders can greatly outperform analog in both signal to noise ratio and distortion. That is why they are used in both the professional and domestic domains. When it comes to why the other parts of the signal chain have mostly changed over to digital, however, improved sound quality is hardly the reason: everything else in the analog chain already performs as well as anyone could possibly want. Well, almost anyone. The exceptions are the microphone and the loudspeaker, and we are still some way off truly digital transducers becoming available. By the time digital recording and reproduction had become properly established, digital audio in general was showing that it could offer advantages over analog in terms of price and facilities. Digital effects came first, as it became possible to achieve, for instance, digital reverberation for a tiny fraction of the cost of an electromechanical system. Digital mixing consoles came rather later because they require enormous processing power. Digital mixing consoles don't sound better than analog. They do, however, offer more facilities for the price, and have the advantage that settings can easily be stored and recalled. This is an important feature that we shall look at more closely when we come to mixing consoles.

Having established the reasons we have digital audio, let's see how it works...

Digital Theory

Firstly, what do we mean by analog? Analog comes from the word analogy. If I say that electrical voltage is a similar concept to the pressure of water behind a tap (excuse me, faucet), then I am making an analogy.

If I convert an acoustic sound to an electrical signal where the rise and fall in sound pressure is imitated by a similar rise and fall in voltage, then the electrical signal is an analog of the original. An analog signal is continuous. It follows the changes of the original without any kind of subdivision. It might not be able to track the changes fast enough for complete accuracy, in which case the high-frequency response will suffer. Its useful dynamic range lies between a maximum level (generally set by the positive and negative voltage limits of the power supply; a signal that tries to exceed these is clipped) and random variations at a very low level that we hear as noise.

Digital systems analyze the original in two ways. The first is 'sampling': measuring the signal a fixed number of times every second. Any changes that happen entirely between one sample and the next are ignored, but if the samples are close enough together, the ear won't notice. The second is 'quantizing': rounding each measurement to one of a number of discrete, separately identifiable levels. The smoothly changing analog signal is therefore turned into a stair-step approximation, since digital audio knows no 'in-between' states.

A stair-step signal like this is only a crude approximation of the original, but it can be made better by increasing the sampling frequency (sampling rate) and by increasing the number of quantization levels. Let's go deeper...
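The two operations can be sketched in a few lines of Python. This is a minimal illustration, not how a real converter works internally: the function name `sample_and_quantize` is hypothetical, the "analog" input is simply a full-scale sine wave computed with `math.sin`, and quantizing is modeled as rounding to the nearest integer code. With only 3 bits the stair-step is very coarse; with 16 bits the rounding error never exceeds half a step, but it is still there.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, bits, n_samples):
    """Return (ideal value, stored code) pairs for a full-scale sine wave.

    Codes run from -(2**(bits-1) - 1) to +(2**(bits-1) - 1), so with 3 bits
    the wave can only land on the levels -3..+3.
    """
    max_code = 2 ** (bits - 1) - 1             # 3 for 3 bits, 32767 for 16 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: time of the n-th sample
        analog = math.sin(2 * math.pi * freq_hz * t) * max_code  # continuous value
        samples.append((analog, round(analog)))  # quantizing: nearest level
    return samples

# A 1 kHz tone sampled 8 times per cycle with 3-bit resolution.
coarse = [code for _, code in sample_and_quantize(1000, 8000, 3, 8)]
print(coarse)

# At 16 bits the quantization error is at most half a step per sample.
fine = sample_and_quantize(1000, 44_100, 16, 441)
print(max(abs(ideal - code) for ideal, code in fine))
```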

To reproduce any given frequency, the sampling frequency, or sampling rate, has to be at least twice that frequency. So to convert the full range of human hearing to digital, a sampling frequency of at least 40 kHz (twice 20 kHz) is necessary. In practice, a 'safety margin' has to be added, which gives us the standard compact disc sampling frequency of 44.1 kHz (exactly this figure because early digital recording systems stored audio on video recorders, and 44.1 kHz fitted the video line structure), and 48 kHz, which is used in broadcasting (since in the early days of digital it was easier to convert from 48 kHz to the standard satellite sampling frequency of 32 kHz, the two being in a simple 3:2 ratio).
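The arithmetic behind these figures can be checked with a short Python sketch. The helper names are hypothetical; the 'margin' printed is simply the gap between half the sampling rate and 20 kHz, which is the room left for the anti-aliasing filter (a negative margin means the rate cannot cover the full 20 kHz range at all). The last two lines reproduce the commonly quoted derivation of 44.1 kHz from video-based recorders storing 3 samples per video line.

```python
def min_sample_rate(max_freq_hz):
    """Sampling rate needed for a given top frequency: at least twice it."""
    return 2 * max_freq_hz

def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2

HEARING_LIMIT_HZ = 20_000

for fs in (32_000, 44_100, 48_000):
    margin = nyquist_limit(fs) - HEARING_LIMIT_HZ
    print(f"{fs} Hz: limit {nyquist_limit(fs):.0f} Hz, margin above 20 kHz {margin:.0f} Hz")

# 3 samples per line x usable lines per field x fields per second:
assert 3 * 245 * 60 == 44_100   # 60-field (525-line) video
assert 3 * 294 * 50 == 44_100   # 50-field (625-line) video
```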

To reduce the quantization error between the digital signal and the original analog, more quantization levels must be used. Compact disc and DAT both use 65,536 levels. This, in digital terms, is a nice round number corresponding to 16 bits. Without going into binary arithmetic, each bit provides roughly 6 dB of signal to noise ratio. Therefore a digital audio system with 16-bit resolution has a signal to noise ratio (at least in theory) of 96 dB.
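For anyone who does want the binary arithmetic, here is a small Python sketch of the 6 dB per bit rule. It takes the theoretical signal to noise ratio as the ratio of full scale to one quantization step, 20 log10(2^N), which is about 6.02 dB per bit. (A slightly different, also widely quoted formula for a full-scale sine wave against quantization noise is 6.02N + 1.76 dB; the simpler version here is the one that matches the round figures in the text.) The function names are illustrative only.

```python
import math

def quantization_levels(bits):
    """Number of distinct levels available with the given word length."""
    return 2 ** bits

def theoretical_snr_db(bits):
    """Full scale relative to one quantization step: 20*log10(2**bits),
    i.e. roughly 6 dB for every bit."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (16, 24):
    print(bits, quantization_levels(bits), round(theoretical_snr_db(bits), 1))
```

This gives 96.3 dB for 16 bits and 144.5 dB for 24 bits.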

The question will arise: what happens if a digital system is presented with a frequency higher than half the sampling frequency? The answer is that a phenomenon known as aliasing will occur. These higher frequencies are not properly encoded and are translated into spurious frequencies in the audio band. These are only distantly related to the input frequencies and are absolutely unmusical (unlike harmonic distortion, which can be quite pleasant in moderation). The solution is not to allow frequencies higher than half the sampling rate (in fact somewhat lower, to give a margin of safety) into the system. Therefore an 'anti-aliasing' filter is used just after the input. Filter design is complex, particularly for the steep slopes needed to maximize frequency response without wasting storage or bandwidth on an unnecessarily high sampling rate. The design of the filters is one of the distinguishing points that make different digital systems actually sound different.
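Where an aliased tone ends up can be worked out directly. The Python sketch below assumes no anti-aliasing filter at all and an ideal sampler; `alias_frequency` and `sampled_sine` are hypothetical helper names. Sampling cannot tell a frequency f from f plus any multiple of the sampling rate, and anything in the upper half of that range folds back around half the sampling rate. The final check shows that a 30 kHz tone and a 14.1 kHz tone, sampled at 44.1 kHz, produce exactly the same sample values apart from a sign inversion, so once sampled they are indistinguishable.

```python
import math

def alias_frequency(input_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling with no anti-aliasing
    filter: reduce modulo fs, then fold the upper half back around fs/2."""
    f = input_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

def sampled_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample values of a unit sine wave at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]

fs = 44_100
print(alias_frequency(30_000, fs))   # 30 kHz input lands at 14100 Hz

a = sampled_sine(30_000, fs, 50)
b = sampled_sine(14_100, fs, 50)
print(all(math.isclose(x, -y, abs_tol=1e-9) for x, y in zip(a, b)))
```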

Once the signal has been filtered, sampled and quantized, it must be coded. Recording the binary digits directly might be possible, but it wouldn't make the best use of the medium, and in some cases might not work at all. In the compact disc system, the tiny pits in the aluminized layer themselves form the spiral track that the laser follows from the start of the recording to the end. A binary '1' is coded by a transition from 'land' - the level surface - to a pit or vice versa. A binary '0' is coded by no transition. But what if the signal were stuck on '0' for a period of time? The spiral would disappear! Hence a coding system (on CD, eight-to-fourteen modulation, or EFM) is used that rearranges the binary digits so that transitions are forced to occur every so often, simply to make a workable system. There are other such constraints that we need not go into here.
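The transition rule, and the reason a constraint on long runs of zeros is needed, can be sketched in Python. This is a simplified model: it does not implement the real CD code (which maps each 8-bit byte to a 14-bit pattern from a lookup table and adds merging bits between patterns); it only turns channel bits into a pit/land sequence and checks the CD run-length limit of 2 to 10 zeros between successive ones. All function names are hypothetical.

```python
def bits_to_surface(channel_bits, start="land"):
    """Simplified CD-style recording: a 1 flips the surface between land
    and pit, a 0 leaves it unchanged."""
    surface, track = start, []
    for bit in channel_bits:
        if bit == 1:
            surface = "pit" if surface == "land" else "land"
        track.append(surface)
    return track

def zero_runs_between_ones(channel_bits):
    """Lengths of the runs of 0s that sit between consecutive 1s."""
    runs, count, seen_one = [], 0, False
    for bit in channel_bits:
        if bit == 1:
            if seen_one:
                runs.append(count)
            seen_one, count = True, 0
        else:
            count += 1
    return runs

def meets_run_length_limits(channel_bits, min_zeros=2, max_zeros=10):
    """True if every run of 0s between 1s is 2 to 10 long, so pits and lands
    are never too short to read nor so long that the track disappears."""
    return all(min_zeros <= r <= max_zeros for r in zero_runs_between_ones(channel_bits))

good = [1, 0, 0, 1, 0, 0, 0, 1]
print(bits_to_surface(good))
print(meets_run_length_limits(good))                     # acceptable pattern
print(meets_run_length_limits([1] + [0] * 12 + [1]))     # too long without a transition
```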

Additionally there is the need for error correction. Every storage medium has physical defects that would damage the data if nothing were done to prevent it. So additional data is added to the raw digital signal, firstly to check on replay whether the data is valid or erroneous, and secondly to provide backup data so that if a section of data is corrupted, it can be reconstructed from other data nearby. Adding error correction involves a compromise between preserving the integrity of the digital signal and not adding any more data than necessary.

It is fair to say that the error correction system on CD, and on DAT, is very good. But as in all things, more modern digital systems are cleverer, and better.
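The idea of detection plus backup data can be illustrated with the simplest possible scheme, a single XOR parity block. This is a deliberately simplified stand-in, not what CD actually uses (CD uses Cross-Interleaved Reed-Solomon Coding, which is far more powerful); the function names are hypothetical. Recomputing the parity on replay detects that something is wrong, and if the position of one lost block is known, XORing the surviving blocks with the parity block rebuilds it exactly.

```python
from functools import reduce

def parity_block(blocks):
    """XOR equal-length data blocks together, byte by byte, into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def is_consistent(blocks, parity):
    """Error detection: recompute the parity on replay and compare."""
    return parity_block(blocks) == parity

def rebuild_missing(blocks, parity):
    """Error correction: if exactly one block is known to be lost (None),
    the XOR of the survivors and the parity block reproduces it."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("this simple scheme can rebuild exactly one lost block")
    survivors = [b for b in blocks if b is not None]
    repaired = list(blocks)
    repaired[missing[0]] = parity_block(survivors + [parity])
    return repaired

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity_block(data)                        # stored alongside the data
print(is_consistent(data, p))
print(rebuild_missing([data[0], None, data[2]], p) == data)
```

A single parity block can say that an error exists but not where; real systems combine interleaving with codes that can both locate and correct errors.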

All of the above is known as analog to digital encoding, or A to D. The reverse process is known, fairly obviously, as decoding. Sparing the details that only electronics experts need to know, the digital signal goes through a D to A converter and out comes an analog signal. The only problem is that it now also contains strong unwanted components at and around the sampling frequency. Obviously these are above audibility, but they could cause severely audible distortion if allowed into any other equipment that couldn't handle them properly. To prevent this, the output is filtered with what is known as a 'brickwall' filter, because of its steep slope. Once again the design of the filter does affect the sound quality, but digital techniques such as oversampling have now been developed to make the filter's job easier, so the design is more straightforward.
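The unwanted components in the raw D to A output are images of the audio, mirrored around the sampling frequency and its multiples. The Python sketch below lists them for an idealized converter (it ignores the level shaping a real converter's output stage applies); `image_frequencies` is a hypothetical helper. It shows why the filter must be so steep at 44.1 kHz, and how one common technique, oversampling (running the output at a multiple of the original rate after digital filtering), pushes the first image far away from the audio band.

```python
def image_frequencies(signal_hz, sample_rate_hz, multiples=2):
    """Unwanted components at k*fs - f and k*fs + f (k = 1, 2, ...) in the raw
    D to A output, before the reconstruction ('brickwall') filter."""
    images = []
    for k in range(1, multiples + 1):
        images += [k * sample_rate_hz - signal_hz, k * sample_rate_hz + signal_hz]
    return images

fs = 44_100
# A 20 kHz tone: the lowest image is at 24.1 kHz, only 4.1 kHz above the audio band.
print(image_frequencies(20_000, fs, 1))
# With 4x oversampling the lowest image moves up to 156.4 kHz.
print(image_frequencies(20_000, 4 * fs, 1))
```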

Analog to Digital Conversion

Filtering: removing frequencies, in the analog domain, that are higher than half the sampling rate.

Sampling: measuring the signal level once per sampling period.

Quantization: deciding which of the 65,536 levels (in a 16-bit system) is closest to the input signal level, for each sampling period.

Coding: converting the result to a binary number according to a scheme that a) incorporates error detection, b) makes provision for error correction, and c) is recordable or transmissible in the chosen medium.

The decoder on the replay (D to A) side incorporates three levels of protection against damaged data:

Error correction: an error is detected in the data and completely corrected using the additional error-correction data specifically put there for the purpose.

Error concealment: an error is detected but it is too severe to be corrected. The missing data is therefore 'interpolated' - just one of the many scientific words for 'guess' - from surrounding data, and the result will hopefully be inaudible. However, if you ever get the chance to see a CD player that has correction and concealment indicator lights, you will notice that an awful lot of concealment goes on just to play an average disc. How well concealment is done is one of the factors that make different digital systems sound different.

Muting: in this case the error is so bad that the system shuts down momentarily rather than output what could be an exceedingly loud glitch.
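The fallback from concealment to muting can be sketched in Python. This assumes error correction has already been applied and that any sample it could not fix is marked as `None`; the function `conceal_or_mute` is hypothetical and much simpler than a real player's logic. An isolated bad sample between two good ones is concealed by linear interpolation (the integer midpoint of its neighbors); anything worse is muted to zero.

```python
def conceal_or_mute(samples):
    """samples: list of ints, with None where error correction failed.
    A lone bad sample between two good ones is concealed by linear
    interpolation; a bad sample without two good neighbors is muted."""
    out = list(samples)
    for i, s in enumerate(samples):
        if s is not None:
            continue                                  # good or already corrected
        left = samples[i - 1] if i > 0 else None
        right = samples[i + 1] if i + 1 < len(samples) else None
        if left is not None and right is not None:
            out[i] = (left + right) // 2              # concealment: interpolate
        else:
            out[i] = 0                                # muting: too damaged to guess
    return out

print(conceal_or_mute([100, None, 200]))
print(conceal_or_mute([100, None, None, 200]))
```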

Bandwidth

Bandwidth, in this context, is the rate of flow of data, measured in kilobits per second. Strictly, 1 kilobit is 1000 bits, although the term is sometimes used for 1024 bits. Often the term byte is used, where 1 byte = 8 bits. The abbreviation for bit is 'b' and for byte is 'B', but these are often confused, as are the multiplier prefixes 'k', meaning x1000, and 'K', meaning x1024.

The bandwidth of a single channel of 16-bit digital audio is roughly 700 to 770 kbps (705.6 kbps at 44.1 kHz, 768 kbps at 48 kHz). Compare this with the bandwidth of a modem (56 kbps), ISDN2 (128 kbps) and common ADSL Internet connections (512 kbps). None of these systems is capable of transmitting even a single channel of uncompressed digital audio in real time, hence the need for MP3 and similar data-reduction systems.
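The data rate of uncompressed audio is just bits per sample times samples per second times channels. The Python sketch below (with a hypothetical `pcm_bit_rate` helper, and the link speeds taken from the text) shows the figure both ways, with k = 1000 and with K = 1024, and what fraction of a single channel each connection could carry.

```python
def pcm_bit_rate(bits, sample_rate_hz, channels=1):
    """Uncompressed PCM data rate in bits per second."""
    return bits * sample_rate_hz * channels

links_bps = {"modem": 56_000, "ISDN2": 128_000, "ADSL": 512_000}  # figures from the text

rate = pcm_bit_rate(16, 44_100)
print(rate / 1000, "kbps (k = 1000)")
print(rate / 1024, "Kbps (K = 1024)")
for name, capacity in links_bps.items():
    print(name, "can carry", round(capacity / rate, 2), "of one channel")
```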

24/96

The quest for ever better sound quality leads us to want to increase both the sampling rate and the resolution. 24-bit resolution will in theory give a signal to noise ratio of 144 dB. This will never be achieved in practice, but the real achievable signal to noise ratio is probably as good as anyone could reasonably ask for. Of course, some of the available dynamic range may be used as additional headroom, to play safe while recording, but even so the resulting recording will be remarkably quiet. Also, even though most of us cannot hear all the way up to 20 kHz, a frequency which is perfectly well catered for these days by a 44.1 or 48 kHz sampling rate, there is always a nagging doubt that this is only just good enough, and that it would be worthwhile to have a really high sampling rate to put an end to all doubt.

This, of course, affects storage requirements. It is a reasonable rule of thumb that CD-quality stereo audio requires about 10 Megabytes of storage per minute. 24-bit, 96 kHz digital audio uses 1.5 times as many bits per sample and nearly 2.2 times as many samples per second, so by simple multiplication it requires roughly 3.3 times as much: about 33 Megabytes per stereo minute. Of course, Megabytes are getting cheaper all the time. There is another problem, however: data bandwidth. When recording onto a hard disk system, there is a certain data throughput rate beyond which the system will struggle and possibly fail to record or play back properly. A standard modern hard drive should easily be capable of 24 tracks of playback under normal circumstances (the track count is affected, for one thing, by the 'edit density': the more short segments you cut the audio into, and the more widely the data is physically separated on the disk, the harder it will be to play back). Try this at more than three times the data rate and the track count, or the reliability, is bound to suffer. However, disks are getting ever faster and most problems of this nature are in the past.

Before long it will be possible to get virtually any number of tracks quite easily. It's worth a quick look at Digidesign's comments on hard disk specifications to maximize track count.
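Both the storage and the throughput figures follow from the same multiplication. The Python sketch below uses hypothetical helper names and assumes each sample is stored in whole bytes (2 bytes for 16-bit, 3 bytes for 24-bit) with no file-format overhead. One minute of 16-bit/44.1 kHz stereo comes to 10,584,000 bytes (about 10.1 Megabytes of 2^20 bytes), while 24-bit/96 kHz comes to 34,560,000 bytes (about 33 Megabytes), roughly 3.3 times as much. The same ratio applies to the sustained data rate needed for 24 tracks of playback.

```python
def bytes_per_second(bits, sample_rate_hz, tracks=1):
    """Sustained data rate for uncompressed PCM, bits rounded up to whole bytes."""
    return (bits + 7) // 8 * sample_rate_hz * tracks

def bytes_per_minute(bits, sample_rate_hz, channels=2):
    """Storage needed for one minute of audio."""
    return bytes_per_second(bits, sample_rate_hz, channels) * 60

cd = bytes_per_minute(16, 44_100)
hi = bytes_per_minute(24, 96_000)
print(cd / 2**20, hi / 2**20, hi / cd)      # Megabytes (2**20 bytes) and the ratio

# Throughput for 24 simultaneous tracks, in bytes per second:
print(bytes_per_second(16, 44_100, 24), bytes_per_second(24, 96_000, 24))
```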

Digital Interconnection

Digital interconnection comes in a number of standards, which are summarized here:

AES/EBU

• Also known as AES3-1985 (the year it was published)

• Standard for professional digital audio

• Supports up to 24-bit at any sampling rate

• Transmits 2 channels on a single cable

• Uses 110 ohm balanced twisted wire pair cables usually terminated with XLR connectors

• Can use cables of length up to 100 meters

• Electrical signal level around 5 volts (the standard allows 2 to 7 volts peak-to-peak)

• Standard audio cables can be used for short distances but are not recommended as their impedance may not be the standard 110 ohm and reflections may occur at the ends of the cable

• Data transmission at 48 kHz sampling rate is 3.072 Megabit/s (64x the sampling rate)

• Self-clocking, but master clocking is possible

S/PDIF

• Two types:

  • Electrical

    • Uses 75 ohm unbalanced coaxial cable with RCA phono connectors

    • Cable lengths limited to 6 meters

  • Optical

    • TOSLINK: uses plastic fiber-optic cable and the same connectors as Lightpipe (below). TOSLINK is an optical data transmission technology developed by Toshiba; it does not specify the protocol to be used

    • ST-type: glass fiber, which can be used for longer lengths (up to 1 kilometer)

• Meant for consumer products but may be seen on professional equipment

• Supports up to 24-bit/48 kHz sampling rate

• Self-clocking

• Strictly, a format converter is needed when connecting to AES/EBU, since the electrical level (0.5 V), the cable impedance and the channel status data all differ. However, some AES/EBU inputs can recognize an S/PDIF signal

• Some of the bits within the Channel Status blocks are used for SCMS (Serial Copy Management System), to prevent consumer machines from making digital copies of digital copies.

MADI

• An extension of the AES3 format (AES/EBU)

• Supports up to 24-bit/48 kHz sampling rate (higher rates are possible)

• Transmits 56 channels on a 75 ohm video coaxial cable with BNC connectors

• Length limited to 50 meters. Fiber-optic cable can be used for longer lengths

• Data transmission rate is 100 Megabit/s

• Requires a master clock - a dedicated master synchronization signal must be applied to all transmitters and receivers.

ADAT Optical

• Sometimes known as 'Lightpipe'

• Implemented on the Alesis ADAT MDM and digital devices such as mixing consoles, synthesizers and effects units

• Supports up to 24-bit/48 kHz sampling rate

• Transmits 8 channels serially on fiber-optic cable

• Distance limited to 10 meters, or up to 30 meters with glass fiber cable

• Data transmission at 48 kHz is 12 Megabit/s

• Self clocking

• Channels can be reassigned (digital patchbay function)

TDIF (Tascam Digital Interface Format)

• Implemented on Tascam's family of DA-88 recorders and other digital devices such as mixing consoles

• Supports up to 24-bit at multiple sampling rates

• Transmits 8 channels on multicore, unbalanced cables with 25-pin D-sub connectors

• Bidirectional interface: a single cable carries data in both directions

• Cable length limited to 5 meters

• Data transmission at 48 kHz sampling rate is 3 Megabit/s (like AES/EBU)

• Intended for a master clock system, although self-clocking is possible
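Returning to AES/EBU, the figure of 3.072 Megabit/s at 48 kHz (64 times the sampling rate) follows directly from the AES3 frame structure: one frame per sample period, two subframes per frame (one per channel), and 32 time slots per subframe. The Python sketch below encodes those constants; the function names are illustrative only. Each subframe also carries one channel status bit, and 192 frames make up one channel status block, which is where information such as the SCMS flags on S/PDIF is carried. The biphase-mark line code used on the cable guarantees a transition at every bit-cell boundary, which is what makes the interface self-clocking.

```python
# Constants from the AES3 frame structure.
SUBFRAMES_PER_FRAME = 2          # one subframe per channel (A and B)
TIME_SLOTS_PER_SUBFRAME = 32     # 4 preamble, 24 audio/aux, then V, U, C and P bits
FRAMES_PER_BLOCK = 192           # frames in one channel status block

def aes3_bit_rate(sample_rate_hz):
    """Bits per second on the cable: one full frame per sample period."""
    return SUBFRAMES_PER_FRAME * TIME_SLOTS_PER_SUBFRAME * sample_rate_hz

def channel_status_blocks_per_second(sample_rate_hz):
    """How often a complete channel status block is delivered."""
    return sample_rate_hz / FRAMES_PER_BLOCK

for fs in (44_100, 48_000):
    print(fs, aes3_bit_rate(fs), channel_status_blocks_per_second(fs))
```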

Check Questions

• To which type of sound engineering equipment was digital audio first applied?

• In relation to the question above, why was this the most pressing need?

• What types of equipment are currently not available in digital form?

• Describe 'sampling rate'.

• What is the minimum sampling rate for a digital system capable of reproduction up to 20 kHz (ignoring any 'safety margin')?

• What is 'aliasing'?

• What two sampling rates are most commonly used in digital audio?

• Describe quantization.

• What is the signal to noise ratio, in theory, of a digital system with 20-bit resolution?

• Why is coding necessary? Give two reasons.

• Why does a digital to analog converter need a filter?

• What is error correction?

• What is error concealment?

• What happens (or at least should happen) if an error is neither corrected nor concealed?

• How many Megabytes of data, approximately, are occupied by one minute of CD-quality stereo digital audio?

• Why, in a hard disk recording system, is it likely that fewer tracks can be replayed simultaneously at the 24-bit/96 kHz standard than at the CD-quality 16-bit/44.1 kHz standard?

In document Pro Engineer School Vol.1 (Page 64-75)