An often-overlooked but essential area of game programming is sound.
Computer sound processing is as much of a science as computer graphics, but the basics of format conversion, mixing, and playback are fairly straightforward. This section discusses the basics of computer audio and investigates SDL’s audio-programming interface.
Representing Sound with PCM
Computer sound is based on pulse-code modulation, or PCM. As you know, pixels in a video surface encode the average color intensities of an optical image at regular intervals, and more pixels allow for a closer representation of the original image. PCM data serves the same purpose, except that it represents the average intensities of sequential intervals in sound waves. Each “pixel” of PCM data is called a sample. The rate at which these samples occur is the sampling rate or frequency of the sound data. Sampling rates are expressed in the standard SI frequency unit, hertz (Hz). A higher sampling rate allows for a closer representation of the original sound wave.
Individual PCM samples are usually 8 or 16 bits (1 or 2 bytes) for each channel (one channel for mono, two channels for stereo), and game-quality sound is most often sampled at either 22,050 or 44,100 Hz. Samples can be represented as signed or unsigned numbers. A 16-bit sample can express the intensity of a sound with much greater precision than an 8-bit sample, but it involves twice as much data. At 44,100 Hz with 16-bit samples, one second of sound data will consume nearly 90 kilobytes of storage, or twice that for stereo. Game programmers must decide on a trade-off between sound quality and the amount of disk space a game will consume. Fortunately, this trade-off has become less of a problem in recent years, with the advent of inexpensive high-speed Internet connections and the nearly universal availability of CD-ROM drives.

              Mono                  Stereo
              8 bit      16 bit     8 bit      16 bit
11,025 Hz     11,025     22,050     22,050     44,100
22,050 Hz     22,050     44,100     44,100     88,200
44,100 Hz     44,100     88,200     88,200     176,400

Table 4–1: Storage consumed by various sound formats (in bytes per second)
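Every figure in Table 4–1 is the product of the sampling rate, the sample size in bytes, and the number of channels. The following is a minimal sketch of that arithmetic; the function name pcm_bytes_per_second is made up for this illustration and is not part of SDL.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of storage needed for one second of uncompressed PCM audio.
   rate_hz  - samples per second, per channel
   bits     - sample size in bits (8 or 16 in this chapter)
   channels - 1 for mono, 2 for stereo
   This is a hypothetical helper for illustration, not an SDL function. */
uint32_t pcm_bytes_per_second(uint32_t rate_hz, unsigned bits, unsigned channels)
{
    assert(bits > 0 && bits % 8 == 0);
    assert(channels > 0);
    return rate_hz * (bits / 8) * channels;
}
```

For example, pcm_bytes_per_second(44100, 16, 2) yields 176,400, the bottom-right entry of the table.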
Just as raw pixel data is often stored on disk in .bmp files, raw PCM sound samples are often stored on disk in .wav files. SDL can read these files with the SDL_LoadWAV function. There are several other PCM sound formats (such as .au and .snd), but we will confine our discussion to .wav files for now. (There is currently no audio file equivalent to the SDL_image library. Are you interested in writing one for us?)
Function SDL_LoadWAV(file, spec, buffer, length)
Synopsis Loads a RIFF .wav audio file into memory.
Returns Non-NULL on success, NULL on failure. Fills the given SDL_AudioSpec structure with the relevant information, sets *buffer to a newly allocated buffer of samples, and sets *length to the size of the sample data, in bytes.
Parameters file—Name of the file to load. The more general SDL_LoadWAV_RW function provides a way to load .wav data from nonfile sources (in fact, SDL_LoadWAV is just a wrapper around SDL_LoadWAV_RW).
spec—Pointer to the SDL_AudioSpec structure that should receive the loaded sound’s sample rate and format.
buffer—Pointer to the Uint8 * that should receive the newly allocated buffer of samples.
length—Pointer to the Uint32 that should receive the length of the buffer (in bytes).
Function SDL_FreeWAV(buffer)
Synopsis Frees memory allocated by a previous call to SDL_LoadWAV. This is necessary because the data might not have been allocated with malloc, or might be subject to other considerations. Use this function only for freeing sample data allocated by SDL; free your own sound buffers with free.
Parameters buffer—Sample data to free.
Structure SDL_AudioSpec
Synopsis Contains information about a particular sound format: rate, sample size, and so on. Used by SDL_OpenAudio and SDL_LoadWAV, among other functions.
Members freq—Frequency of the sound in samples per second. For stereo sound, this is the rate for each channel (i.e., 44,100 Hz in stereo is actually 88,200 samples per second).
format—Sample format. Possible values are AUDIO_S16 and AUDIO_U8. (There are other formats, but they are uncommon and not fully supported; I found this out the hard way.)
silence—PCM sample value that corresponds to silence. This is usually either 0 (for 16-bit signed formats) or 128 (for 8-bit unsigned formats). Calculated by SDL. Read-only.
channels—Number of interleaved channels. This will normally be either one (for mono) or two (for stereo).
samples—Number of samples in an audio transfer buffer. A typical value is 4,096.
size—Size of the audio transfer buffer in bytes. Calculated by SDL. Read-only.
callback—Pointer to the function SDL should call to retrieve more sample data for playback.
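The two read-only members can be derived from the others. The sketch below shows the arithmetic under two assumptions that the reference above does not state outright: that samples counts samples per channel, so the buffer size is samples times channels times bytes per sample, and that only the two formats named above are in play. The demo_format_t structure and both functions are hypothetical stand-ins, not SDL’s own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields discussed above.
   This is NOT SDL's SDL_AudioSpec; it only carries what the
   arithmetic needs. */
typedef struct {
    int is_signed;      /* 1 for signed 16-bit, 0 for unsigned 8-bit */
    unsigned bits;      /* sample size in bits: 8 or 16 */
    unsigned channels;  /* 1 for mono, 2 for stereo */
    unsigned samples;   /* samples per channel in one transfer buffer */
} demo_format_t;

/* Size of one transfer buffer in bytes (assumption: samples is
   counted per channel). */
uint32_t demo_buffer_size(const demo_format_t *f)
{
    assert(f->bits == 8 || f->bits == 16);
    assert(f->channels > 0);
    return f->samples * f->channels * (f->bits / 8);
}

/* Sample value that represents silence: the midpoint of the range.
   That is 0 for signed samples and 128 for unsigned 8-bit samples. */
uint8_t demo_silence(const demo_format_t *f)
{
    return f->is_signed ? 0 : 128;
}
```

With 4,096 samples of signed 16-bit stereo, this gives a 16,384-byte transfer buffer and a silence value of 0.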
PCM data is convenient to work with, despite the necessary size considerations. The key is to realize that PCM is simply a set of measurements that approximate a wave of energy. A strong sound wave will result in large PCM sample values, and a weak sound wave will result in small values. To increase or decrease the volume of a PCM sound wave, simply multiply each sample by a constant. To create a volume-fading effect, multiply each sample by a progressively larger or smaller value. To fade between two sounds, simply perform an average with changing weights. Waves are additive; a program can combine (mix) sounds simply by adding (or averaging) the samples together. Remember that binary numbers have limits; multiplying a sample by a large constant or adding too many samples together is likely to cause an overflow, which will result in distorted sound.
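The operations just described fit in a few lines of C. The following is a minimal sketch for signed 16-bit samples, not code from SDL: the function names and the fixed-point convention (a volume of 256 means “unchanged”) are assumptions made for this example. Each result is computed in 32 bits and clamped to the 16-bit range, which trades the wraparound noise of an overflow for milder clipping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Change volume: multiply every sample by volume/256.
   256 leaves the sound unchanged, 128 halves it, 512 doubles it. */
void pcm_scale(int16_t *buf, size_t n, int volume)
{
    size_t i;
    assert(volume >= 0 && volume <= 1024);
    for (i = 0; i < n; i++)
        buf[i] = clamp16(((int32_t)buf[i] * volume) / 256);
}

/* Mix: add src into dst sample by sample, clamping on overflow. */
void pcm_mix(int16_t *dst, const int16_t *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = clamp16((int32_t)dst[i] + (int32_t)src[i]);
}

/* Crossfade: a weighted average whose weight moves linearly
   from all-a at the first sample to all-b at the last. */
void pcm_crossfade(int16_t *out, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int32_t wb = (n > 1) ? (int32_t)((i * 256) / (n - 1)) : 256;
        int32_t wa = 256 - wb;
        out[i] = clamp16(((int32_t)a[i] * wa + (int32_t)b[i] * wb) / 256);
    }
}
```

A simple fade-out is the crossfade with b set to a buffer of silence.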
Feeding a Sound Card
A sound card is conceptually simple: it accepts a continuous stream of PCM samples and recreates the original sound wave through a set of speakers or headphones. Your basic task, then, is to keep the sound card supplied with PCM samples. This is a bit of a trick. If you want 44.1 kilohertz (kHz) sound (the quality of sound stored on audio CDs), you must supply the sound card with 44,100 samples per second, per channel. With 16-bit samples and two channels (stereo), this comes out to 176,400 bytes of sound data per second (see Table 4–1)! In addition, timing is critical. Any lapse of data will result in a noticeable sound glitch. It would be both difficult and woefully inefficient to make a game stop 44,100 times each second to feed more data to the sound card. Fortunately, the computer gives us a bit of help. Most modern computer architectures include a feature called direct memory access, or DMA. DMA provides support for high-speed background memory transfers. These transfers are used for a variety of purposes, but their most common use is to shovel large amounts of data to sound cards, hard drives, and video accelerators. You can periodically give the computer’s DMA controller buffers of several thousand PCM samples to transfer to the sound card, and the DMA controller can alert the program when the transfer is complete so that it can send the next block of samples.
The operating system’s drivers take care of DMA for you; you simply have to make sure that you can produce audio data quickly enough. This is sometimes done with a callback function. Whenever the computer’s sound hardware asks SDL for more sound data, SDL in turn calls your program’s audio callback function. The callback function must quickly copy more sound data into the given buffer. This usually involves mixing several sounds together.
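The callback arrangement is easier to picture with the sound hardware taken out of the loop. The sketch below is a toy model, not SDL code: demo_run_device stands in for the driver and simply asks the callback for a fixed number of buffers, and all of the demo_ names are invented for this illustration. The callback copies as much data as it has and leaves the rest of the buffer silent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shape of the callback: fill `len` bytes of `stream`. */
typedef void (*demo_callback_t)(void *userdata, uint8_t *stream, int len);

/* A sound being played back, with a read position. */
typedef struct {
    const uint8_t *data;
    int length;     /* total bytes available */
    int position;   /* bytes already handed out */
} demo_source_t;

/* The program's callback: copy the next chunk of sound into the
   buffer, padding with silence (0 here) once the data runs out. */
void demo_fill(void *userdata, uint8_t *stream, int len)
{
    demo_source_t *src = userdata;
    int remaining = src->length - src->position;
    int n = (remaining < len) ? remaining : len;

    memset(stream, 0, (size_t)len);
    memcpy(stream, src->data + src->position, (size_t)n);
    src->position += n;
}

/* Stand-in for the driver: each time the "hardware" needs data,
   request one more buffer from the callback. */
void demo_run_device(demo_callback_t cb, void *userdata,
                     uint8_t *out, int buffer_len, int count)
{
    int i;
    assert(cb != NULL && buffer_len > 0);
    for (i = 0; i < count; i++)
        cb(userdata, out + i * buffer_len, buffer_len);
}
```

The important property is the direction of control: the program never pushes data; it waits to be asked and must answer promptly.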
This scheme has one small problem: since you send data to the sound card in chunks, there will always be a slight delay before any new sound can be played. For instance, suppose that our program needed to play a gunshot sound. It would probably add the sound to an internal list of sounds to mix into the output stream. However, the sound card might not be ready for more data, so the mixed samples would have to wait. This effect is called latency, and you should minimize it whenever possible. You can reduce latency by specifying a smaller sound buffer when you initialize the sound card, but you cannot realistically eliminate it (this is usually not a problem in terms of realism; there is latency in real life, because light travels much faster than sound).
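A rough estimate of this delay is the time it takes to play one buffer: the number of samples in the buffer divided by the sampling rate. The helper below is a hypothetical illustration of that estimate only; real latency also depends on the driver and hardware.

```c
#include <assert.h>

/* Time needed to play one transfer buffer, in milliseconds.
   samples - samples per channel in the buffer
   rate_hz - sampling rate
   Hypothetical helper for illustration; not an SDL function. */
double buffer_latency_ms(unsigned samples, unsigned rate_hz)
{
    assert(rate_hz > 0);
    return 1000.0 * samples / rate_hz;
}
```

At 44,100 Hz, a 4,096-sample buffer takes about 93 milliseconds to play, while a 1,024-sample buffer takes about 23 milliseconds. The smaller buffer responds faster but must be refilled four times as often.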
An Example of SDL Audio Playback
We have discussed the nuts and bolts of sound programming for long enough; it is time for an example. This example is a bit lengthier than our previous examples, but the code is fairly straightforward.
Code Listing 4–12 (audio-sdl.c)
/* Example of audio mixing with SDL. */

#include <SDL/SDL.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset, memcpy */
#include <assert.h>

/* Structure for loaded sounds. */
typedef struct sound_s {
    Uint8 *samples;     /* raw PCM sample data */
    Uint32 length;      /* size of sound data in bytes */
} sound_t, *sound_p;

/* Structure for a currently playing sound. */
typedef struct playing_s {
    int active;         /* 1 if this sound should be played */
    sound_p sound;      /* sound data to play */
    Uint32 position;    /* current position in the sound buffer */
} playing_t, *playing_p;

/* Array for all active sound effects. */
#define MAX_PLAYING_SOUNDS 10
playing_t playing[MAX_PLAYING_SOUNDS];

/* The higher this is, the louder each currently playing sound
   will be. However, high values may cause distortion if too
   many sounds are playing. Experiment with this. */
#define VOLUME_PER_SOUND (SDL_MIX_MAXVOLUME / 2)

/* This function is called by SDL whenever the sound card
   needs more samples to play. It might be called from a
   separate thread, so we should be careful what we touch. */
void AudioCallback(void *user_data, Uint8 *audio, int length)
{
    int i;

    /* Clear the audio buffer so we can mix samples into it. */
    memset(audio, 0, length);

    /* Mix in each sound. */
    for (i = 0; i < MAX_PLAYING_SOUNDS; i++) {
        if (playing[i].active) {
            Uint8 *sound_buf;
            Uint32 sound_len;

            /* Locate this sound's current buffer position. */
            sound_buf = playing[i].sound->samples;
            sound_buf += playing[i].position;

            /* Determine the number of bytes to mix. */
            if ((playing[i].position + length) > playing[i].sound->length) {
                sound_len = playing[i].sound->length - playing[i].position;
            } else {
                sound_len = length;
            }

            /* Mix this sound into the stream. */
            SDL_MixAudio(audio, sound_buf, sound_len, VOLUME_PER_SOUND);

            /* Update the sound buffer's position. */
            playing[i].position += length;

            /* Have we reached the end of the sound? */
            if (playing[i].position >= playing[i].sound->length) {
                playing[i].active = 0;  /* mark it inactive */
            }
        }
    }
}

/* This function loads a sound with SDL_LoadWAV and converts
   it to the specified sample format. Returns 0 on success
   and 1 on failure. */
int LoadAndConvertSound(char *filename, SDL_AudioSpec *spec, sound_p sound)
{
    SDL_AudioCVT cvt;       /* format conversion structure */
    SDL_AudioSpec loaded;   /* format of the loaded data */
    Uint8 *new_buf;

    /* Load the WAV file in its original sample format. */
    if (SDL_LoadWAV(filename,
                    &loaded, &sound->samples, &sound->length) == NULL) {
        printf("Unable to load sound: %s\n", SDL_GetError());
        return 1;
    }

    /* Build a conversion structure for converting the samples.
       This structure contains the data SDL needs to quickly
       convert between sample formats. */
    if (SDL_BuildAudioCVT(&cvt, loaded.format,
                          loaded.channels, loaded.freq,
                          spec->format, spec->channels, spec->freq) < 0) {
        printf("Unable to convert sound: %s\n", SDL_GetError());
        SDL_FreeWAV(sound->samples);
        return 1;
    }

    /* Since converting PCM samples can result in more data
       (for instance, converting 8-bit mono to 16-bit stereo),
       we need to allocate a new buffer for the converted data.
       Fortunately SDL_BuildAudioCVT supplied the necessary
       information. */
    cvt.len = sound->length;
    new_buf = (Uint8 *) malloc(cvt.len * cvt.len_mult);
    if (new_buf == NULL) {
        printf("Memory allocation failed.\n");
        SDL_FreeWAV(sound->samples);
        return 1;
    }

    /* Copy the sound samples into the new buffer. */
    memcpy(new_buf, sound->samples, sound->length);

    /* Perform the conversion on the new buffer. */
    cvt.buf = new_buf;
    if (SDL_ConvertAudio(&cvt) < 0) {
        printf("Audio conversion error: %s\n", SDL_GetError());
        free(new_buf);
        SDL_FreeWAV(sound->samples);
        return 1;
    }

    /* Swap the converted data for the original. The converted
       length is reported in cvt.len_cvt; len * len_mult is only
       an upper bound on the space required. */
    SDL_FreeWAV(sound->samples);
    sound->samples = new_buf;
    sound->length = cvt.len_cvt;

    /* Success! */
    printf("'%s' was loaded and converted successfully.\n", filename);

    return 0;
}

/* Removes all currently playing sounds. */
void ClearPlayingSounds(void)
{
    int i;

    for (i = 0; i < MAX_PLAYING_SOUNDS; i++) {
        playing[i].active = 0;
    }
}

/* Adds a sound to the list of currently playing sounds.
   AudioCallback will start mixing this sound into the stream
   the next time it is called (probably in a fraction
   of a second). */
int PlaySound(sound_p sound)
{
    int i;

    /* Find an empty slot for this sound. */
    for (i = 0; i < MAX_PLAYING_SOUNDS; i++) {
        if (playing[i].active == 0)
            break;
    }

    /* Report failure if there were no free slots. */
    if (i == MAX_PLAYING_SOUNDS)
        return 1;

    /* The 'playing' structures are accessed by the audio
       callback, so we should obtain a lock before
       we access them. */
    SDL_LockAudio();
    playing[i].active = 1;
    playing[i].sound = sound;
    playing[i].position = 0;
    SDL_UnlockAudio();

    return 0;
}

int main()
{
    SDL_Surface *screen;
    SDL_Event event;
    int quit_flag = 0;      /* we'll set this when we want to exit. */

    /* Audio format specifications. */
    SDL_AudioSpec desired, obtained;

    /* Our loaded sounds and their formats. */
    sound_t cannon, explosion;

    /* Initialize SDL's video and audio subsystems.
       Video is necessary to receive events. */
    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO) != 0) {
        printf("Unable to initialize SDL: %s\n", SDL_GetError());
        return 1;
    }

    /* Make sure SDL_Quit gets called when the program exits. */
    atexit(SDL_Quit);

    /* We also need to call this before we exit. SDL_Quit does
       not properly close the audio device for us. */
    atexit(SDL_CloseAudio);

    /* Attempt to set a 256x256 hicolor (16-bit) video mode. */
    screen = SDL_SetVideoMode(256, 256, 16, 0);
    if (screen == NULL) {
        printf("Unable to set video mode: %s\n", SDL_GetError());
        return 1;
    }

    /* Open the audio device. The sound driver will try to give
       us the requested format, but it might not succeed.
       The 'obtained' structure will be filled in with the actual
       format data. */
    desired.freq = 44100;           /* desired output sample rate */
    desired.format = AUDIO_S16;     /* request signed 16-bit samples */
    desired.samples = 4096;         /* this is somewhat arbitrary */
    desired.channels = 2;           /* ask for stereo */
    desired.callback = AudioCallback;
    desired.userdata = NULL;        /* we don't need this */
    if (SDL_OpenAudio(&desired, &obtained) < 0) {
        printf("Unable to open audio device: %s\n", SDL_GetError());
        return 1;
    }

    /* Load our sound files and convert them to
       the sound card's format. */
    if (LoadAndConvertSound("cannon.wav", &obtained, &cannon) != 0) {
        printf("Unable to load sound.\n");
        return 1;
    }

    if (LoadAndConvertSound("explosion.wav",
                            &obtained, &explosion) != 0) {
        printf("Unable to load sound.\n");
        return 1;
    }

    /* Clear the list of playing sounds. */
    ClearPlayingSounds();

    /* SDL's audio is initially paused. Start it. */
    SDL_PauseAudio(0);

    printf("Press 'Q' to quit. C and E play sounds.\n");

    /* Start the event loop. Keep reading events until there is
       an event error or the quit flag is set. The flag is tested
       first so that we do not wait for one more event after the
       user asks to quit. */
    while (quit_flag == 0 && SDL_WaitEvent(&event) != 0) {
        SDL_keysym keysym;

        switch (event.type) {
        case SDL_KEYDOWN:
            /* Fetch the key that was pressed. */
            keysym = event.key.keysym;

            /* If the user pressed Q, exit. */
            if (keysym.sym == SDLK_q) {
                printf("'Q' pressed, exiting.\n");
                quit_flag = 1;
            }

            /* 'C' fires a cannon shot. */
            if (keysym.sym == SDLK_c) {
                printf("Firing cannon!\n");
                PlaySound(&cannon);
            }

            /* 'E' plays an explosion. */
            if (keysym.sym == SDLK_e) {
                printf("Kaboom!\n");
                PlaySound(&explosion);
            }
            break;

        case SDL_QUIT:
            printf("Quit event. Bye.\n");
            quit_flag = 1;
            break;
        }
    }

    /* Pause and lock the sound system so we can safely delete
       our sound data. */
    SDL_PauseAudio(1);
    SDL_LockAudio();

    /* Free our sounds before we exit, just to be safe. */
    free(cannon.samples);
    free(explosion.samples);

    /* At this point the output is paused and we know for certain
       that the callback is not active, so we can safely unlock
       the audio system. */
    SDL_UnlockAudio();

    return 0;
}
We begin by initializing SDL as usual, adding the SDL_INIT_AUDIO bit flag to specify that SDL should prepare the audio subsystem for use. We also initialize the video subsystem for the purpose of reading keyboard events. Next we install two atexit hooks: one for the usual SDL_Quit function, and one to specifically close the audio device on shutdown. The latter hook is important, as failure to close the audio device properly can result in a segmentation fault when the program exits. At this point the sound card is not actually ready for output; we have only set up SDL’s basic infrastructure.
The next step is to initialize the sound card for an appropriate sample format. Our program builds an SDL_AudioSpec structure with the desired sound parameters and calls SDL_OpenAudio to prepare the sound card. Since it is possible that the requested sample format will not be available, SDL_OpenAudio stores the actual sample format in the SDL_AudioSpec structure passed as its second parameter. We can use this structure to convert our sound data to the correct format for playback. Our program requests signed 16-bit samples at 44 kHz. 8-bit sound is lacking in quality, and unsigned 16-bit samples are not supported by SDL.
Function SDL_OpenAudio(desired, obtained)
Synopsis Initializes the computer’s sound hardware for playback at the specified rate and sample format. If the requested format isn’t available, SDL will pick the closest match it can find. This function does not let you select a particular sound device; if you need to do that, check out the SDL_AudioInit function (not documented here, since SDL_Init normally takes care of that).
Returns 0 on success, −1 on failure. On success, fills obtained with the rate and sample format of the sound device (which may not be exactly what you requested).
Parameters desired—Pointer to an SDL_AudioSpec structure containing the desired sound parameters.
obtained—Pointer to an SDL_AudioSpec structure that will receive the sound parameters that SDL was able to obtain.
Function SDL_CloseAudio()
Synopsis Closes the audio device opened by SDL_OpenAudio. It’s a good idea to call this as soon as you’re finished with playback, so that other programs can use the audio hardware.
Function SDL_PauseAudio(state)
Synopsis Pauses or unpauses audio playback. Playback is initially paused, so you’ll need to use this function at least once to start playback.
Parameters state—1 to pause playback, 0 to start playback.
Now that the sound card is initialized and ready for data, our program loads two .wav sound files and converts them to the correct format for playback. It uses the information provided by SDL_OpenAudio to perform this conversion. The only trick to using SDL’s conversion routines is to make sure that there is enough memory to store the converted data. For example, suppose that the sound card expects 16-bit stereo sound at 44 kHz, but the .wav file contains 11-kHz, 8-bit mono samples. The conversion would result in sixteen times as much sample data: four times for the sampling rate, twice for the sample size, and twice for the extra channel. SDL performs sample conversions in place, so it is up to our program to ensure that it has allocated a sufficiently large buffer. The end result is that we cannot simply convert the buffer returned by the SDL_LoadWAV function; we must allocate our own buffer and copy the loaded samples into it.
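The growth in data size can be worked out from the two formats alone. The sketch below is only the arithmetic, with invented names (demo_pcm_format_t, demo_growth_factor); in the listing itself the equivalent figure comes from the len_mult member that SDL_BuildAudioCVT fills in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a PCM format, for illustration only. */
typedef struct {
    unsigned rate_hz;   /* sampling rate */
    unsigned bits;      /* sample size in bits */
    unsigned channels;  /* 1 for mono, 2 for stereo */
} demo_pcm_format_t;

/* How many times larger the buffer must be to hold the converted
   data. The ratio of bytes per second is rounded up, and the result
   is never less than 1 because an in-place conversion still needs
   room for the original samples. */
uint32_t demo_growth_factor(const demo_pcm_format_t *src,
                            const demo_pcm_format_t *dst)
{
    uint64_t s, d;

    assert(src->rate_hz > 0 && src->bits >= 8 && src->channels > 0);
    assert(dst->rate_hz > 0 && dst->bits >= 8 && dst->channels > 0);

    s = (uint64_t)src->rate_hz * (src->bits / 8) * src->channels;
    d = (uint64_t)dst->rate_hz * (dst->bits / 8) * dst->channels;

    return (uint32_t)((d + s - 1) / s);
}
```

Converting 11,025 Hz 8-bit mono to 44,100 Hz 16-bit stereo gives a factor of 16, so a 10,000-byte sound needs a 160,000-byte buffer.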
Function SDL_BuildAudioCVT(cvt, srcfmt, srcchan, srcfreq, destfmt, destchan, destfreq)
Synopsis Builds a structure that contains the information necessary for converting between sample formats (src to dest). Use SDL_ConvertAudio to actually perform the conversion.
Returns 0 on success, −1 on failure.
Parameters cvt—Pointer to an SDL_AudioCVT structure that will receive the conversion information.
srcfmt—Sample format of the source sample data. This corresponds to the format member of SDL_AudioSpec.
srcchan—Number of channels in the source sample data. 1 for mono, 2 for stereo.
srcfreq—Frequency in hertz of the source sample data.
destfmt—Sample format of the destination sample data.
destchan—Number of channels in the destination sample data.
destfreq—Frequency in hertz of the destination sample data.
Function SDL_ConvertAudio(cvt)
Synopsis Converts the buffer of audio data in cvt->buf (of length cvt->len bytes) in-place between sample formats, as set up by a previous call to SDL_BuildAudioCVT. Make sure that cvt->buf is big enough to accept the resulting sample data.
Parameters cvt—Audio conversion structure as described above.
Our program is now ready to mix and play sounds. It unpauses the SDL audio