The Waveform

3. Redefining the Audio Editor

3.3 Representation & Manipulation

3.3.1 The Waveform

In the majority of audio software – not just audio editing software – the default approach to representing audio visually is to display a waveform. The waveform display shows amplitude changes over time, representing the pressure changes registered by a recording device. A microphone, just like the human eardrum or any physical object, registers sound (air pressure changes) as the combined pressure amplitude of all incoming pressure waves at a single instant of time. A waveform display thus represents the physical nature of sound. For a single frame of time, a single value or sample represents the average registered amplitude over a small span of time, the length of which depends on the sampling rate. Multiple channels might be available within the same timeframe. For example, stereo audio is generally recorded using two microphones, resulting in two channels of audio, represented by two samples for a single frame of time.

Uncompressed audio is typically stored as a list of samples. CD quality audio is stored as 2 sets (i.e. 2 channels) of samples, counting 44100 samples for every second of audio (the sampling rate), for each channel. From a developer’s point of view, generating a waveform display is almost [36] as simple as plotting a line that follows each sample or amplitude value. Other representations – that will be discussed below – require complex transformations of the sample values to obtain a different data set and hence a different display.
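As a rough illustration of the "list of samples" described above, the following sketch decodes raw CD-style audio into one list of floating point values per channel. It assumes 16-bit signed little-endian samples with the channels interleaved frame by frame; the function name `decode_pcm16` is hypothetical and a real file format would add a header that is ignored here.

```python
import struct

def decode_pcm16(raw: bytes, channels: int = 2) -> list[list[float]]:
    """Split interleaved little-endian 16-bit PCM into per-channel float lists."""
    count = len(raw) // 2  # two bytes per 16-bit sample
    ints = struct.unpack("<%dh" % count, raw[:count * 2])
    # Scale integers to [-1.0, 1.0) and de-interleave: channel c owns every
    # `channels`-th sample, starting at offset c.
    return [[v / 32768.0 for v in ints[c::channels]] for c in range(channels)]

# Two stereo frames: (left=0, right=16384) and (left=-32768, right=32767).
raw = struct.pack("<4h", 0, 16384, -32768, 32767)
left, right = decode_pcm16(raw)
print(left)   # left channel values
print(right)  # right channel values
```

Once the samples are available in this form, plotting a waveform amounts to drawing a line through the values of one channel.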

[36] Apart from having to account for converting between integer and floating point values, endianness (byte order) and interleaving of channels.

A waveform is generally plotted on a horizontal axis representing time, against a vertical axis representing amplitude. If we were to plot every frame of samples on a single column of pixels, then on an average computer display we could only see about 40 milliseconds of CD quality audio. Zooming out requires a single pixel width to represent multiple frames of audio, meaning some kind of reduction needs to be performed. Simply adding all sample values that fall within a single pixel column and dividing the sum by the number of samples might result in a value of 0, especially in the case of a steady oscillation such as a pure sine wave, which moves equally far above and below zero amplitude. The decision on what to display instead differs among software.
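The two numerical claims above can be checked with a short sketch. The display width of 1764 pixels is an assumption chosen only because it gives a round figure, and the 441 Hz test tone is chosen because its period is exactly 100 samples at 44100 Hz.

```python
import math

SAMPLE_RATE = 44100

# One pixel column per frame: how much audio fits on the display?
display_width_px = 1764  # hypothetical display width
visible_ms = 1000 * display_width_px / SAMPLE_RATE

# Ten full periods of a pure sine wave (100 samples per period).
sine = [math.sin(2 * math.pi * 441 * n / SAMPLE_RATE) for n in range(1000)]
mean = sum(sine) / len(sine)        # plain averaging cancels out
peak = max(abs(s) for s in sine)    # the peak value does not

print(visible_ms, mean, peak)
```

The mean comes out at practically zero while the peak stays close to 1, which is why a plain average is a poor value to draw for a zoomed-out waveform.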

Different kinds of waveforms

A common approach is to find the loudest sample, or peak value, amongst each group of samples needing to be represented by a single value. For very short timespans (zoomed in) a line can be drawn between each consecutive peak value, which can be positive or negative. For wider timespans (zoomed out) both a positive and negative peak value are determined for each group of samples, and the area between these two values is coloured in. This approach can be found for instance in Rogue Amoeba’s Fission (figure 12a).
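A minimal sketch of the zoomed-out peak approach described above, assuming the samples of one channel are already available as a list of floats. The function name `peak_columns` is hypothetical; it is not taken from Fission or any other application.

```python
def peak_columns(samples, columns):
    """Return one (negative peak, positive peak) pair per pixel column."""
    n = len(samples)
    pairs = []
    for col in range(columns):
        start = col * n // columns
        # Make sure every column covers at least one sample.
        end = max(start + 1, (col + 1) * n // columns)
        group = samples[start:end]
        pairs.append((min(group), max(group)))
    return pairs

samples = [0.0, 0.5, -0.25, 1.0, -1.0, 0.25, 0.1, -0.1]
pairs = peak_columns(samples, 4)
print(pairs)
```

A renderer would then fill each pixel column between the two values of its pair.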

An elaboration on this approach is to draw the root mean square or RMS value on top of the peak value graph. The RMS is the square root of the average of the squares of each sample within a timeframe, and as it uses averaging it will never exceed the peak value, thus fitting nicely within the peak graph. Combined, it provides feedback on the average loudness and maximum amplitude of a timeframe. Audacity and Sweep are examples of audio editing software implementing this approach (figure 12b).
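The RMS definition above translates almost directly into code. The sketch below is an illustration only, with hypothetical function names, and compares the RMS with the peak value for a small group of samples and for one period of a sine wave.

```python
import math

def rms(group):
    """Square root of the average of the squared sample values."""
    return math.sqrt(sum(s * s for s in group) / len(group))

def peak(group):
    """Largest absolute sample value in the group."""
    return max(abs(s) for s in group)

group = [0.5, -0.5, 1.0, -1.0]
group_rms, group_peak = rms(group), peak(group)

# One full period of a sine wave, 100 samples long.
sine = [math.sin(2 * math.pi * n / 100) for n in range(100)]
sine_rms = rms(sine)

print(group_rms, group_peak, sine_rms)
```

For the sine wave the RMS comes out near 0.707 (one over the square root of two) against a peak of 1, so the RMS graph sits inside the peak graph.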

In most cases, amplitude is plotted on the vertical axis on a linear scale. This might however be misleading, as it doesn’t correspond to the way the human ear works. If we halve the amplitude of a sound, we won’t perceive it as half as loud. A logarithmic scale gives an image that is closer to our loudness perception. The logarithmic scale is used in decibel meters, as the decibel is a logarithmic unit. Making a sound softer by a factor of 2 in power (halving the power) means decreasing it by 10 log10(2) ≈ 3 dB. For sound pressure (dB SPL), a change by a factor of 2 means a change of 20 log10(2) ≈ 6 dB. Our loudness perception differs in relation to frequency (Roads 1996, p.38; Collins 2010, p.8), but by approximation an increase of roughly 10 dB SPL is perceived to be twice as loud.

Figure 12. a) The waveform in Rogue Amoeba’s Fission only shows positive and negative peak values. b) Audacity (by Roger Dannenberg) displays a waveform that shows both peak and RMS.

Waveform displays are used for a great variety of purposes, and depending on the importance of speed, accuracy and aesthetics, different approaches are taken in transforming sample values to a visual graph. Surprisingly often the bottom half of the waveform data (below zero) is discarded and replaced by a mirror image of the top half, purely for a nicer appearance. When accuracy is of less importance than visual quality, the rough contour of the waveform might suffice, and one might even opt to leave out the bottom half altogether.
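The decibel figures given earlier in this section can be verified with a few lines of code. The sketch also shows one possible way to map a linear sample amplitude onto a logarithmic vertical axis; the function names, the full-scale reference of 1.0 and the floor of -60 dB are assumptions made for the example.

```python
import math

def power_ratio_to_db(ratio: float) -> float:
    return 10 * math.log10(ratio)

def amplitude_ratio_to_db(ratio: float) -> float:
    return 20 * math.log10(ratio)

def amplitude_to_db(amplitude: float, floor_db: float = -60.0) -> float:
    """Map a linear amplitude (1.0 = full scale) to dB, clamped at floor_db."""
    magnitude = abs(amplitude)
    if magnitude == 0.0:
        return floor_db  # log10(0) is undefined, so silence maps to the floor
    return max(floor_db, 20 * math.log10(magnitude))

half_power_db = power_ratio_to_db(0.5)
half_amplitude_db = amplitude_ratio_to_db(0.5)
print(round(half_power_db, 2), round(half_amplitude_db, 2))
```

The clamp is needed because a logarithmic axis has no natural bottom: ever quieter samples map to ever lower decibel values, and silence has no finite value at all.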

Certain DJ applications need to emphasise beats visually, so that the DJ can cue to specific points by eye to prepare a transition. One way to accomplish this is to take the n’th power (e.g. the fourth) of the amplitude value, which will emphasise the difference between lower and higher amplitudes, hence accentuating peaks. The popular DJ software Traktor (by Native Instruments) takes the waveform a step further by colouring it with extra analysis data. This can be a spectral analysis, from which a spectral centroid or spread [37] can be deduced. By using a colour gradient to discriminate high values from low values, a waveform can be coloured in, displaying for instance high spectral density with a bright colour, and low density with a darker colour. Sonic Visualiser [38], a true audio analysis application, goes even further by allowing the user to overlay different analysis visualisations, making it possible to see a wide range of sonic parameters and characteristics in a single view, on a single timeline.
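The n’th power technique mentioned above can be sketched as follows. This is an illustration under the assumption that amplitudes are normalised to the range -1 to 1; it is not a description of how Traktor or any other application implements its display, and the function name `emphasise` is hypothetical.

```python
import math

def emphasise(samples, exponent=4):
    """Raise each magnitude to `exponent`, keeping the original sign."""
    return [math.copysign(abs(s) ** exponent, s) for s in samples]

samples = [0.5, -0.5, 1.0, 0.9]
shaped = emphasise(samples)
print(shaped)
```

With an exponent of four, a sample at half of full scale shrinks to about six percent of full scale while a full-scale sample is unchanged, so the ratio between the two grows from 2:1 to 16:1 and the loudest moments stand out.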

