Editing Using The Spectrum Display

3. Redefining the Audio Editor

3.3 Representation & Manipulation

3.3.3 Reading Waveforms And Spectra

3.3.4.2 Editing Using The Spectrum Display

Spectral Selections

A spectrogram is a two-dimensional image with a time axis and a frequency axis. As with a waveform, a selection in time can be made, but now a specific frequency range can also be specified. Using a rectangular selection tool (called a marquee tool in Photoshop), a portion of the two-dimensional image can be selected that delineates both a time frame and a frequency range. Another commonly implemented and more precise tool is the free selection tool, or lasso. With it a specific region can be drawn, allowing the user to trace a portion of the spectrum that varies over time. This is particularly useful for selecting partials that change in frequency, or small bits of noise between other components, without also selecting and affecting (see Spectral Editing further on) the directly surrounding spectral components. Adobe, creator of both Photoshop and Audition, implements another selection tool called the paintbrush, which lets the user paint the selection using a round brush of a specified width, resulting in a similar time-varying frequency selection. AudioSculpt also includes a Magic Wand tool that allows the user to click anywhere on the spectrum display, automatically selecting contiguous spectral bins within a predefined range in dB around the clicked bin. This makes it easy to select, for instance, a frequency-modulated partial or a cloud of noise, which can then be moved or otherwise manipulated (Bogaards 2008).

[47] In a touch user interface, alternate control can be realised by using two or more fingers instead of just one.
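The Magic Wand behaviour described above can be approximated with a flood fill over a spectrogram stored as a matrix of levels in dB. The following is only a sketch under stated assumptions, not AudioSculpt's actual algorithm: the function name `magic_wand`, the `tolerance_db` parameter, the comparison against the clicked bin's level and the use of 8-connectivity (so that diagonal steps of a frequency-modulated partial stay connected) are all choices made here for illustration.

```python
from collections import deque

import numpy as np


def magic_wand(spec_db, seed, tolerance_db=6.0):
    """Select bins connected to `seed` whose level lies within
    `tolerance_db` of the level of the clicked (seed) bin.

    spec_db: 2-D array of levels in dB, shape (n_freq, n_time).
    seed:    (freq_index, time_index) of the clicked bin.
    Returns a boolean mask of the same shape as spec_db.
    """
    n_freq, n_time = spec_db.shape
    ref = spec_db[seed]
    mask = np.zeros(spec_db.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    # Offsets of the 8 neighbours (assumption: diagonal bins count as contiguous).
    neighbours = [(df, dt) for df in (-1, 0, 1) for dt in (-1, 0, 1)
                  if (df, dt) != (0, 0)]
    while queue:
        f, t = queue.popleft()
        for df, dt in neighbours:
            nf, nt = f + df, t + dt
            inside = 0 <= nf < n_freq and 0 <= nt < n_time
            if inside and not mask[nf, nt]:
                if abs(spec_db[nf, nt] - ref) <= tolerance_db:
                    mask[nf, nt] = True
                    queue.append((nf, nt))
    return mask


# Toy spectrogram: a -10 dB partial rising in frequency over a -60 dB floor.
spec_db = np.full((6, 4), -60.0)
for f, t in [(2, 0), (3, 1), (4, 2), (4, 3)]:
    spec_db[f, t] = -10.0

mask = magic_wand(spec_db, seed=(2, 0), tolerance_db=6.0)
```

The resulting mask covers only the bins of the rising partial, which is the kind of time-varying selection that could then be moved or otherwise manipulated.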

Spectral Editing

The ability to select both a segment in time and in frequency allows for effects to be applied to just the specified time frame, and to just the specified frequency range. This can be as simple as amplifying or attenuating partials or noise elements, but any effect that can be applied to a standard time selection can also be applied to a time-frequency selection. Think of reverberating only the higher frequencies of a voice, or compressing specific components of a recording. Some processes do make less sense, such as high-pass-filtering only high frequency components.
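As a minimal sketch of the simplest case mentioned above, amplifying or attenuating a time-frequency selection, the snippet below applies a gain in dB to a rectangular region of a spectrogram matrix. The function name `apply_gain` and the index-based `f_range` and `t_range` parameters are hypothetical; a real editor would map the selection from hertz and seconds to bin and frame indices and resynthesise the audio afterwards.

```python
import numpy as np


def apply_gain(spec, f_range, t_range, gain_db):
    """Return a copy of `spec` with `gain_db` applied to the rectangle
    spanning bins f_range = (start, stop) and frames t_range = (start, stop).

    spec may hold magnitudes or complex STFT coefficients, shape (n_freq, n_time).
    """
    out = spec.copy()
    f0, f1 = f_range
    t0, t1 = t_range
    # Convert dB to a linear amplitude factor: -20 dB -> 0.1, +6 dB -> ~2.
    out[f0:f1, t0:t1] *= 10.0 ** (gain_db / 20.0)
    return out


spec = np.ones((8, 10))                       # flat toy spectrogram
edited = apply_gain(spec, (4, 8), (2, 5), -20.0)
```

Only the upper four bins in frames 2 to 4 are attenuated; everything outside the selection is left untouched.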

To continue with Audition: another tool it implements is the spot healing tool (figure 23), which exactly resembles an image editing tool from Photoshop. With it the user draws over a time-frequency region, which is then levelled with the surrounding spectral content. This is ideal for removing sharp noises on a slightly noisy background, as it removes not all the noise but only the sharp element. Audition also provides a gain adjustment control directly above the selection, which makes amplifying or attenuating selections easy.

Editing by directly manipulating the selection becomes more advanced now there are two axes along which audio can be affected. In a two-dimensional space, we have the following transformations at our disposal: move, stretch, scale, skew, distort and flip. Photoshop does offer more transformations (rotate, warp and perspective), but these are less meaningful in audio editing.

• Move

Moving along the time axis equals the aforementioned repositioning, though now it can be frequency-independent. Moving along the frequency axis will shift or transpose the frequencies, depending on the selected scale (linear and logarithmic respectively). While shifting moves all frequencies by the same amount, transposing moves all frequencies by the same ratio. The latter preserves the harmonic structure of, for instance, an instrument, and the result will hence be more recognisable as the original than when frequencies are shifted [48].

Figure 23. Spot healing in Adobe Audition. a) shows a small but audible noise element amidst background noise. Spot healing allows selection and automatic removal of this element, filling it up with similar background noise.
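The difference between shifting and transposing can be checked with a small numeric example. The partial frequencies and the offsets below are illustrative values chosen here, not taken from the text.

```python
import numpy as np

# Four harmonic partials of a 220 Hz tone (illustrative values).
harmonics = np.array([220.0, 440.0, 660.0, 880.0])

# Shifting: add the same amount in Hz to every partial (move on a linear scale).
shifted = harmonics + 110.0        # 330, 550, 770, 990 Hz

# Transposing: multiply every partial by the same ratio (move on a log scale).
transposed = harmonics * 1.5       # 330, 660, 990, 1320 Hz

# Ratio of each partial to the lowest one.
ratios_shifted = shifted / shifted[0]            # 1, 1.67, 2.33, 3
ratios_transposed = transposed / transposed[0]   # 1, 2, 3, 4
```

After transposing, the partials are still integer multiples of the lowest one, so the harmonic structure survives. After shifting they are not, which is why a shifted tone sounds inharmonic.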

• Stretch

Stretching up along the frequency axis on a linear scale is effectively the same as transposing, moving the frequencies up by the same factor. On a logarithmic scale the harmonic structure will be stretched out in the higher frequencies. Stretching downwards will have the same effect on the lower frequencies, while preserving the harmonic structure within the higher frequencies.

Stretching along the time axis amounts to frequency-independent time stretching. This time stretching can be made content-aware, which is interesting considering the onset problem. If one were to simply stretch out a single recorded trumpet tone, the part of that recording most important for recognising it as a trumpet, the onset, would be stretched out as well. The short burst of noise and inharmonicity that forms the onset of the played tone is what guides us in recognising the instrument (Howard and Angus 2001, p.213). This goes for trumpets as well as violins and most instruments that can be used to produce sustained tones. Stretched far enough, the onset and thus the instrument will no longer be recognisable. The onset is followed by a more harmonic and less varying spectrum, which dictates the timbre and is more suitable for time stretching.

• Scale

Scaling is stretching along both axes. An interesting examination of tape-speed variation can be made here. When audio is played back twice as fast, it becomes half as long, and all wavelengths are halved, so every frequency doubles and the audio sounds an octave higher. This effectively scales down along the time axis (half as long) and up along the frequency axis (twice as high). The inverse is true when audio is played back at half speed: it becomes twice as long and sounds an octave lower. When the two axes are stretched independently, any variation on this process can be made, including time-preserving pitch shifting and pitch-preserving time stretching, or anything in between.
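The tape-speed relationship can be verified numerically. The sketch below simulates double-speed playback by keeping every second sample of a test tone and playing the result at the original sample rate. The sample rate, the 440 Hz tone and the helper `peak_hz` are assumptions made for this illustration. Plain decimation is only safe here because the doubled frequency stays far below half the sample rate; real material would need a low-pass filter first.

```python
import numpy as np

sr = 8000                                   # sample rate in Hz (assumed)
t = np.arange(sr) / sr                      # one second of time stamps
tone = np.sin(2 * np.pi * 440.0 * t)        # 440 Hz test tone


def peak_hz(x, sr):
    """Frequency in Hz of the strongest bin in the spectrum of x."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.argmax(spectrum) * sr / len(x)


# Double-speed playback: every second sample, same playback rate.
fast = tone[::2]

duration_ratio = len(fast) / len(tone)                    # time axis scaled by 1/2
frequency_ratio = peak_hz(fast, sr) / peak_hz(tone, sr)   # frequency axis scaled by 2
```

Time is scaled by one half and frequency by two, so the two factors are each other's inverse. Pitch shifting and time stretching as separate operations correspond to breaking that coupling and choosing the two factors independently.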

[48] It must be noted that in this case the formant structure is lost. A formant is a peak of energy in the spectrum, which can include both harmonic and inharmonic partials as well as noise (Roads 1996, p.296). The formant structure can be regarded as the filter that defines the character of a voice (particularly the vowels) or a musical instrument. A pitch-shifted voice will not be recognised as originating from the same person.

• Skew

Skewing allows one side of a selection to remain fixed while the opposite side is shifted. Along the frequency axis this allows for applying a gradual pitch shift over time. Along the time axis a very interesting effect called spectral delay can be introduced, meaning components of the selected audio are individually delayed, which can for instance result in having the lowest frequency component played first, followed progressively by the higher components.
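A skew along the time axis can be sketched on a spectrogram matrix by delaying each frequency row in proportion to its bin index, so the lowest bin starts first and higher bins follow progressively. The function name `skew_time` and the `frames_per_bin` parameter are hypothetical, and the sketch assumes a non-negative delay and whole-frame steps. Turning the result back into audio would additionally require resynthesis, which is not shown.

```python
import numpy as np


def skew_time(spec, frames_per_bin):
    """Delay row k of `spec` by round(frames_per_bin * k) frames.

    spec: 2-D array, shape (n_freq, n_time), with row 0 the lowest frequency.
    frames_per_bin: non-negative delay added per frequency bin (assumption).
    Returns a zero-padded array wide enough to hold the largest delay.
    """
    n_freq, n_time = spec.shape
    max_delay = int(round(frames_per_bin * (n_freq - 1)))
    out = np.zeros((n_freq, n_time + max_delay), dtype=spec.dtype)
    for k in range(n_freq):
        delay = int(round(frames_per_bin * k))
        out[k, delay:delay + n_time] = spec[k]
    return out


spec = np.ones((3, 2))              # three bins, two frames
skewed = skew_time(spec, 1)
# Row 0 (lowest):  1 1 0 0
# Row 1:           0 1 1 0
# Row 2 (highest): 0 0 1 1
```

The bottom edge of the selection stays fixed while the top edge is shifted later in time, which is the geometric skew described above.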

• Distort

Distorting the selection means corners can freely be moved. It offers simultaneous time stretching, spectral delaying and time-varying pitch shifting, making it an extremely powerful tool.

• Flip

Flipping spectral content along the time axis will effectively reverse the selected audio. Along the frequency axis the result will be hard to recognise, as recognising audio depends heavily on the relationship between the lower and higher frequency components.

These more artistic transformations are rarely found in current audio editing software. The ability to amplify, attenuate and remove spectral content is found in various applications, and is particularly useful for audio restoration and cleaning. One might also find use of such software outside the music industry, for instance in forensics, where spectral editing is used to isolate and enhance certain parts of audio recordings.

In document Redefining the audio editor. (Page 53-56)
