2.7 Software tools for spectral modelling and
2.7.1 Specialised systems
A wide variety of specialised tools for spectral processing of musical sounds can be found in the literature, generally focusing on analysis, transformation and synthesis using a particular sinusoidal modelling algorithm. Here we survey some currently available software packages, their implementation details and the sound transformations that they provide.
SNDAN
SNDAN [8] is a free suite of C programs distributed as source code, allowing for the analysis, modification and synthesis of (primarily monophonic) musical sounds. Tones can be analysed using either phase vocoder analysis or by using an implementation of MQ sinusoidal modelling. Analysis data is then synthesised using additive synthesis. Programs for viewing plots of analysis data are also included in SNDAN. A number of transformations can be applied to the analysis data, including several different sinusoidal amplitude and frequency modifications, smoothing sinusoidal amplitudes and frequencies over time, smoothing sinusoidal amplitudes versus frequency and time stretching the input waveform. Analysis data can also be saved to a SNDAN file format and then reloaded for synthesis, so it is possible for other applications to apply modifications.
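As a rough illustration of the additive synthesis step, the following Python sketch sums a bank of sinusoidal oscillators. It is not SNDAN code: the function name `additive_synth`, the tuple layout and the assumption that each partial keeps a constant amplitude and frequency for the whole duration are simplifications introduced here.

```python
import math


def additive_synth(partials, duration, sample_rate=44100):
    """Sum a bank of sinusoidal oscillators (hypothetical helper).

    partials: list of (amplitude, frequency_hz, initial_phase) tuples.
    Each partial is held constant over the whole duration, which is a
    simplification; real analysis data varies from frame to frame.
    """
    n_samples = int(round(duration * sample_rate))
    out = [0.0] * n_samples
    for amp, freq, phase in partials:
        # Phase increment per sample for this oscillator.
        step = 2.0 * math.pi * freq / sample_rate
        for n in range(n_samples):
            out[n] += amp * math.sin(phase + step * n)
    return out
```

A time-varying version would interpolate the amplitude and frequency of each partial between analysis frames before summing.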
ATS
ATS [94] is an open source library for spectral analysis, transformation and synthesis of sound. It is written in Common Lisp and is designed to work in conjunction with Common Lisp Music [106]. The ATS analysis process is similar to the SMS algorithm. Spectral components are identified by peak detection and partial tracking algorithms and then used to synthesise a deterministic signal. This harmonic component is then subtracted from the original sound and the residual signal is used to create a stochastic component. However, the ATS partial tracking process also allows peaks to be ignored if they are considered to be inaudible due to psychoacoustic masking effects within critical bands. Analysis data is stored in Lisp abstractions called sounds, and can be synthesised using a variety of techniques including additive synthesis, subtractive synthesis and granular synthesis. Available transformations include scaling partial amplitudes and frequencies by constant values or dynamic envelopes, transposition (with or without maintaining formant shapes) and time stretching. A graphical user interface (GUI) is also provided that enables the real-time control of synthesis parameters.
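Scaling a partial by a constant value or by a dynamic envelope can be sketched as follows. This is an illustrative Python example rather than ATS code (ATS itself is written in Common Lisp); the names `scale_partial` and `_as_envelope`, and the representation of a partial as a list of (amplitude, frequency) frames, are assumptions made for this sketch.

```python
def _as_envelope(value, n_frames):
    """Expand a constant into a per-frame envelope, or validate a list."""
    if isinstance(value, (int, float)):
        return [float(value)] * n_frames
    if len(value) != n_frames:
        raise ValueError("envelope must have one value per frame")
    return [float(v) for v in value]


def scale_partial(track, amp_scale=1.0, freq_scale=1.0):
    """Scale a partial track (hypothetical helper).

    track: list of (amplitude, frequency_hz) frames.
    amp_scale, freq_scale: a constant, or a list with one factor per frame.
    """
    n_frames = len(track)
    amp_env = _as_envelope(amp_scale, n_frames)
    freq_env = _as_envelope(freq_scale, n_frames)
    return [
        (amp * a, freq * f)
        for (amp, freq), a, f in zip(track, amp_env, freq_env)
    ]
```

For example, `scale_partial(track, freq_scale=2.0)` transposes the partial up an octave, while passing a list for `amp_scale` applies a different gain to each frame.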
SPEAR
SPEAR [59] is a cross-platform graphical spectral analysis, editing and synthesis tool. Sinusoidal analysis is based on an extension of the MQ method. As in SMS, peak frequency estimates are improved by using parabolic interpolation. The partial tracking algorithm uses linear prediction to estimate the future trajectory of each partial, and peaks are then selected based on how closely their parameters match the estimates [62]. Synthesis of analysis data is performed using either the inverse FFT method or by using additive synthesis of banks of sinusoidal oscillators. As one of the design goals of SPEAR was to enable integration with other sinusoidal modelling implementations, a wide variety of analysis data file types can be imported and exported. SDIF [125] and plain text files can be both imported and exported, and SPEAR can additionally import both SNDAN and ATS analysis files. Using the graphical editor, sinusoidal components can be cut, copied and pasted, as well as being shifted in frequency and stretched or compressed in time.
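The parabolic interpolation step mentioned above can be illustrated with a short Python sketch. It fits a parabola through the log-magnitudes of a spectral peak and its two neighbouring bins, then takes the vertex as the refined peak location. This is the standard textbook formulation, not code taken from SPEAR or SMS; the function name `parabolic_peak` and its arguments are hypothetical.

```python
def parabolic_peak(mags_db, k, sample_rate, fft_size):
    """Refine a spectral peak at bin k (hypothetical helper).

    mags_db: magnitude spectrum in dB, with a local maximum at index k
             (k must have a neighbour on each side).
    Returns (frequency_hz, magnitude_db) at the vertex of the parabola
    through bins k-1, k and k+1.
    """
    alpha, beta, gamma = mags_db[k - 1], mags_db[k], mags_db[k + 1]
    denom = alpha - 2.0 * beta + gamma
    # A zero denominator means the three points are collinear; keep bin k.
    offset = 0.0 if denom == 0.0 else 0.5 * (alpha - gamma) / denom
    freq = (k + offset) * sample_rate / fft_size
    mag = beta - 0.25 * (alpha - gamma) * offset
    return freq, mag
```

For a true local maximum the offset lies between -0.5 and 0.5 bins, so the refined frequency stays within half a bin of the detected peak.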
Libsms
Libsms is an open source library that provides an implementation of SMS, derived from Serra’s original SMS code [32]. It is written in C, uses SWIG [9] to provide bindings for Python, and is also available as a set of external objects for Pure Data. Available SMS transformations include pitch-shifting (with and without the preservation of the original spectral envelope), time-stretching and independent control of the volumes of deterministic and stochastic components. Analysis data can also be imported from and exported to custom SMS analysis files.
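One simple way to think about time-stretching in a frame-based model is to resample the sequence of analysis frames before synthesis. The Python sketch below does this by linear interpolation between neighbouring frames. It does not use the libsms API and is not claimed to match its implementation; the function name `time_stretch_frames` and the frame layout are assumptions made for illustration.

```python
def time_stretch_frames(frames, factor):
    """Resample analysis frames in time (hypothetical helper).

    frames: list of frames, each a list of parameter values
            (for example one amplitude per partial).
    factor: values above 1 lengthen the sound, values below 1 shorten it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(frames)
    if n_in == 0:
        return []
    n_out = max(1, int(round(n_in * factor)))
    out = []
    for i in range(n_out):
        # Fractional read position in the original frame sequence.
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = min(int(pos), n_in - 1)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append([
            (1.0 - frac) * a + frac * b
            for a, b in zip(frames[lo], frames[hi])
        ])
    return out
```

Because only the frame rate changes, frequencies stored in the frames are untouched, so the pitch of the resynthesised sound is preserved.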
Loris
Loris is an open source C++ implementation of the bandwidth-enhanced sinusoidal modelling system. Python bindings are provided using SWIG, and Loris can also be built as a Csound opcode (plugin). Time-scaling and pitch-shifting modifications can be performed on the analysis data, but sound morphing is of particular interest to the developers, so a range of functions is provided for performing morphs between two sound sources. Loris also supports both importing and exporting analysis data via SDIF files.
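At its simplest, a morph between two sinusoidal models interpolates the parameters of corresponding partials. The Python sketch below interpolates amplitude linearly and frequency on a logarithmic scale for a single pair of partials. It is a minimal illustration under those assumptions, not the Loris morphing algorithm or its API; the name `morph_partial` is hypothetical.

```python
import math


def morph_partial(a, b, weight):
    """Interpolate between two partials (hypothetical helper).

    a, b: (amplitude, frequency_hz) pairs with positive frequencies.
    weight: 0.0 returns a, 1.0 returns b.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    amp = (1.0 - weight) * a[0] + weight * b[0]
    # Log-frequency interpolation moves in equal musical intervals.
    log_freq = (1.0 - weight) * math.log(a[1]) + weight * math.log(b[1])
    return amp, math.exp(log_freq)
```

With this choice, the midpoint between 220 Hz and 880 Hz is 440 Hz (one octave from each) rather than the arithmetic mean of 550 Hz.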
AudioSculpt
AudioSculpt [14] is a commercial software package for spectral analysis and processing of sound files that is developed by IRCAM. It has a GUI that displays multiple representations of a source sound (a waveform view, a spectrum view and a sonagram), which are used to apply audio transformations to specific time or frequency regions. AudioSculpt relies on two different signal processing kernels in order to manipulate sounds. The first is an extended version of the phase vocoder called SuperVP [27]. It can be used to perform several different analyses including the computation of the standard and reassigned spectrograms, estimation of the spectral envelope by linear predictive coding and the true envelope, transient detection and fundamental frequency estimation. The second sound processing kernel, called Pm2, uses a proprietary sinusoidal modelling implementation. The sinusoidal partials that are estimated by the model can be exported to SDIF files.
AudioSculpt allows multiple transformations to be performed on sound files, including manipulating the gain of certain frequency regions that are selected using the GUI, transposing sounds (with or without time-correction), time-stretching, a spectral “freeze” effect, and a non-linear dynamic range stretching effect that is known as “clipping”. A sequencer is also provided so that transformations can be applied at specific time points in a sound file. Sound files can be processed and played back in real-time, allowing the results to be heard before saving them to a new sound file. However, AudioSculpt does not support the real-time processing of live audio streams.
It is possible to use the SuperVP sound processing kernel to manipulate audio streams in real-time by using the set of SuperVP external objects for Max/MSP [52]. The supervp.trans~ object can be used to perform transposition, spectral envelope manipulation and decomposition of a signal into a combination of sinusoids, noise and transient components. Spectral envelopes and transients can also be preserved after modification. The supervp.sourcefilter~ object enables the spectral envelope of one signal to be “imprinted” onto another signal. An alternative cross-synthesis object called supervp.cross~ can be used to mix the amplitudes and phases of two signals in the frequency domain.
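The idea of mixing the amplitudes and phases of two signals in the frequency domain can be sketched for a single spectral bin as follows. This Python example is only an illustration of the general technique and makes no claim about how supervp.cross~ is implemented; the function name `cross_bin` and its two mix parameters are assumptions introduced here.

```python
import cmath


def cross_bin(x, y, amp_mix, phase_mix):
    """Mix one complex spectral bin from two signals (hypothetical helper).

    x, y: complex bin values from the two input spectra.
    amp_mix: 0.0 takes the magnitude of x, 1.0 the magnitude of y.
    phase_mix: 0.0 takes the phase of x, 1.0 the phase of y.
    """
    mag = (1.0 - amp_mix) * abs(x) + amp_mix * abs(y)
    # Interpolate phase along the shorter arc from x to y so that the
    # result does not jump when the phases wrap around +/- pi.
    delta = cmath.phase(y * x.conjugate())
    phase = cmath.phase(x) + phase_mix * delta
    return cmath.rect(mag, phase)
```

Applying `cross_bin` to every bin of two short-time spectra with `amp_mix=1.0` and `phase_mix=0.0` gives a frame that has the magnitudes of the second signal and the phases of the first.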