CHAPTER 4 | EXPERIMENTAL METHOD
4.10 Time instants and base tempo
The chosen performance characteristics must be measured at each such instant. The granularity of measurement must be coarse enough to allow easy identification of each instant and fine enough to provide useful data.
The feature vector of discrete values, one per variable specified at a time-instant t (accepting that music is a continuous auditory phenomenon) is an intensity contour, ICt.
In order to facilitate identification, we might label time instants with bar number and beat. So, for example, T5:1 is the Tempo measured at beat 1 of bar 5. This identification approach has two advantages: it provides for instant association of the value t with a specific location in the score, and it does so irrespective of varying time signatures. An intensity contour, ICt, for two performance variables might be defined as follows, where A is the acceleration at time t:
ICt = {At, Dt}
Fig. 4.4 Sample Intensity Contour (1)
A is the instantaneous rate of change of tempo: in mathematical terms, this is the slope of the tempo curve at the measured point. It may take positive or negative values, positive indicating acceleration and negative indicating deceleration. D is the rate of change of dynamic level at time t, that is, the instantaneous rate of change of loudness intensity (measured in dBspl and constrained to lie within the range 0-100).22
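The two-variable contour can be sketched in code. What follows is a minimal illustration only, not the method used in this study: it assumes that tempo (in BPM) and dynamic level (in dB) have already been measured at each beat, and it approximates the slope at each beat by a finite difference. The names IntensityContour, slopes, bar_beat_labels and intensity_contours are hypothetical, as are the units chosen for A and D.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntensityContour:
    label: str   # "bar:beat" label, e.g. "5:1"
    time: float  # beat onset in seconds
    A: float     # rate of change of tempo (assumed unit: BPM per second)
    D: float     # rate of change of dynamic level (assumed unit: dB per second)


def slopes(times, values):
    """Approximate the slope of values against times at every sample.

    Uses the two neighbouring samples in the interior and a one-sided
    difference at either end. This is an approximation, not an exact derivative.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    last = len(times) - 1
    result = []
    for i in range(len(times)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        result.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return result


def bar_beat_labels(beats_per_bar, first_bar=1):
    """Return "bar:beat" labels; each bar may have its own beat count."""
    return [f"{bar}:{beat}"
            for bar, count in enumerate(beats_per_bar, start=first_bar)
            for beat in range(1, count + 1)]


def intensity_contours(times, tempi, levels, beats_per_bar):
    """Build one labelled contour {A, D} per measured beat."""
    labels = bar_beat_labels(beats_per_bar)
    if len(labels) < len(times):
        raise ValueError("beats_per_bar does not cover every measured beat")
    accel = slopes(times, tempi)
    dyn = slopes(times, levels)
    return [IntensityContour(lab, t, a, d)
            for lab, t, a, d in zip(labels, times, accel, dyn)]
```

Because the labels are generated bar by bar from a list of beat counts, a change of time signature alters only the corresponding entry in beats_per_bar.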
It is proposed that expressivity be represented by changes in ICs as a performance proceeds: a multivariate map flowing in the time dimension. In principle, there is an infinite number of time-instants at which measurements might be taken. For large music datasets, it is possible to rely on the Central Limit Theorem to infer the statistical significance of results.23
A useful extension of Saxify would be to treat a subset of recorded performances as a training set, in the sense of training a machine learning algorithm.24 Assuming the performance style parameters were consistent, it might then be possible to classify where additional, unknown recordings lie on a timeline, or to retrieve performances that appear to exhibit time-related characteristics. Although this PM is here restricted to taking measurements, at time instants, of acceleration and rate of change of dynamics, a typical ICt might be extended with many more variables. For example, T is Tempo (measured in beats per minute); F is Absolute Pitch (frequency), or more likely a basket of frequencies comprising the fundamental and some number of harmonic overtones; VD is Vibrato depth (e.g. in cents, being hundredths of a semitone); VF is Vibrato frequency (cycles per second, measured in hertz); and D here denotes melody lead (measured in ms), either between the hands of a pianist or between two or more instruments:
22 Each 20-unit increase in dBspl, the scale being logarithmic, represents a tenfold increase in sound pressure relative to the lower value. By way of comparison, a trumpet at 0.5m may typically be measured at dBspl = 130. It is noteworthy that there is an instantaneous risk of permanent hearing loss for a loudness measurement at the ear of dBspl = 120.
23 Hans Fischer, A History of the Central Limit Theorem (New York: Springer, 2011); and see Eric
Weisstein, ‘Central Limit Theorem’ http://mathworld.wolfram.com/CentralLimitTheorem.html, Accessed: 13 August 2013.
24 John Kelleher, Brian MacNamee and Aoife D’Arcy, Fundamentals of Machine Learning for Predictive
ICt = {Tt, Ft, VDt, VFt, Dt}
Fig. 4.5 Sample Intensity Contour (2)
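As an illustration only, the extended contour might be held in a simple record type. The class name ExtendedContour and the helper cents are hypothetical, and F is simplified here to a single fundamental rather than a basket of frequencies. The conversion uses the standard definition of the cent, under which an octave spans 1200 cents.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedContour:
    T: float   # tempo in beats per minute
    F: float   # fundamental frequency in hertz (simplification: one value only)
    VD: float  # vibrato depth in cents
    VF: float  # vibrato frequency in hertz
    D: float   # melody lead in milliseconds


def cents(f_ref, f):
    """Interval from f_ref to f in cents (100 cents = one equal-tempered semitone)."""
    if f_ref <= 0 or f <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(f / f_ref)
```

A vibrato that swings between two measured frequencies could then be stored with VD set to cents(lower, upper), although whether depth is taken peak-to-peak or from the centre pitch is a choice this sketch does not settle.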
Audio processing software, such as Sonic Visualiser and Audacity, can greatly improve the efficiency of processing audio data. This includes overlaying an audiogram (a graphic visualisation of an audio stream) with characteristic graphs. One example, in the time domain, is dynamic level, which may be automatically mapped to beats. There is a large range of plugins, developed at both academic and commercial institutions, that can process audio signals. By using such software, a researcher may record and then re-work a tap track to map very accurately a time-varying performance tempo and anchor the times at which events will be located.
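The step from a tap track to a tempo map is simple arithmetic: the tempo over each beat, in beats per minute, is 60 divided by the interval in seconds to the next beat. A minimal sketch follows, assuming the tap track has been exported as a list of beat times in seconds; the function name tempo_from_beats is hypothetical.

```python
def tempo_from_beats(beat_times):
    """Return the tempo in BPM for each interval between successive beats.

    beat_times must be strictly increasing times in seconds. The result has
    one entry fewer than the input, because tempo is defined per interval.
    """
    tempi = []
    for earlier, later in zip(beat_times, beat_times[1:]):
        interval = later - earlier
        if interval <= 0:
            raise ValueError("beat times must be strictly increasing")
        tempi.append(60.0 / interval)
    return tempi
```

For example, beats tapped at 0.0, 0.5, 1.0 and 1.75 seconds give intervals of 0.5, 0.5 and 0.75 seconds, and hence tempi of 120, 120 and 80 BPM.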
Several automated techniques exist to calculate beat onset. Such techniques were researched but were found to be generally inadequate in dealing with polyphonic voices, dense musical figuration, or varying time signatures.25 The most effective technique was found to be tapping the beat onsets by hand and then using automated tools to move each data point as close as possible to its onset. Thus, the accuracy of note onset measurements may be optimised by filtering them through Tapsnap, which attempts to move tapped beats to the nearest identified onsets in the audio recording.26 The output from this program may then be re-checked by ear for accuracy and re-adjusted as necessary. Tapsnap takes, as input data, a text file of event timings, such as the time of each beat at which tempo is measured.
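The general idea of snapping taps to onsets can be sketched as a nearest-neighbour search. This is not Tapsnap's own algorithm, which is not described here. It is a simplified illustration that assumes a list of detected onset times is already available, and it introduces a hypothetical tolerance, max_shift, beyond which a tap is left where it was tapped.

```python
from bisect import bisect_left


def snap_taps(taps, onsets, max_shift=0.05):
    """Move each tapped time to the nearest onset within max_shift seconds.

    Taps with no onset inside the tolerance are returned unchanged, so that
    they can be re-checked by ear. max_shift is an assumed parameter.
    """
    ordered = sorted(onsets)
    snapped = []
    for tap in taps:
        i = bisect_left(ordered, tap)
        # Only the onsets either side of the tap can be the nearest one.
        candidates = ordered[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda onset: abs(onset - tap), default=None)
        if nearest is not None and abs(nearest - tap) <= max_shift:
            snapped.append(nearest)
        else:
            snapped.append(tap)
    return snapped
```

Leaving out-of-tolerance taps untouched, rather than forcing them onto a distant onset, keeps the manual tap as the fallback in dense passages where onset detection is least reliable.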
There are significant problems in comparing the dynamics of musical performances in absolute terms. Dynamic level is a function of the physical medium in which the sound is created, the mechanical technique of sound production by the performer, psychoacoustic perception by the listener (each listener may perceive loudness differently), widely varying playback technologies, and physical characteristics of the
25 See the VAMP audio plugins developed by, among others, researchers at the Centre for Digital Music, Queen Mary, University of London (partially funded by the EPSRC through the OMRAS2 project EP/E017614/1, and partially funded by the European Commission through SIMAC project IST-FP6-507142 and the EASAIER project IST-FP6-033902). There are also plugins developed at the BBC, University of Alicante, and by many others listed at https://www.vamp-plugins.org/download.html
space (generating reverberation), since the dispersion characteristics of sound may affect perceived loudness.