A gesture is a motion of the body that contains information. (Kurtenbach & Hulteen, 1990, p. 310)
The concept of gesture is frequently presented in studies of musical performance and is often considered to be central to the New Interfaces for
4.1. Tracking Gestures
Figure 4.1: An iPad septet performing with the Metatone Classifier EDA and the Singing Bowls app. The lower plot shows the agent’s classifications of the ensemble’s gestures over the whole performance. The x-axis shows the time of the performance in 24-hour format (hh:mm) and the y-axis shows gesture states that are explained below in Table 4.2. Moments of peak change that triggered “new-idea” responses are marked with vertical red lines.
Musical Expression (NIME) field. While the above quote from Kurtenbach and Hulteen (1990) is a succinct expression of the concept from an HCI perspective, “musical gestures” can refer to several different concepts, even within the proceedings of the NIME conference (Jensenius, 2014). Cadoz and Wanderley (2000) described two kinds of musical gesture in a survey of
the term’s use in HCI and music: Effective gestures are those that are used
to control a musical instrument, while ancillary gestures are not involved in
creating sound; rather, they are used to communicate to other musicians or simply to emphasise the unfolding music. A more abstract meaning of musical gesture that does not fit into the typical HCI understanding is to refer to “motion-like qualities in the perceived sound,” or even in the musical
instructions of a score (Jensenius, 2014, p. 218).
Ancillary musical gestures can be captured and harnessed as an extra dimension of computer musical control, as demonstrated by Caramiaux et
al. (2012) for clarinet, and for percussion with the Radio Drum instrument
(Schloss, 1990) or with computer vision methods (Lai, 2009). Effective gestures can also be tracked by sensors attached to instruments, such as bow sensors (Young, 2002), brass valve sensors (L. Jenkins et al., 2013), or the piezoelectric pickups of electronic drums (Tindale, 2007). For the touch-screen mobile devices in the present research, a large amount of data about the performers’ effective gestures could readily be collected from the touch-screens, so extra sensors were not necessary.
Researchers in the NIME field have suggested that it is often easier to collect gestural data than it is to interpret and respond to it musically (P. Cook, 2001). An important recent trend has been to apply powerful Machine Learning (ML) algorithms to such problems so that many dimensions of sensor data may be mapped to much simpler continuous or discrete changes in the synthesis output of a performance interface. For example, Fels and Hinton (1995) used neural networks to map multiple hand sensors to a speech synthesiser. Caramiaux and Tanaka (2013) have provided an overview of machine learning from a DMI designer’s perspective, distinguishing between regression and classification tasks, and reviewing available tool-kits. Fiebrink et al. (2009) used the WEKA machine learning toolkit (Garner, 1995) to create the Wekinator system, designed to allow DMI designers to quickly train ML processes with examples of gestures, map the output to a synthesis environment or other musical software, and evaluate the results on-the-fly through performance. Other tool-kits and libraries have emerged that integrate with computer music environments, such as
the SARC EyesWeb Catalogue (Gillian et al., 2011), ml.lib (Bullock &
Momeni, 2015), and a library by B. D. Smith and Garnett (2012). Another approach to tracking performances is to extract features from live audio streams, as was done by Hsu (2007) to track improvisations by live instruments.
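The distinction between classification and regression tasks mentioned above can be illustrated with a minimal sketch. It uses a k-nearest-neighbours approach chosen only for brevity; it is not claimed to be the method used by any of the cited systems. The sensor vectors, the gesture labels ("tap", "swirl"), and the filter cutoff targets are all hypothetical values invented for the example.

```python
import math
from collections import Counter

def nearest(examples, query, k=3):
    """Return the k training examples closest to query (Euclidean distance)."""
    return sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]

def classify(examples, query, k=3):
    """Classification: map a sensor vector to a discrete label by majority vote."""
    votes = Counter(label for _, label in nearest(examples, query, k))
    return votes.most_common(1)[0][0]

def regress(examples, query, k=3):
    """Regression: map a sensor vector to a continuous value by averaging."""
    neighbours = nearest(examples, query, k)
    return sum(value for _, value in neighbours) / len(neighbours)

# Hypothetical training data: (sensor vector, target) pairs.
gesture_examples = [
    ((0.1, 0.2, 0.0), "tap"), ((0.2, 0.1, 0.1), "tap"), ((0.0, 0.3, 0.1), "tap"),
    ((0.9, 0.8, 0.7), "swirl"), ((0.8, 0.9, 0.9), "swirl"), ((1.0, 0.7, 0.8), "swirl"),
]
cutoff_examples = [
    ((0.0, 0.0, 0.0), 200.0), ((0.1, 0.1, 0.1), 400.0),
    ((0.9, 0.9, 0.9), 4000.0), ((1.0, 1.0, 1.0), 5000.0),
]

# Discrete output: which gesture class does this sensor reading belong to?
label = classify(gesture_examples, (0.15, 0.2, 0.05))
# Continuous output: what synthesis parameter value should this reading produce?
cutoff = regress(cutoff_examples, (0.95, 0.95, 0.95), k=2)
```

In both cases the designer supplies examples rather than writing an explicit mapping; only the type of the output (a label or a number) differs.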
Figure 4.2: An excerpt from Burtner’s (2011) Syntax of Snow for solo glockenspiel and bowl of amplified snow. The composer defines a vocabulary of gestures for interacting with the snow with one hand represented by symbols below a regular staff for notes on the glockenspiel. (Score excerpt © M. Burtner 2010, reproduced with permission.)
used ML methods to track the musical gestures of individual performers and to map them to synthesised sonic responses. In the mobile DMIs presented in Chapter 3, the individual sound synthesis responses had already been mapped using existing touch-screen tracking methods provided by the iOS operating system. In this chapter a system will be presented that classifies the gestures of a mobile-music ensemble simultaneously and continuously analyses the whole ensemble’s behaviour. One approach for analysing performer behaviour is to construct transition matrices of changes between a set of musical states that characterise that performance. This approach was first described by Swift et al. (2014) in their analysis of live coding protocols. In the present work, this transition matrix approach will be further developed for real-time gestural analysis of touch-screen ensembles.
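As a rough illustration of the transition matrix idea, the following sketch counts the changes between consecutive states in a sequence and normalises each row into proportions. The integer state encoding and the example sequence are hypothetical. The `change_ratio` helper is an illustrative summary added here as an assumption; it is not the measure defined in this chapter or by Swift et al. (2014).

```python
def transition_matrix(states, n_states):
    """Count transitions between consecutive states, then normalise each row.

    states   -- sequence of integer state indices in the range [0, n_states)
    Returns (counts, matrix): raw transition counts and row-normalised proportions.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that were never left stay all zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return counts, matrix

def change_ratio(counts):
    """Proportion of transitions that move to a different state (off-diagonal).

    Illustrative assumption only, not the author's measure.
    """
    total = sum(sum(row) for row in counts)
    stayed = sum(counts[i][i] for i in range(len(counts)))
    return (total - stayed) / total if total else 0.0

# Hypothetical sequence of three states observed over seven time steps.
sequence = [0, 0, 1, 1, 1, 2, 0]
counts, matrix = transition_matrix(sequence, n_states=3)
ratio = change_ratio(counts)
```

Entries on the diagonal record moments where the state stayed the same, while off-diagonal entries record changes, so a matrix dominated by its diagonal describes a stable passage of the performance.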
4.1.1 Characterising Percussive Gesture
While traditional musical notation specifies sonic outcomes (pitch, articulation and rhythm), it is possible to compose music by specifying gestures used for interacting with instruments. For percussionists, where gestures are transported across a variety of instruments, this has been used to notate
music performed on unconventional objects. de Mey’s (1987) Music de Tables is written for three percussionists who perform on the surfaces of regular tables; here, de Mey defines a vocabulary of notation for gestures that are used with standard rhythmic notation in the score. Burtner’s (2011)
Syntax of Snow asks the solo performer to play a glockenspiel with one hand and a bowl of snow with the other. The score sets out a complex scheme of gestures for “playing” the snow, with a pair of symbols (see Figure 4.2) for each gesture, representing the type of gesture as well as hand position in the bowl. Some of the gestures in this score (e.g., “touch with finger”, “swish with palm”, “draw line”) could generalise to other instruments and to touch-screens.
It is notable that many of the gestures indicated in Burtner’s (2011) score could be interpreted as being continuous rather than ceasing after following the instruction. For example, “fingers tapping” should probably be interpreted not as one or two taps but as a continual tapping until the performer reaches the next instruction. In HCI research, gestures on touch-screens are frequently characterised as having a short and finite expression such as the “unistroke” gestures described by Wobbrock et al. (2007). These gestures are usually designed to execute a command in software (e.g., double tap to open a menu) rather than to create an artistic expression. For this reason, characterisations of touch gestures that already exist in the HCI literature are unsuitable for characterising performative touch gestures that mainly consist of continuous interactions.
In Chapter 3, a vocabulary of continuous gestures was identified that was used by expert percussionists on the MetaTravels and MetaLonsdale iPad interfaces. These results have been used to construct an agent that observes performers’ touch-screen interactions in real-time and classifies them as a sequence of gestural states. Free-improvised ensemble musical performances can be considered as sequences of musical sections segmented by moments where the group spontaneously moves to explore a new musical
idea (Stenström, 2009, pp. 58–59). The agent estimates the occurrence
of new musical ideas across the ensemble by calculating a measure, flux, on the transition matrix of these gesture states. The gestural states and