2.1 A note on segmental and suprasegmental aspects of speech

2.1.3 Suprasegmentals

Segmental features of speech are produced and perceived in a different time frame

from other phonetic features, referred to as ‘suprasegmentals’. The term itself reveals that these features have to do with vocal effects that extend over more than a single

segment (Crystal, 2008). When people talk to a baby or an animal, they often use a

combination of suprasegmental features, such as changes in voice quality or a

higher pitch register (Ogden, 2009). This means that these features play a role that

extends beyond the lexical aspect of speech. Although these

features do not convey literal meaning per se, their extralinguistic meanings may

contribute a great deal to communication. It is therefore worth examining these

features individually in order to appreciate how they are described acoustically, how

they are perceived auditorily and, in later sections, how they can relate to acoustic

streams of music.

Prosody, another term for suprasegmentals, encompasses timing, frequency,

amplitude, pausing, and voice quality. These are variables that mark all parts of a

spoken utterance and, according to the existing literature, contribute to the formation

of three main types of realisations: linguistic, pragmatic, and emotional prosody. In

general, researchers use the term ‘prosody’ either to refer to an abstract definition

without looking at any specific prosodic components or to examine closely the above

features of timing, frequency, amplitude etc. (Cutler et al., 1997).

Loudness as a suprasegmental feature can contribute to disambiguation of meaning

(in the case of heteronyms, for example the word ‘contrast’, which can be either a

noun or a verb) and communication of emotions (Skandera and Burleigh, 2005).

Loudness can be relative in terms of perception. That is, other factors, co-occurring

with loudness, can affect how loudness is perceived. For example, the pitch of an

Length is also used to describe speech signals at the suprasegmental level. It refers to

the physical duration of a sound or an utterance and it normally differs from duration

that pertains to the time devoted to the articulation of a sound or a syllable at the

segmental level (Crystal, 2008). Tempo refers to the speech rate of an utterance and

differences at this level have a different function compared to segmental

manipulations of duration. An important point is that differences in tempo do not

produce differences in meaning equivalent to the differences that duration can bring

about at the word level (Lehiste, 1970).

Another element that is taken into consideration for prosodic investigations of

speech is ‘voice quality’. In simple terms, voice quality is defined as the difference in ‘colour’ that a listener perceives among different voices; it resembles the difference perceived between two identical notes played at equal

loudness by two different instruments (Skandera and Burleigh, 2005). In this sense it can alternatively be called ‘timbre’. A breathy voice and a harsh voice can be

perceived as different even when they display the same fundamental frequency and

loudness, just as a piano and a violin are perceived to differ in timbre.

However, in a broader definition of voice quality, the term refers to a series of

features. It can refer to someone’s rate of speech, pitch height, loudness and timbre (Crystal, 2008) rather than timbre exclusively. In this second definition, the term

seems to encompass all the features that a listener has at their disposal in order to

identify a speaker.

Pitch is an important auditory feature which has received a lot of scientific interest

and has also been the focus of research in many studies in the comparative

feature of frequency. The frequency of the vibration of the vocal folds determines the

auditory result that one perceives (Skandera and Burleigh, 2005). That is, a fast

vibration of the vocal folds results in a higher pitch, whereas a slower vibration

results in a lower pitch. Most sounds that surround us are complex sounds; pure

tones differ from complex sounds in that they contain only one frequency

(Griffiths et al., 1999). The sound resulting from the vibration of the vocal folds

is an example of a complex sound with many associated harmonics (Lehiste, 1970).

Pitch and fundamental frequency do not

relate in a linear way. Fundamental frequency is defined as the frequency of a

periodic (regularly repeating) sound that corresponds to the lowest mode of vibration

and harmonics as whole number multiples of this frequency (Zatorre et al., 2007).

Ogden (2009) explains that the relationship between fundamental frequency and its

percept, pitch, is logarithmic in nature. More specifically, if this relationship were

linear, then the perceived difference between 100 Hz and 200 Hz would be equal to the

difference between 200 Hz and 300 Hz. Rather, fundamental frequency and the

pitch we perceive relate in a proportional fashion. That is, the difference between

100 Hz and 200 Hz is similar to that between 200 Hz and 400 Hz, because the

two pairs stand in the same proportion, 1:2 in both cases.
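
The two points made in this paragraph, that harmonics are whole-number multiples of the fundamental frequency and that pitch distance follows frequency ratios rather than frequency differences, can be illustrated with a short Python sketch. The function names and the choice of the semitone (12 steps per doubling of frequency) as the unit are assumptions made here for illustration; the sources cited above state only that the relationship is logarithmic.

```python
import math


def harmonic_series(f0_hz, count):
    """Return the first `count` harmonics of f0_hz.

    Harmonics are whole-number multiples of the fundamental,
    so the first harmonic is the fundamental itself.
    """
    return [n * f0_hz for n in range(1, count + 1)]


def interval_in_semitones(f1_hz, f2_hz):
    """Distance between two frequencies on a logarithmic scale.

    Assumption: 12 semitones per doubling of frequency; any other
    logarithmic unit would show the same proportional pattern.
    """
    return 12 * math.log2(f2_hz / f1_hz)


if __name__ == "__main__":
    print(harmonic_series(100, 4))
    for low, high in [(100, 200), (200, 300), (200, 400)]:
        size = interval_in_semitones(low, high)
        print(f"{low}-{high} Hz: {size:.2f} semitones")
```

Under these assumptions, the pairs 100-200 Hz and 200-400 Hz come out as the same interval, whereas 200-300 Hz comes out smaller, even though it spans the same 100 Hz as the first pair.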

In simple auditory terms, pitch can be defined as the type of sensation scaled from

‘low’ to ‘high’ (Crystal, 2008). As also noted in 1.3, there is a distinction between absolute and relative pitch. Relative pitch requires the listener to abstract intervallic

relationships (De Cheveigne, 2005). That is, the listener has a point of reference at

their disposal and they base their judgement on the relationship between two notes

rather than the identification of a note out of melodic context. Absolute pitch or

of reference, that is, to name an isolated note. The use of pitch in prosodic functions

is discussed later in this chapter.