Vocal Dosimetry: Theoretical and Practical Issues
Jan G. Švec¹, Ingo R. Titze¹,² and Peter S. Popolo¹
¹ National Center for Voice and Speech, The Denver Center for the Performing Arts, 1245 Champa Street, Denver, CO 80204, USA
² Department of Speech Pathology and Audiology, The University of Iowa, 330-WJSHC, Iowa City, IA 52242, USA
In order to quantify the amount of vocal fold vibration produced over long periods of phonation, measures called ‘vocal doses’ have been introduced. The three most important vocal doses identified so far are the time dose (i.e., the voicing time), the cycle dose (i.e., the accumulated number of vocal fold oscillations) and the distance dose (i.e., the accumulated distance traveled by the vocal folds along their vibratory trajectory). The distance dose is of particular interest because there is an industrial safety criterion for hand-transmitted vibration that allows only ca. 520 m of accumulated vibration distance per day. If the process of tissue damage through vibration is similar in hands and in vocal folds, information on the distance traveled by the vocal folds could potentially be used to derive safety criteria for vocalization. A set of empirical rules was derived to obtain the distance dose from the voicing time, fundamental frequency and SPL of voice [14;16]. A Pocket PC-based voice dosimeter has been developed that records and stores information on the F0 and intensity of phonation in 30 ms intervals for the whole day. An accelerometer attached to the neck is used as the voice detector, and the intensity of the vibration at the neck (skin acceleration level, SAL) is related to the SPL of voice using a calibration procedure. The recorded data can be downloaded from the Pocket PC to obtain vocal doses and for further analysis.
Keywords: vocal dose measurement, vocal dosimeter, voice accumulator, vocal loading
1. Introduction
The problem of long-term voice measurement has received much attention in recent decades, and its importance is growing, especially due to recent efforts to establish occupational safety criteria for vocalization [17;18]. One of the most fundamental issues in studying the effects of excessive or long-term vocalization is determining the proper way of quantifying the amount of voicing. Measures called ‘vocal doses’ have recently been proposed for this purpose [14;16]. The term ‘dose’ has traditionally been used in fields dealing with the influence of factors such as radiation or chemicals on biological tissue. Since occupational voice problems are expected to be caused by long-term exposure of the vocal fold tissue to vibration, the term ‘vocal dosimetry’ has been adopted and is used alongside the more traditional term ‘vocal accumulation’.
AQL 2003 Hamburg: Proceeding Papers for the Conference Advances in Quantitative Laryngology, Voice and Speech Research (CD ROM, ISBN: 3-8167-6285-9, http://www.uke.uni-hamburg.de/AQL2003).
2. Vocal Dose Measures
The three potentially most relevant vocal doses identified so far are the time dose, the cycle dose and the distance dose. The time dose is equal to the voicing time [5;6;8;11;13;15] and measures the total time the vocal folds spent vibrating. It is defined as [16]:
$$D_t = \int_0^{t_p} k_v \, dt \quad \text{[seconds]} \tag{1}$$
where $t_p$ is the total performance time in seconds and $k_v$ is the voicing unit step function:
$$k_v = \begin{cases} 1 & \text{for voicing} \\ 0 & \text{for non-voicing} \end{cases} \tag{2}$$
The cycle dose measures the total number of cycles completed by the vocal folds and is defined as:
$$D_c = \int_0^{t_p} k_v F_0 \, dt \quad \text{[cycles]} \tag{3}$$
where $F_0$ is the fundamental frequency of the vocal fold oscillation in Hz. The cycle dose is practically identical to the ‘vocal loading index (VLI)’ used by Rantala and Vilkman [10], the only difference being that the VLI expresses the number of cycles in thousands.
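Because the dosimeter stores one value every 30 ms, the integrals in Eqs. (1) and (3) become sums over frames in practice. The following minimal sketch assumes a list of per-frame (voiced, F0) values and a fixed 30 ms frame length. The function names are illustrative and are not the dosimeter software.

```python
from typing import Sequence, Tuple

FRAME_S = 0.03  # assumed frame length in seconds (one stored value every 30 ms)


def time_dose(voiced: Sequence[bool], dt: float = FRAME_S) -> float:
    """Discrete form of Eq. (1): sum of k_v * dt over all frames, in seconds."""
    return sum(dt for v in voiced if v)


def cycle_dose(frames: Sequence[Tuple[bool, float]], dt: float = FRAME_S) -> float:
    """Discrete form of Eq. (3): sum of k_v * F0 * dt over all frames, in cycles."""
    return sum(f0 * dt for v, f0 in frames if v)


def vocal_loading_index(dc: float) -> float:
    """The VLI counts the same cycles, expressed in thousands."""
    return dc / 1000.0


# One minute (2000 frames) in which the first half is voiced at 200 Hz.
frames = [(i < 1000, 200.0) for i in range(2000)]
dt_s = time_dose([v for v, _ in frames])  # 1000 frames * 0.03 s = 30 s
dc = cycle_dose(frames)                   # 30 s * 200 Hz = 6000 cycles
print(round(dt_s, 6), round(dc, 6), round(vocal_loading_index(dc), 6))
```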
The distance dose measures the total distance traveled by the vocal folds along their oscillatory trajectory. It is defined as [16]:
$$D_d = 4 \int_0^{t_p} k_v A F_0 \, dt \quad \text{[meters]} \tag{4}$$
where $A$ is the vibration amplitude of the vocal folds. The definition shows that this dose is sensitive to both the frequency of vocal fold oscillation and the vocal intensity, because the vibration amplitude changes with vocal intensity.
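The factor 4 in Eq. (4) follows from the geometry of one cycle. For a sinusoidal displacement of amplitude A, the fold moves 0 → A → 0 → −A → 0 during each cycle, which is a path of 4A. At frequency F0 it therefore travels 4·A·F0 meters per second. The following discrete sketch assumes that per-frame F0 (Hz) and amplitude A (m) values are already available. The amplitude values used here are hypothetical, because the dosimeter does not measure A directly.

```python
from typing import Sequence, Tuple

FRAME_S = 0.03  # assumed frame length in seconds


def distance_dose(frames: Sequence[Tuple[bool, float, float]], dt: float = FRAME_S) -> float:
    """Discrete form of Eq. (4): sum of 4 * A * F0 * k_v * dt, in meters.

    Each frame is (voiced, f0_hz, amplitude_m). The amplitude is a
    hypothetical input here (in practice it must be estimated).
    """
    return sum(4.0 * a * f0 * dt for v, f0, a in frames if v)


# 30 s (1000 frames) of continuous voicing at 200 Hz with a hypothetical 1 mm amplitude.
frames = [(True, 200.0, 0.001)] * 1000
print(distance_dose(frames))  # about 24 m (4 * 0.001 m * 200 Hz * 30 s)
```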
The distance dose is of special interest because it can be related to the safety criteria used in industry for hand-transmitted vibration. The safety limit for hand tissue is about 520 meters of accumulated distance; exposure to vibration exceeding this amount is considered hazardous [16]. However, the vocal folds can easily travel 520 meters in less than 45 minutes of continuous speech [14;16]. This suggests that the safety limit for the vocal folds should be higher than for hands, especially since vocal fold tissue is better adapted to vibration than hand tissue. The difference between hand and vocal fold tissue in their ability to withstand vibration, which would allow a safety limit for vocalization to be determined, remains to be specified.
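Eq. (4) gives a rough idea of how quickly 520 m accumulates. With a constant F0, amplitude A and voicing fraction p, the distance grows at 4·A·F0·p meters per second of talking. The values below (F0 = 200 Hz, A = 1 mm, p = 0.5) are purely illustrative assumptions and are not measurements reported in this paper.

```python
def minutes_to_reach(limit_m: float, f0_hz: float, amp_m: float, voicing_fraction: float) -> float:
    """Minutes of talking needed for the distance dose to reach limit_m,
    assuming constant F0, amplitude and voicing fraction (a simplification)."""
    rate_m_per_s = 4.0 * amp_m * f0_hz * voicing_fraction
    return limit_m / rate_m_per_s / 60.0


# Illustrative values only: F0 = 200 Hz, A = 1 mm, voiced half of the time.
print(round(minutes_to_reach(520.0, 200.0, 0.001, 0.5), 1))  # 21.7
```

Under these assumptions the limit is reached in well under 45 minutes. Higher F0 or larger amplitudes (louder voice) shorten this time proportionally.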
A practical problem related to the determination of the distance dose is the measurement of the amplitude of vibration of the vocal folds. So far, no device has been available to measure the amplitude of the vocal folds in running speech. The authors have overcome this problem by employing empirical rules, allowing approximation of the vocal fold amplitude from the SPL and F0 of voice [14;16].
3. Vocal Dosimeter
3.1. Design
Various devices have been used by different researchers to measure the amount of vocalization: noise exposure analyzers [1], portable tape recorders [4;13;15], and specially constructed ‘vocal accumulators’ [3;8;12;15]. A special ‘voice dosimeter’ was developed by the authors at the National Center for Voice and Speech [9]. The dosimeter was designed to fulfill the following needs:
- simplicity of use, so that the device can be operated by technically ‘naïve’ subjects (e.g., school teachers)
- insensitivity to background noise and to the speech of other people
- ergonomic design allowing the device to be worn all day for 14 consecutive days
- whole-day recording duration (at least 14 hours); the whole day rather than only working hours was targeted in order to assess voice use during free time as well, and to determine whether the main voice load comes from teaching or from other activities (e.g., shouting at sports events, choir practice, excessive voice use in loud restaurants and bars)
- simultaneous measurement and storage of F0 and SPL of voice every 30 ms (this interval was chosen because it allows detection of inter-syllabic pauses in speech)
- information on the absolute time of the recordings (allowing identification of the times of day at which the recorded vocal activities occurred)
- collection of information on voice quality, speaking effort and laryngeal discomfort, which can be related to the measured vocal doses
The heart of the device is a commercially available programmable Pocket PC (Compaq iPAQ, model 3765). A number of technical issues had to be overcome. The main drawbacks were the built-in microphone, which was found unsuitable for long-term vocal dose measurement, and the absence of an external microphone input. Both problems were solved by rewiring the Pocket PC so that the internal microphone input serves as the input for an external voice sensor.
3.2. Voice sensor
An accelerometer (BU-7135 by Knowles Electronics) was chosen as the voice sensor. The accelerometer is attached to the neck at the jugular notch (anterior part of the neck, below the larynx, between the cricoid cartilage and the sternum). It is held in place by a surgical adhesive (Mastisol® by Ferndale Laboratories, applied between the skin and the accelerometer) and a Suture-Strip (TS-3101 by Derma Sciences, applied over the body of the accelerometer to further secure it to the skin). The accelerometer only picks up vibration from the surface it is in direct contact with. Its advantage over a microphone is therefore that it is virtually insensitive to sounds other than those produced by the subject wearing it. The accelerometer is also very small and comfortable to wear (if a sufficiently flexible cable is used). It was found capable of recording the full dynamic range of the vocalizations over the complete frequency range. Certain problems have been experienced
3.3. Calibration for voice intensity measurement
To measure vocal intensity with the accelerometer, a calibration procedure was designed. It establishes the relationship between the skin acceleration level (SAL) measured by the accelerometer on the neck and the absolute sound pressure level (SPL) of voice measured by a sound level meter at a distance of 50 cm. The SPL/SAL relationship is determined in the laboratory for every subject individually. Soft, normal and loud speech is analyzed for this purpose, and the SPL/SAL relationship is obtained as the best fit to all the speech samples [9].
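The text does not state the form of the best-fit SPL/SAL relationship. The following minimal sketch assumes a straight line SPL = a·SAL + b, fitted by ordinary least squares to paired (SAL, SPL) values from the soft, normal and loud speech samples. Both the linear model and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_spl_from_sal(sal_db: Sequence[float], spl_db: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line SPL = slope * SAL + intercept (an assumed model)."""
    n = len(sal_db)
    if n != len(spl_db) or n < 2:
        raise ValueError("need at least two paired samples")
    mx = sum(sal_db) / n
    my = sum(spl_db) / n
    sxx = sum((x - mx) ** 2 for x in sal_db)
    if sxx == 0:
        raise ValueError("SAL values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(sal_db, spl_db))
    slope = sxy / sxx
    return slope, my - slope * mx


def sal_to_spl(sal: float, slope: float, intercept: float) -> float:
    """Convert a measured SAL to an estimated SPL at 50 cm using the fitted line."""
    return slope * sal + intercept


# Hypothetical calibration pairs (SAL at the neck, SPL at 50 cm) for soft, normal, loud speech.
sal = [65.0, 75.0, 85.0]
spl = [58.0, 68.0, 78.0]
a, b = fit_spl_from_sal(sal, spl)
print(a, b)  # 1.0 -7.0
```

Pooling all three loudness conditions into one fit, as the text describes, covers the subject's whole working range of SAL values rather than extrapolating from a single level.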
3.4. Voicing detection
The signal is registered as voiced if the SAL value exceeds a threshold, which is also determined individually for each subject. The subject is asked to produce the softest phonation possible, and the SAL registered by the dosimeter during this phonation is measured. The SAL threshold is then set, ideally, 5 dB below the SAL of the softest phonation and at least 5 dB above the internal noise level of the dosimeter. If the softest SAL is less than 10 dB above the noise level of the system, the threshold is set midway between the noise level and the softest SAL. The dosimeter also includes an algorithm for identifying artifact signals resulting from non-vocal activities, such as swallowing or head movement, during which the SAL could exceed the threshold. If most of the signal energy lies outside an expected vocal frequency range, the signal is classified as unvoiced even if the SAL exceeds the threshold.
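The threshold rule and the voicing decision described above can be sketched as follows. The fraction of spectral energy inside the expected vocal range is treated as a hypothetical precomputed input, because the text does not specify the range or how the fraction is computed.

```python
def sal_threshold(softest_sal_db: float, noise_db: float) -> float:
    """Voicing threshold (dB SAL) following the rule described in the text."""
    if softest_sal_db - noise_db >= 10.0:
        # Enough headroom: 5 dB below the softest phonation, which is then
        # automatically at least 5 dB above the noise level.
        return softest_sal_db - 5.0
    # Softest phonation less than 10 dB above noise: take the midpoint.
    return (noise_db + softest_sal_db) / 2.0


def is_voiced(sal_db: float, threshold_db: float, in_range_energy_fraction: float) -> bool:
    """A frame counts as voiced if its SAL exceeds the threshold and most of its
    spectral energy is NOT outside the expected vocal range. The in-range
    energy fraction is a hypothetical input (range and method not specified)."""
    return sal_db > threshold_db and in_range_energy_fraction >= 0.5


# Noise level of 59 dB as in the Figure 1 test; the softest-phonation SALs are hypothetical.
print(sal_threshold(75.0, 59.0))  # 70.0 (16 dB headroom -> softest - 5 dB)
print(sal_threshold(65.0, 59.0))  # 62.0 (only 6 dB headroom -> midpoint)
```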
3.5. Signal processing
A custom software application was written for the Pocket PC that determines SAL [dB] and F0 [Hz] values in real time, every 30 ms. To also capture information on the spectrum of the signal, the so-called “frequency energy center” (FEC) [Hz] is calculated using a formula for the center of gravity of the spectrum [7] and stored in addition to the SAL and F0. For a purely sinusoidal signal, the FEC is identical to the F0; for signals containing harmonic components, the FEC is expected to be higher than the F0. The stronger the harmonic components (i.e., the flatter the slope of the spectrum), the larger the difference between the FEC and the F0. The FEC value thus provides information on voice quality. The SAL, F0 and FEC values are stored as 8-bit integers, with a SAL resolution of 1 dB and an F0 and FEC resolution of 10 Hz. Fifteen hours of recorded data result in a file size of about 5 MB, which fits comfortably in the Pocket PC memory (32 MB limit).
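The FEC and the storage format can be illustrated with a short sketch. This sketch assumes that the spectrum is given as harmonic frequencies and amplitudes and that power (amplitude squared) is used as the weight; reference [7] may define the weighting differently. The packing helper quantizes one frame to three bytes at the stated resolutions. Its rounding and clipping behavior are assumptions.

```python
from typing import Sequence


def fec(freqs_hz: Sequence[float], amps: Sequence[float]) -> float:
    """Spectral center of gravity: sum(f * P) / sum(P), with P = amplitude**2 (assumed)."""
    powers = [a * a for a in amps]
    total = sum(powers)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs_hz, powers)) / total


def pack_frame(sal_db: float, f0_hz: float, fec_hz: float) -> bytes:
    """Quantize to 1 dB and 10 Hz steps, clipping to one unsigned byte (0..255)."""
    def q(x: float, step: float) -> int:
        return max(0, min(255, round(x / step)))
    return bytes([q(sal_db, 1.0), q(f0_hz, 10.0), q(fec_hz, 10.0)])


f0 = 150.0
print(fec([f0], [1.0]))                             # pure tone: FEC equals F0 (150.0)
print(fec([f0, 2 * f0, 3 * f0], [1.0, 0.5, 0.25]))  # harmonics pull FEC above F0

# Storage: 15 h of frames at 30 ms, 3 bytes per frame.
frames = round(15 * 3600 / 0.03)
print(frames, frames * len(pack_frame(80, 150, 300)))  # 1800000 5400000
```

At 3 bytes per 30 ms frame, 15 hours comes to about 5.4 million bytes, which agrees with the file size of about 5 MB stated above. The 10 Hz step also caps F0 and FEC at 2550 Hz.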
3.6. Evaluation of voice quality, speaking effort and laryngeal discomfort
In addition to registering the voice signal, the dosimeter prompts the subject every two hours to self-evaluate four items. These are the quality of soft voice (1-10 scale; 1 for the best and 10 for the worst quality), speaking effort (1-10 scale; 1 for no effort, 10 for extreme effort to speak), laryngeal discomfort (1-10 scale; 1 for no discomfort, 10 for extreme discomfort) and location of the discomfort (1-4 scale; 1 outside the larynx, 2 inside the larynx, 3 both inside and outside, 4 neither). The subject enters the evaluation results manually via a graphical user interface, and they are stored in a separate file.
3.7. Internal dosimeter tests
While the subject performs the self-evaluation, the dosimeter also runs two tests that monitor the functioning of the device. The first, the noise test, determines the noise level of the system when there is no vocalization. The second, the counting test, determines whether the voicing signal is strong enough, which can reveal potential detachment of the accelerometer from the skin. If these tests yield unexpected SAL values, the subject is advised to check the cables and the attachment of the accelerometer.
4. Dosimeter Results
After the recording ends, the data are transferred from the Pocket PC to a desktop or laptop PC. The recorded data are extracted and processed in Matlab. The measured SAL levels are converted to SPL levels using the previously determined SAL/SPL relationship, and the resulting vocal doses are calculated. For display purposes, mean SAL, SPL, F0 and FEC values as well as the three vocal doses are also calculated for every minute of phonation.
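The per-minute summaries can be sketched by binning 30 ms frames into 2000-frame minutes (60 s / 0.03 s). Each SAL is converted to SPL with the subject's calibration line, and values are averaged over voiced frames. The original processing was done in Matlab. This Python version, the linear calibration and the choice to average only voiced frames are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

FRAME_S = 0.03
FRAMES_PER_MIN = round(60 / FRAME_S)  # 2000 frames per minute


def per_minute(frames: Sequence[Tuple[bool, float, float]],
               slope: float, intercept: float) -> List[Dict[str, float]]:
    """frames: (voiced, sal_db, f0_hz). Returns, for each minute, the mean SPL
    and mean F0 over voiced frames (NaN if none) plus time and cycle doses."""
    out = []
    for start in range(0, len(frames), FRAMES_PER_MIN):
        chunk = [f for f in frames[start:start + FRAMES_PER_MIN] if f[0]]
        n = len(chunk)
        out.append({
            # SAL -> SPL with an assumed linear calibration, averaged over voiced frames.
            "mean_spl_db": sum(slope * s + intercept for _, s, _ in chunk) / n if n else float("nan"),
            "mean_f0_hz": sum(f0 for _, _, f0 in chunk) / n if n else float("nan"),
            "time_dose_s": n * FRAME_S,
            "cycle_dose": sum(f0 * FRAME_S for _, _, f0 in chunk),
        })
    return out


# Two minutes: fully voiced at SAL 80 dB and 120 Hz, then silence.
frames = [(True, 80.0, 120.0)] * 2000 + [(False, 50.0, 0.0)] * 2000
mins = per_minute(frames, slope=1.0, intercept=-7.0)
print(mins[0])
```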
Figure 1 shows the data measured by the vocal dosimeter during one of the regular tests. The subject was a 39-year-old male working as an administrator at the National Center for Voice and Speech in Denver, who is also a singer and singing teacher. At the beginning (between 5 and 15 seconds), the subject was quiet and the dosimeter measured the noise level, which was 59 dB in this case. The subject then performed a series of phonation tasks for self-evaluation of the quality of soft voice [2]. These were sustained soft /i/ at upper pitch (15-21 s), a soft up-and-down pitch glide (22-26 s), high-pitched staccato /i-i-i-i-i/ (27-29 s), and soft high-pitched singing of the first two verses of ‘Happy Birthday’ (30-39 s). Around 53-57 s, the subject performed the counting test (saying “one-two-three”) at comfortable intensity, during which the maximal SAL level was determined (83 dB).
Figure 2 displays the dosimeter data from the whole day. The recording started at 8:08:19 in the morning and ended at 18:24:02. Note especially the middle plot of doses per minute, which reveals how much voice activity the subject performed at given times. Considerable voice activity is seen shortly after the start of the recording (ca. 8:15–8:30), when the subject practiced singing. Hardly any voice activity is recorded between 8:45 and 9:15, when the subject traveled to work. Continuous vocal activity is then seen between 9:15 and 12:05, during which the subject participated in an administrative meeting. After the meeting, the subject walked to a restaurant in town; the transfer took about 15 minutes (very low voice activity between 12:15 and 12:30). At the restaurant, the subject had a lunch meeting with a close friend, most of which was spent in discussion (large vocal doses between 12:30 and 13:30). As the restaurant was quite noisy, most of the phonations were considerably loud. This is reflected in the noticeably increased SAL, F0 and FEC values in the two upper plots of Fig. 2. For the rest of the day (13:30 to 17:30), the subject did administrative work in his office, which produced generally low vocal doses. Occasional phone calls during that time appear as occasional increases of the doses. Between 17:30 and 18:00 the subject traveled home from work (no voice activity), and he took the dosimeter off at 18:24.

Figure 2: Data from 12 hours of dosimeter use. Top plot: mean skin acceleration levels (SAL) per minute. Second plot from top: mean fundamental frequencies per minute and centers of the total spectral energy radiated each minute. Third plot from top: doses per minute (cycle dose, time dose and distance dose). Fourth plot from top: the ratings of the soft voice quality, effort and discomfort self-evaluated by the subject. Bottom plot: the accumulating vocal doses (curves) and the total values of the doses (text). VLI – vocal loading index (i.e., cycle dose), Dt – time dose, Dd – distance dose.
The subject performed three self-evaluation voice tests during the day; the results are shown in the fourth plot of Fig. 2. The quality of the soft voice was worse (level 5 on the 1-10 scale) in the middle of the day at around 14:00, after the visit to the noisy restaurant, than in the morning (level 3) and evening (level 2). The speaking effort did not change during the day (level 2). Slight laryngeal discomfort (level 2) was perceived at 14:00; no discomfort was perceived in the morning and evening (level 1).
The bottom plot of Fig. 2 shows the cumulative vocal doses. The doses accumulated mostly in the first part of the day, when most of the vocal activity took place. The time dose accumulated over the 12 hours of recording was 5082 seconds, which corresponds to about 85 minutes of voicing. The voicing percentage was close to 14%. During the day, the dosimeter registered 722208 oscillatory cycles of the vocal folds (almost three quarters of a million). It also registered a total distance of 2893 meters traveled by the vocal folds along their oscillatory trajectory.
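Simple arithmetic can cross-check the reported totals. Dividing the cycle dose by the time dose gives the mean F0 during voicing. By Eq. (4), dividing the distance dose by four times the cycle dose gives the cycle-weighted mean vibration amplitude. The recording duration in this sketch is taken from the start and end times given above.

```python
from datetime import datetime

# Totals reported for the whole-day recording.
time_dose_s = 5082.0
cycle_dose = 722208.0
distance_dose_m = 2893.0

start = datetime.strptime("08:08:19", "%H:%M:%S")
end = datetime.strptime("18:24:02", "%H:%M:%S")
recording_s = (end - start).total_seconds()  # 36943 s, about 10.3 h

mean_f0_hz = cycle_dose / time_dose_s                    # about 142 Hz
mean_amp_mm = distance_dose_m / (4 * cycle_dose) * 1000  # about 1.0 mm
voicing_pct = 100 * time_dose_s / recording_s            # about 13.8 %

print(f"{time_dose_s / 60:.1f} min voiced, mean F0 {mean_f0_hz:.1f} Hz, "
      f"mean A {mean_amp_mm:.2f} mm, voicing {voicing_pct:.1f} %")
```

A mean F0 of about 142 Hz and a mean amplitude of about 1 mm are plausible for an adult male speaker. The voicing percentage computed from the start and end times is consistent with the reported value of close to 14%.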
5. Discussion and conclusion
The time, cycle and distance doses provide a means of quantifying the amount of vocalization over long periods of time. The time dose has proved useful for quantifying voicing duration and voicing percentages and for comparing them among vocal activities, occupations and individual subjects [5;6;8;11;13;15]. It can also be used as a normalization factor to obtain doses per second of vocalization [16].
The cycle dose has been found relevant by Rantala and Vilkman [10], who reported that teachers with more vocal complaints showed higher values of the vocal loading index than teachers with fewer complaints. The number of oscillatory cycles completed by the vocal folds is enormous; our preliminary data reveal that it often exceeds 1 million (!) cycles per day. The vocal fold tissue is exposed to the forces resulting from collision of the vocal folds, and such excessive exposure can potentially result in vocal fold pathology.
The distance dose quantifies the distance traveled by the vocal folds along their oscillatory trajectory and is sensitive to both vocal frequency and intensity. As shown in Figure 2, the distance dose for a male subject working as an administrator was 2893 meters. This is about 5.6 times the 520-meter safety limit used in industry for hand-transmitted vibration. Still, this subject did not report any major subjective difficulties resulting from this amount of voice use (very low effort and discomfort levels). However, the quality of soft phonation worsened slightly after the prolonged voice use. This allows us to speculate about possible slight changes in the surface layer of the vocal folds as a result of excessive vocalization [2]. The quality of the soft voice improved again after rest.
The newly developed vocal dosimeter allows measurement of the different vocal doses as well as studying the effects of vocalization on the quality of soft voice, vocal effort (considered as an indicator of vocal fatigue) and laryngeal discomfort. The advantage of the Pocket PC design, utilized here, is its small size, portability and ease of use (familiar Windows interface). Another important factor is the possibility of modifications to the
6. Literature
[1] Airo E, Olkinuora P, Sala E. A method to measure speaking time and speech sound pressure level. Folia Phoniatr Logop 2000; 52(6):275-288.
[2] Bastian RW, Keidar A, Verdolini-Marston K. Simple vocal tasks for detecting vocal fold swelling. J Voice 1990; 4(2):172-183.
[3] Buekers R, Bierens E, Kingma H, Marres EHMA. Vocal load as measured by the voice accumulator. Folia Phoniatr Logop 1995; 47(5):252-261.
[4] Granqvist S. The self-to-other ratio applied as a phonation detector for voice accumulation. Lecture and poster presented at the 4th Pan European Voice Conference PEVOC IV, Stockholm, August 23-26, 2001 (http://www.speech.kth.se/~svante/aura/).
[5] Jonsdottir V, Rantala L, Laukkanen A-M, Vilkman E. Effects of sound amplification on teachers' speech while teaching. Log Phon Vocol 2001; 26(3):118-123.
[6] Masuda T, Ikeda Y, Manako H, Komiyama S. Analysis of vocal abuse: fluctuations in phonation time and intensity in 4 groups of speakers. Acta Otolaryngol (Stockh) 1993; 113(4):547-552.
[7] Novák A, Dlouhá O, Čapková B, Vohradník M. Voice fatigue after theater performance in actors. Folia Phoniatr 1991; 43:74-78.
[8] Ohlsson A-C, Brink O, Löfqvist A. A voice accumulator--validation and application. J Speech Hear Res 1989; 32:451-457.
[9] Popolo PS, Švec JG, Rogge-Miller K, Titze IR. Technical considerations in the design of a wearable voice dosimeter. Poster presented at the First Pan-American / Iberian Meeting on Acoustics / 144th Meeting of the Acoustical Society of America, Cancun, Mexico, December 2-6, 2002. (http://192.107.173.4/peter_html/Images/cancun2_112502.pdf).
[10] Rantala L, Vilkman E. Relationship between subjective voice complaints and acoustic parameters in female teachers' voices. J Voice 1999; 13(4):484-495.
[11] Rantala L, Vilkman E, Bloigu R. Voice changes during work: subjective complaints and objective measurements for female primary and secondary schoolteachers. J Voice 2002; 16(3):344-355.
[12] Ryu S, Komiyama S, Kannae S, Watanabe H. A newly devised speech accumulator. Journal for Oto-Rhino-Laryngology and its Related Specialties 1983; 45:108-114.
[13] Södersten M, Granqvist S, Hammarberg B, Szabo A. Vocal behavior and vocal loading factors for preschool teachers at work studied with binaural DAT recordings. J Voice 2002; 16(3):356-371.
[14] Švec JG, Popolo PS, Titze IR. Experimental procedure and signal processing algorithms for vocal dose measures (in preparation).
[15] Szabo A, Hammarberg B, Hakansson A, Södersten M. A voice accumulator device: evaluation based on studio and field recordings. Log Phon Vocol 2001; 26(3):102-117.
[16] Titze IR, Švec JG, Popolo PS. Vocal dose measures: Quantifying accumulated vibration exposure in vocal fold tissues. J Speech Lang Hear Res (in press).
[17] Titze IR. Toward occupational safety criteria for vocalization. Log Phon Vocol 1999; 24:49-54.
[18] Vilkman E. Voice problems at work: a challenge for occupational safety and health arrangement. Folia Phoniatr Logop 2000; 52(1-3):120-125.
The work has been supported by the National Institutes of Health, Project NIDCD 1 R01 DC04224-01 (Research Towards Occupational Safety In Vocalization). The authors acknowledge the highly valuable work of Karen Rogge-Miller on the coding of the Pocket PC, the initial suggestion of Sten Tenström on using a Pocket PC for voice accumulation and the advice of R. Hillman on the selection of the accelerometer.