4.1 Introduction
For thin-route multiplexing link optimisation, three factors are considered: (a) low bit rate speech coding, which is already covered in chapter 3, (b) Voice Activity Detection (VAD) and (c) Lost Frame Reconstruction (LFR). Without VAD, the capacity of any link is limited to the maximum number of circuits connected to the link. For example, a 64 kb/s link can accommodate a single 64 kb/s PCM source, but at least 9 users, each at 6.4 kb/s, can be multiplexed through the same link, excluding the signalling information. The capacity of such a link can be further increased by exploiting the activity of the speech signal and statistically multiplexing users on the link. Thus, a further gain in bandwidth can be achieved in proportion to the silent periods detected.
In a statistical multiplexing environment the load at the multiplexer input changes dynamically. Since none of the users is allocated a fixed slot, as is the case in conventional Time Division Multiplexing (TDM), a sudden demand for bandwidth can exceed the capacity of the link. A sudden surge of users and bandwidth requirement can be tackled by two alternative approaches: either buffering incoming data until the link is free, which may lead to an increase in delay that cannot be afforded for a delay-sensitive, real-time speech service; or forcing the excess users to give up small segments of their speech (frames), a concept similar to that employed for congestion control in voice multiplexer traffic smoothing
CHAPTER 4. LOW BIT RATE SPEECH MULTIPLEXER TOOLS 43
[37] [47]. Such a forced frame discard could cause a varying degree of distortion in speech.
The distortion can be compensated by activating LFR for the relevant segments of speech.
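The capacity argument above can be illustrated with a rough sketch. It assumes that users are independent and that each is active in a given frame with the same probability; the value 0.4 is borrowed from the conversational activity figure quoted in section 4.2, and the function names are hypothetical. The raw ratio of 64 kb/s to 6.4 kb/s gives 10 slots, so the figure of 9 users in the text presumably leaves room for signalling.

```python
from math import comb


def fixed_slot_capacity(link_bps: int, source_bps: int) -> int:
    """Number of sources that fit when every source owns a fixed slot."""
    return link_bps // source_bps


def overflow_probability(n_users: int, n_slots: int, activity: float) -> float:
    """Probability that more than n_slots of n_users are active in one frame.

    Assumption (not from the source): users are independent and each is
    active with the same probability `activity`, so the number of active
    users follows a binomial distribution.
    """
    return sum(
        comb(n_users, k) * activity**k * (1.0 - activity) ** (n_users - k)
        for k in range(n_slots + 1, n_users + 1)
    )


# Raw slot count for a 64 kb/s link carrying 6.4 kb/s sources.
slots = fixed_slot_capacity(64_000, 6_400)

# Overflow probability as more users are statistically multiplexed.
for users in (10, 15, 20, 25):
    print(users, round(overflow_probability(users, slots, 0.4), 4))
```

Whenever the overflow probability is non-zero, some frames must either be buffered or discarded, which is the situation that frame discard and LFR are meant to handle.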
Section 4.2 describes the source of the speech database used to measure the temporal parameters of speech that are represented by a two-state Markov model. The concept of Discontinuous Transmission (DTX), design perspectives of VAD and comfort noise generation are discussed in section 4.3. Measurements of speech temporal parameters via visual reference, the effects of hangover on these parameters, speech clipping and other measurements are presented in section 4.4. Section 4.5 covers speech quality assessment measures in the time and frequency domains. The design perspectives of the two-state Markov model, and the verification and comparison of simulated temporal parameters with measured statistics, are discussed in section 4.6. LFR techniques, their performance measures and their application in multiplexer link simulation are discussed in sections 4.7 and 4.8. Finally, the remarks in section 4.9 conclude this chapter.
4.2 Speech Database
Human communication via speech can be classified into two modes: (1) dialogue, a normal telephone conversation, and (2) monologue, a commentary-type speech. The activity ratio, which is the proportion of time occupied by active speech as opposed to silence, differs between the two types of speech and has been studied and well documented by many researchers [80] [81] [82] [13]. In conversational speech the activity ratio is on average around 40%, although this depends heavily on the mood and mode of the people engaged in the conversation. About 60% of the total conversational time is silence, and the duration of a silence varies from tens of ms to hundreds of ms. In monologue speech the amount of silent periods decreases by the same proportion as the speech
activity increases. Therefore the bandwidth required to provide such a service would also be double that required for conversational speech.
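The link between an activity ratio and the two-state Markov model mentioned in section 4.1 can be sketched as follows. This is only an illustration: the per-frame transition probabilities 0.02 and 0.03 are hypothetical values chosen so that the long-run activity comes out at 40%, and they are not the measured parameters of this chapter.

```python
import random


def stationary_activity(p_silence_to_talk: float, p_talk_to_silence: float) -> float:
    """Long-run fraction of frames spent in the talkspurt state."""
    return p_silence_to_talk / (p_silence_to_talk + p_talk_to_silence)


def simulate_activity(p_silence_to_talk: float, p_talk_to_silence: float,
                      n_frames: int, seed: int = 0) -> float:
    """Simulate a two-state (silence/talkspurt) chain frame by frame."""
    rng = random.Random(seed)
    talking = False
    active_frames = 0
    for _ in range(n_frames):
        if talking:
            # Leave the talkspurt state with probability p_talk_to_silence.
            if rng.random() < p_talk_to_silence:
                talking = False
        else:
            # Leave the silence state with probability p_silence_to_talk.
            if rng.random() < p_silence_to_talk:
                talking = True
        active_frames += talking
    return active_frames / n_frames


# Hypothetical per-frame transition probabilities (not measured values).
P_SILENCE_TO_TALK = 0.02
P_TALK_TO_SILENCE = 0.03

print(stationary_activity(P_SILENCE_TO_TALK, P_TALK_TO_SILENCE))
print(simulate_activity(P_SILENCE_TO_TALK, P_TALK_TO_SILENCE, 200_000))
```

In such a model the mean talkspurt and silence durations are the reciprocals of the two exit probabilities, measured in frames, so a different pair of values can give the same activity ratio with quite different burst lengths.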
The speech material used for computing the temporal parameters for simulating the silence and talkspurt statistics is based on the speech material used for speech coding purposes in the multimedia group, University of Surrey. The measurements are carried out on monologue speech material. More than 2500 speech frames of 20 ms each were used. Each frame is then represented by 128 bits, or 16 bytes, so that 16 × 2500 bytes of speech information were involved in the statistical measurements. The measured values of the temporal parameters based on the above database are compared with the measured database proposed by [13], and the two appear to be in close agreement.
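As a quick check of the figures above, the following sketch works through the arithmetic. It treats 2500 frames as a lower bound, since the text says "more than 2500", and the constant names are chosen here for illustration.

```python
FRAME_MS = 20          # frame duration from the text
BITS_PER_FRAME = 128   # coded bits per frame from the text
N_FRAMES = 2500        # lower bound on the number of frames used

# Bit rate implied by 128 bits every 20 ms, in bits per second.
bit_rate_bps = BITS_PER_FRAME * 1000 // FRAME_MS

# Storage per frame and for the whole database, in bytes.
bytes_per_frame = BITS_PER_FRAME // 8
total_bytes = bytes_per_frame * N_FRAMES

# Duration of speech covered by the frames, in seconds.
duration_s = N_FRAMES * FRAME_MS / 1000

print(bit_rate_bps, bytes_per_frame, total_bytes, duration_s)
```

The implied rate of 6400 b/s matches the 6.4 kb/s per-user figure used in the example of section 4.1.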
It is also observed that the temporal parameters are independent of the observation window size (frame size) of the coding algorithm, but they are influenced by the hangover used by the VAD.
There is a trade-off between hangover, speech clipping and activity proportion. A smaller hangover can end up clipping some of the offset regions of a speech talkspurt but requires less bandwidth.
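The effect of hangover on the measured activity can be sketched with a toy example. The frame-by-frame decisions below are made up for illustration, and the hangover rule shown (hold the active decision for a fixed number of frames after each raw active frame) is one common scheme, assumed here rather than taken from the VAD of this chapter.

```python
def apply_hangover(raw_vad, hangover_frames):
    """Extend each active decision by hangover_frames further frames.

    raw_vad is a list of booleans, one raw VAD decision per frame.
    """
    out = []
    remaining = 0
    for active in raw_vad:
        if active:
            remaining = hangover_frames  # re-arm the hangover counter
            out.append(True)
        elif remaining > 0:
            remaining -= 1               # still inside the hangover period
            out.append(True)
        else:
            out.append(False)
    return out


def activity(flags):
    """Fraction of frames flagged as active."""
    return sum(flags) / len(flags)


# Made-up raw decisions for 12 frames (True = speech detected).
raw = [True, True, True, False, False, False, False,
       True, True, False, False, False]

for hangover in (0, 1, 2):
    print(hangover, activity(apply_hangover(raw, hangover)))
```

Each extra hangover frame protects more of the talkspurt offset from clipping, at the cost of transmitting frames that the raw detector had classified as silence.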
On the other hand, hangover brings an extra increase in activity by any additional increase [...] consumption (an important consideration for hand-held portables). DTX basically comprises two parts: a VAD at the transmitter to distinguish speech from silence, and a comfort noise generator at the receiver. Comfort noise is generated by the receiver when DTX has switched the transmitter off. Its purpose is to fill the silent gaps with low-level noise, normally generated from a normal distribution and scaled, to eliminate the unpleasant subjective effect of switching between speech in high noise and silence [59]. The block
diagram of Fig. 4.1 shows the DTX system concept.
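A minimal sketch of the receiver-side comfort noise step might look like the following. It assumes a 160-sample frame (20 ms at an 8 kHz sampling rate, which is an assumption and not stated in the text) and an arbitrary target level; the function name and the way the target level is chosen are hypothetical.

```python
import math
import random


def comfort_noise_frame(n_samples, target_rms, seed=None):
    """Generate one frame of Gaussian noise scaled to a target RMS level."""
    rng = random.Random(seed)
    # Draw samples from a zero-mean, unit-variance normal distribution.
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Measure the RMS of this particular draw and rescale to the target.
    rms = math.sqrt(sum(s * s for s in samples) / n_samples)
    gain = target_rms / rms if rms > 0 else 0.0
    return [gain * s for s in samples]


# Hypothetical usage: one 20 ms frame at an assumed 8 kHz sampling rate,
# scaled to an arbitrary low level.
frame = comfort_noise_frame(160, target_rms=50.0, seed=1)
print(len(frame))
```

In a practical system the target level would normally follow an estimate of the background noise at the transmitter, so that the inserted noise matches what the listener heard during the preceding speech.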