
ferently in each mode as shown in Algorithm 6. In spike-detection mode, the delay of the first packet of a talkspurt becomes the estimated playout delay for the talkspurt. Otherwise (in NORMAL mode), the perceptually optimized playout delay based on the delay distribution of the last W packets is used. The larger the value of W, the more slowly the scheme adapts to changes in delay. The head and tail parameters are used to set the thresholds for spike detection.
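The two-mode behaviour described above can be sketched as follows. This is only an illustration and not a reproduction of Algorithm 6: the class and function names are hypothetical, the way head and tail are compared against the current playout delay is an assumption, and the NORMAL-mode estimate is replaced by a simple quantile of the window as a stand-in for the perceptually optimized delay.

```python
from collections import deque


def percentile_delay(delays, q=0.95):
    """Stand-in for the NORMAL-mode estimate: the q-quantile of the window."""
    ordered = sorted(delays)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


class PlayoutEstimator:
    """Hypothetical per-talkspurt playout-delay selector with spike detection."""

    def __init__(self, window=1000, head=4.0, tail=2.0,
                 normal_estimate=percentile_delay):
        self.delays = deque(maxlen=window)  # delays (ms) of the last W packets
        self.head = head                    # assumed spike-entry multiplier
        self.tail = tail                    # assumed spike-exit multiplier
        self.normal_estimate = normal_estimate
        self.spike = False
        self.pre_spike_playout = None
        self.playout = None

    def on_packet(self, delay_ms, first_in_talkspurt):
        """Record one packet delay and return the current playout delay."""
        if self.playout is not None:
            if not self.spike and delay_ms > self.head * self.playout:
                # Assumed entry rule: delay far above the current playout delay.
                self.spike = True
                self.pre_spike_playout = self.playout
            elif self.spike and delay_ms < self.tail * self.pre_spike_playout:
                # Assumed exit rule: delay back near the pre-spike level.
                self.spike = False
        self.delays.append(delay_ms)
        if first_in_talkspurt:
            # The playout delay is only changed at the start of a talkspurt.
            if self.spike:
                self.playout = delay_ms
            else:
                self.playout = self.normal_estimate(self.delays)
        return self.playout
```

With the default head of 4, a first packet arriving with more than four times the current playout delay would switch the sketch into spike-detection mode and adopt that packet's delay for the talkspurt.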

8.5 Performance Analysis and Comparison

To compare with other jitter buffer algorithms, we also implemented the “exp-avg”, “fast-exp”, “min-delay”, “spk-delay” and “adaptive” algorithms (with different thresholds). The results for the four traces mentioned above are shown in Table 8.4. The window size W is set to 1000, the head to 4 and the tail to 2, as suggested in [118]. During the experiment, we varied the window size W from 100 packets (3 sec) to 10,000 packets (300 sec, as suggested by [118] and [120]) and observed no large difference in performance (the overall MOS score) within this range. We chose a W of 1000 (30 sec), as it is an appropriate duration for the Im or MOS calculation and is computationally more efficient than a longer window length.
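The window durations quoted above follow from a fixed packet interval. The short check below assumes an interval of 30 ms per packet, which is inferred from "100 packets (3 sec)" rather than stated explicitly; the function name is hypothetical.

```python
PACKET_INTERVAL_MS = 30  # assumption: implied by 100 packets spanning 3 sec


def window_seconds(num_packets, interval_ms=PACKET_INTERVAL_MS):
    """Duration in seconds covered by a window of num_packets packets."""
    return num_packets * interval_ms / 1000.0


for w in (100, 1000, 10_000):
    print(f"W = {w:>6} packets covers {window_seconds(w):.0f} sec")
```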

From Table 8.4, it can be seen that the “P-optimum” algorithm obtained the optimum or near-optimum MOS score for all four traces. Our previously proposed “adaptive” algorithm achieved sub-optimum results. The remaining buffer algorithms achieve good results only for some traces, not for all. It should be noted that P-optimum has the highest complexity, whereas the others, including “adaptive”, have similarly low complexity.


Table 8.4: Performance comparison for different buffer algorithms

Trace     Buffer algorithm   Loss ρ (%)   Delay d (ms)   MOS
Trace 1   Exp-avg                 4.9         298.5      2.01
          Fast-exp                1.5         750.8      1.00
          Min-delay               9.4         208.8      2.34
          Spk-delay              10.4         225.0      2.18
          Adaptive                9.0         208.1      2.37
          P-optimum              10.5         188.2      2.43
Trace 2   Exp-avg                 1.8          27.3      3.28
          Fast-exp                0            35.9      3.44
          Min-delay               1.7          27.3      3.29
          Spk-delay               3.4          24.9      3.15
          Adaptive                0            35.9      3.44
          P-optimum               0.1          44.5      3.42
Trace 3   Exp-avg                18.2         432.4      1.01
          Fast-exp               14.3        1408.6      1.00
          Min-delay              22.1         312.7      1.30
          Spk-delay              23.8         325.4      1.22
          Adaptive               22.1         299.8      1.35
          P-optimum              32.0         171.1      1.80
Trace 4   Exp-avg                 5.9          24.0      2.97
          Fast-exp                4.3          94.4      2.99
          Min-delay               5.3          23.0      3.01
          Spk-delay               7.6          21.9      2.86
          Adaptive                4.3          72.8      3.02
          P-optimum               5.1          34.4      3.02
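As a quick way to read Table 8.4, the sketch below picks the algorithm or algorithms with the highest MOS for each trace. The numbers are copied from the table; the dictionary layout and the function name are illustrative choices only.

```python
# (loss %, delay ms, MOS) per algorithm, copied from Table 8.4.
TABLE_8_4 = {
    "Trace 1": {"Exp-avg": (4.9, 298.5, 2.01), "Fast-exp": (1.5, 750.8, 1.00),
                "Min-delay": (9.4, 208.8, 2.34), "Spk-delay": (10.4, 225.0, 2.18),
                "Adaptive": (9.0, 208.1, 2.37), "P-optimum": (10.5, 188.2, 2.43)},
    "Trace 2": {"Exp-avg": (1.8, 27.3, 3.28), "Fast-exp": (0.0, 35.9, 3.44),
                "Min-delay": (1.7, 27.3, 3.29), "Spk-delay": (3.4, 24.9, 3.15),
                "Adaptive": (0.0, 35.9, 3.44), "P-optimum": (0.1, 44.5, 3.42)},
    "Trace 3": {"Exp-avg": (18.2, 432.4, 1.01), "Fast-exp": (14.3, 1408.6, 1.00),
                "Min-delay": (22.1, 312.7, 1.30), "Spk-delay": (23.8, 325.4, 1.22),
                "Adaptive": (22.1, 299.8, 1.35), "P-optimum": (32.0, 171.1, 1.80)},
    "Trace 4": {"Exp-avg": (5.9, 24.0, 2.97), "Fast-exp": (4.3, 94.4, 2.99),
                "Min-delay": (5.3, 23.0, 3.01), "Spk-delay": (7.6, 21.9, 2.86),
                "Adaptive": (4.3, 72.8, 3.02), "P-optimum": (5.1, 34.4, 3.02)},
}


def best_by_mos(table):
    """Return {trace: (best MOS, sorted names of algorithms reaching it)}."""
    result = {}
    for trace, rows in table.items():
        top = max(mos for _, _, mos in rows.values())
        winners = sorted(name for name, (_, _, mos) in rows.items() if mos == top)
        result[trace] = (top, winners)
    return result


for trace, (mos, names) in best_by_mos(TABLE_8_4).items():
    print(f"{trace}: MOS {mos:.2f} by {', '.join(names)}")
```

On these figures P-optimum has the highest MOS for Traces 1 and 3, ties with Adaptive for Trace 4, and is 0.02 below Fast-exp and Adaptive for Trace 2.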


8.6 Summary

In this Chapter, the performance of different existing buffer algorithms is analyzed using the proposed voice quality prediction methods (in terms of MOSc score) for the newly collected Internet trace data. Results show that end-to-end delay/delay variation, in general, has a major effect on the selection of buffer algorithms/parameters. For large to medium end-to-end delay/delay variation, a buffer algorithm that aims for a minimum delay is preferred, whereas for small end-to-end delay/delay variation, an algorithm that targets a minimum loss is better. Based on this, a new adaptive buffer algorithm has been proposed. Results show that it can achieve a better perceived quality for all the traces considered.

Further, the perceptually optimized jitter buffer algorithm has been investigated. The minimum overall impairment is used as the criterion for buffer optimization. This criterion is more efficient than the traditional maximum Mean Opinion Score (MOS) criterion. It is also shown that the delay characteristics of Voice over IP traffic are better characterized by a Weibull distribution than by a Pareto or an Exponential distribution. Based on the nonlinear regression models for voice quality prediction, the Weibull delay distribution model and the minimum impairment criterion, a perceptual optimization playout buffer algorithm has been proposed and its performance compared with that of other jitter buffer algorithms. Preliminary results show that the proposed perceptual optimum buffer algorithm can achieve the optimum perceived voice quality compared with other algorithms under all network conditions considered. The adaptive algorithm can achieve sub-optimum perceived voice quality with low complexity.
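The minimum impairment criterion combined with a Weibull delay model can be illustrated with a small grid search. This is a sketch under strong assumptions and not the algorithm proposed in this Chapter: the delay and loss impairment curves use coefficients from a commonly used simplified E-model approximation as stand-ins for the nonlinear regression models, and the Weibull parameters (mu, alpha, gamma) and all function names are hypothetical.

```python
import math


def weibull_cdf(x, mu, alpha, gamma):
    """Three-parameter Weibull CDF with location mu, scale alpha, shape gamma."""
    if x <= mu:
        return 0.0
    return 1.0 - math.exp(-(((x - mu) / alpha) ** gamma))


def delay_impairment(d_ms):
    """Stand-in delay impairment (assumed simplified E-model style curve)."""
    extra = d_ms - 177.3
    return 0.024 * d_ms + (0.11 * extra if extra > 0 else 0.0)


def loss_impairment(loss_fraction):
    """Stand-in loss impairment (assumed logarithmic fit, not the thesis model)."""
    return 11.0 + 40.0 * math.log(1.0 + 10.0 * loss_fraction)


def overall_impairment(d_ms, mu, alpha, gamma, network_loss=0.0):
    """Impairment for playout delay d_ms: delay term plus loss term."""
    late = 1.0 - weibull_cdf(d_ms, mu, alpha, gamma)      # packets arriving too late
    loss = 1.0 - (1.0 - network_loss) * (1.0 - late)      # network loss plus late loss
    return delay_impairment(d_ms) + loss_impairment(loss)


def optimal_playout_delay(mu, alpha, gamma, network_loss=0.0,
                          candidates=range(0, 401, 5)):
    """Return (delay, impairment) minimising the overall impairment on a grid."""
    return min(
        ((d, overall_impairment(d, mu, alpha, gamma, network_loss))
         for d in candidates),
        key=lambda pair: pair[1],
    )


d_opt, im_opt = optimal_playout_delay(mu=50.0, alpha=20.0, gamma=1.5)
print(f"playout delay {d_opt} ms, overall impairment {im_opt:.2f}")
```

A small playout delay discards many late packets and raises the loss term, while a large one raises the delay term; the search picks the candidate where the sum is smallest.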

As the work is based on buffer adaptation at the beginning of each talkspurt, it cannot adapt to delay changes during a talkspurt. Future work can extend the idea to consider buffer adaptation during a talkspurt [124] in order to achieve the best trade-off among delay, loss and end-to-end jitter.

Chapter 9 Perceived Speech Quality Prediction for QoS Control

9.1 Introduction

In Chapter 8, the application of perceived voice quality prediction to playout buffer optimization was presented. In this Chapter, the application of perceived voice quality prediction to Quality of Service (QoS) control is investigated. Here, the perceived speech quality is used as a control metric to control the sender behaviour (i.e. the send bit rate of codecs) instead of the traditional individual network parameters (e.g. packet loss, jitter or delay). Further, a combined control scheme which combines the strengths of adaptive bit rate control and priority marking control is investigated.

QoS control mechanisms for VoIP should aim to make optimum use of available network/terminal resources and to minimise the effects of network impairments on voice quality. Several approaches exist to realise QoS control, but most seek to control the information flow from the audio/video sources, adaptively, in accordance with significant changes in the network. An important class of QoS control technique involves rate control (i.e. QoS control is achieved by automatically adjusting the send bit rate depending on network congestion conditions). However, current rate control mechanisms [125–127] are based largely only on network impairments such as packet loss rate or delay during congestion. The strategy is to control the sender behaviour, using the network impairments, from the receiver or the network node, but this may not be sufficient to provide optimum QoS, in terms of the voice quality delivered, because the control information is not directly linked to user-perceived quality.

A second important class of QoS control techniques exploits knowledge of the fact that different parts of speech have different perceptual importance and so do not contribute equally to the overall voice quality [128, 129]. In this approach, voice packets that are perceptually more important are marked, i.e. given priority, and so are less likely to be dropped than packets that are of less perceptual importance, if there is congestion. The priority marking based QoS schemes are open loop and do not make use of changes in the network impairments.

The main objective of this Chapter is to investigate the possibility of combining the rate adaptation control technique with priority marking, exploiting the advantages of the two approaches to provide a robust control scheme which delivers optimum QoS in terms of voice quality. In rate control schemes, the cost of adapting the data flow to changes in the network is that some packets may be dropped randomly when congestion occurs, and this will increase the packet loss rate. In priority marking schemes, however, important packets are dropped less and delayed less. Thus, the combined scheme should provide improved overall user-perceived quality. The Differentiated Services (DiffServ) architecture [31] is used to implement the scheme and employs different queuing methods, the most important of which is a variation of the random early drop queue (RED queue). RED not only gives different packets different drop probabilities, it also gives the receiver hints about whether congestion has occurred or is about to occur. With a proper feedback mechanism, this information can be used to control the send bit rate.
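How a RED-style queue can give packets of different priority different drop probabilities is sketched below. The linear drop curve and the averaging rule follow the standard RED formulation; the two profiles, their thresholds and all names are hypothetical values chosen for illustration and are not taken from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedProfile:
    """RED parameters for one priority class (thresholds in packets)."""
    min_th: float
    max_th: float
    max_p: float


def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_drop_probability(avg_queue, profile):
    """Drop probability for a packet arriving at the given average queue length."""
    if avg_queue < profile.min_th:
        return 0.0
    if avg_queue >= profile.max_th:
        return 1.0
    span = profile.max_th - profile.min_th
    return profile.max_p * (avg_queue - profile.min_th) / span


# Hypothetical profiles: perceptually important packets are dropped later and less.
HIGH_PRIORITY = RedProfile(min_th=20.0, max_th=40.0, max_p=0.02)
LOW_PRIORITY = RedProfile(min_th=10.0, max_th=30.0, max_p=0.10)

for avg in (5.0, 15.0, 25.0, 35.0):
    print(avg,
          round(red_drop_probability(avg, HIGH_PRIORITY), 4),
          round(red_drop_probability(avg, LOW_PRIORITY), 4))
```

Because drops begin before the queue is full, a rise in the loss seen at the receiver can serve as an early congestion hint for the feedback loop.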

The main contributions of this Chapter are twofold. First, we propose a new QoS control scheme that combines the strengths of the adaptive rate control technique and the speech priority marking technique to provide better QoS control performance than hitherto possible. Second, we propose the use of an objective measure of perceived speech quality (i.e. objective MOS score [22]) instead of individual network impairments (e.g. packet loss and/or delay) to control sender behaviour, as this provides a direct link to user-perceived speech quality.
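The idea of driving the send bit rate from a predicted MOS rather than from raw loss or delay can be sketched as a simple step controller. Everything in this example is an assumption made for illustration: the use of the AMR codec mode set, the two MOS thresholds and the one-step-at-a-time rule are not taken from the scheme proposed here.

```python
# AMR narrowband codec modes in kbit/s, used here as an assumed rate ladder.
AMR_RATES_KBPS = (4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2)


def next_rate_index(current_index, predicted_mos, low=3.0, high=3.6):
    """Pick the next codec mode from the MOS fed back by the receiver.

    Below `low` the sender steps down one mode to relieve congestion;
    above `high` it steps up one mode; in between it holds the rate.
    Both thresholds are hypothetical.
    """
    if predicted_mos < low:
        return max(0, current_index - 1)
    if predicted_mos > high:
        return min(len(AMR_RATES_KBPS) - 1, current_index + 1)
    return current_index


index = len(AMR_RATES_KBPS) - 1          # start at the highest rate
for mos in (3.8, 2.7, 2.5, 3.2, 3.9):    # example feedback values
    index = next_rate_index(index, mos)
    print(f"MOS {mos:.1f} -> {AMR_RATES_KBPS[index]} kbit/s")
```

The gap between the two thresholds acts as a dead band, so small fluctuations in the predicted MOS do not cause the rate to oscillate.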

In document SPEECH QUALITY PREDICTION FOR VOICE OVER INTERNET PROTOCOL NETWORKS. L. Sun (Page 163-168)