New Models for Perceived Voice Quality Prediction and their Applications in Playout Buffer Optimization for VoIP Networks
University of Plymouth
United Kingdom
{L.Sun; E.Ifeachor}@plymouth.ac.uk
Dr. Lingfen Sun
Outline
Background
Speech quality for VoIP networks
Current status
Aims of the project
Main Contributions
Novel non-intrusive voice quality prediction models
Novel perceptual-based speech quality optimization (e.g. jitter
buffer optimization) mechanism
Background – Speech Quality for VoIP Networks
VoIP speech quality: end-user perceived quality, expressed as a Mean Opinion Score (MOS), is an important metric.
Affected by IP network impairments and other impairments.
Voice quality measurement: subjective (MOS) or objective.
[Figure: end-to-end path SCN, Gateway, IP Network, Gateway, SCN, showing end-to-end perceived speech quality (MOS). Intrusive measurement uses the reference speech and the degraded speech; non-intrusive measurement does not use the reference speech. SCN: Switched Communication Networks (PSTN, ISDN, GSM, …).]
Current Status and Problems
Lack of an efficient non-intrusive speech quality
measurement method
E-model (a complicated computational model)
Relies on subjective tests to derive models/parameters, which is time-consuming and expensive. Only limited models exist.
Lack of perceptual optimization control methods
Existing methods are based only on individual network parameters for buffer optimization and QoS control purposes.
Aims of the Project
[Figure: VoIP path. Sender: voice source → encoder → packetizer. IP network. Receiver: de-packetizer → jitter buffer → decoder → voice receiver. Non-intrusive measurement gives the end-to-end perceived voice quality (MOS).]
To develop novel and efficient methods/models for non-intrusive
quality prediction.
To apply the models for perceptual-based optimization control (
Novel Non-intrusive Voice Quality Prediction
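[Figure: Intrusive method: reference speech and degraded speech (after the VoIP network) are compared by PESQ to give MOS(PESQ), which is combined with delay in the E-model to give the measured MOSc. Non-intrusive method: network parameters (packet loss, delay, codec, …) feed the new model (regression or ANN models) to give the predicted MOSc.]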
Intrusive quality measurement (e.g. PESQ) is used as the reference to derive models that
predict voice quality non-intrusively, which avoids subjective tests.
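New Structure to Obtain MOSc
[Figure: combined PESQ/E-model structure. PESQ compares reference speech and degraded speech to give MOS (PESQ), which is converted MOS → R → Ie. The delay model maps end-to-end delay to Id. The E-model combines Ie and Id to give MOSc.]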
PESQ can only predict one-way listening speech quality
(expressed as MOS).
By a new combined PESQ/E-model structure, a conversational
Regression based Models (1)
Nonlinear regression models are derived for Ie based on PESQ/PESQ-LQ.
Further combine Ie with Id to obtain MOSc.
[Figure: (a) Measured Ie: speech database → encoder → loss model → decoder; PESQ/PESQ-LQ compares reference speech and degraded speech to give MOS (PESQ), which is converted MOS → R → Ie. The measured Ie is fitted by a nonlinear regression model (Ie model), which gives the predicted Ie from codec and packet loss. The Id model maps delay (d) to Id, and the E-model combines Ie and Id to give MOSc.]
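The MOS → R → Ie step used to obtain the measured Ie can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes the standard ITU-T G.107 mapping from R to MOS, inverts it by bisection, and assumes the default E-model value R0 = 93.2 with no delay impairment, so that Ie = R0 − R. All function names are hypothetical.

```python
R0 = 93.2  # assumed default E-model rating with no impairments


def r_to_mos(r):
    """ITU-T G.107 mapping from rating factor R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def mos_to_r(mos, lo=6.5, hi=100.0, tol=1e-9):
    """Invert r_to_mos by bisection on [6.5, 100], where it is increasing."""
    # Clamp the input to the MOS range covered by the search interval.
    mos = min(max(mos, r_to_mos(lo)), r_to_mos(hi))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def measured_ie(mos_pesq):
    """Measured Ie from a PESQ MOS score, assuming Ie = R0 - R (no delay)."""
    return R0 - mos_to_r(mos_pesq)


if __name__ == "__main__":
    for mos in (4.0, 3.5, 3.0):
        print(f"MOS(PESQ) = {mos:.1f} -> Ie = {measured_ie(mos):.1f}")
```

Repeating this conversion over a range of packet loss rates for one codec would give the measured Ie points to which a regression model is fitted.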
Regression based Models (2)
Ie can be modelled by a logarithmic fitting function of the form
$I_e = a \ln(1 + b\rho) + c$
Parameters for different codecs (PESQ)
Parameters   AMR(H)   AMR(L)   G.729   G.723.1   iLBC
a            16.68    30.86    21.14   20.06     12.59
Regression Models for AMR (12.2 kb/s)
e.g. for AMR (12.2 kb/s),
$I_e = 16.68 \ln(1 + 0.3011\rho) + 14.96$
The goodness of fit is: SSE = 2.83 and R² = 0.998
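As a worked illustration of the AMR (12.2 kb/s) model, the sketch below evaluates Ie at a few packet loss rates. It assumes ρ is the packet loss rate in percent; the function name is hypothetical.

```python
import math

# AMR (12.2 kb/s) regression parameters quoted on the slide
A, B, C = 16.68, 0.3011, 14.96


def ie_amr122(loss_percent):
    """Ie = a*ln(1 + b*rho) + c, with rho assumed to be packet loss in percent."""
    if loss_percent < 0:
        raise ValueError("packet loss must be non-negative")
    return A * math.log(1.0 + B * loss_percent) + C


if __name__ == "__main__":
    for rho in (0, 1, 5, 10, 20):
        print(f"loss = {rho:2d}%  Ie = {ie_amr122(rho):.2f}")
```

With zero loss the model returns c = 14.96, which can be read as the impairment attributed to the codec alone in this fit.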
MOS vs. packet loss and delay
Perceptual-based Buffer Optimization
Motivation:
Existing buffer algorithms are based only on individual network parameters (e.g. delay or loss), targeting only minimum average delay or minimum late arrival loss,
not maximum MOS.
There is a need to design a buffer algorithm that achieves optimum
perceived speech quality.
Contribution
A perceptual-based optimization jitter buffer algorithm
o Use regression based models for buffer optimization
o Use a minimum impairment criterion instead of traditional maximum
MOS score
o A Weibull delay distribution based on trace analysis
Impairment Function Im
Define: impairment function Im
$I_m = f(d, \rho) = I_d + I_e = 0.024d + 0.11(d - 177.3)H(d - 177.3) + a \ln(1 + b\rho) + c$
where $H(x) = 0$ if $x < 0$ and $H(x) = 1$ if $x \ge 0$, and a, b and c are codec related parameters.
$\rho = \rho_n + (100 - \rho_n)\rho_b$, with $\rho_b = P(X \ge d) = e^{-((d - \mu)/\alpha)^{\gamma}}$
Playout delay d, buffer loss ρb (Weibull distribution)
Minimum Impairment Criterion
Define: minimum impairment criterion
Given: network delay dn, network loss ρn and codec type
Estimate: an optimized playout delay dopt
Such that: the minimum Im is reached.
[Figure: candidate playout delays d1, d2, d3, d4 and the minimum Im.]
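The minimum impairment criterion can be illustrated with a simple grid search. This is a sketch under stated assumptions rather than the authors' procedure: it uses the AMR (12.2 kb/s) parameters from the slides, treats the playout delay d as the delay entering both Id and the Weibull tail, takes losses in percent, and uses hypothetical Weibull parameters and function names.

```python
import math

A, B, C = 16.68, 0.3011, 14.96  # AMR (12.2 kb/s) parameters from the slides


def delay_impairment(d):
    """Id = 0.024*d + 0.11*(d - 177.3)*H(d - 177.3), with d in ms."""
    return 0.024 * d + (0.11 * (d - 177.3) if d >= 177.3 else 0.0)


def weibull_tail(d, mu, alpha, gamma):
    """P(X >= d) for a three-parameter Weibull delay distribution."""
    if d <= mu:
        return 1.0
    return math.exp(-(((d - mu) / alpha) ** gamma))


def impairment(d, net_loss, mu, alpha, gamma):
    """Im = Id + Ie, where total loss (%) combines network and buffer loss."""
    rho = net_loss + (100.0 - net_loss) * weibull_tail(d, mu, alpha, gamma)
    return delay_impairment(d) + A * math.log(1.0 + B * rho) + C


def optimal_playout_delay(net_loss, mu, alpha, gamma, d_max=400, step=1):
    """Grid search (1 ms steps by default) for the delay that minimizes Im."""
    candidates = range(int(math.ceil(mu)), d_max + 1, step)
    return min(candidates, key=lambda d: impairment(d, net_loss, mu, alpha, gamma))


if __name__ == "__main__":
    # Hypothetical Weibull parameters (ms) and 1% network loss
    d_opt = optimal_playout_delay(net_loss=1.0, mu=50.0, alpha=30.0, gamma=1.5)
    print(d_opt, round(impairment(d_opt, 1.0, 50.0, 30.0, 1.5), 2))
```

A short playout delay keeps Id low but discards many late packets, which raises Ie; a long delay does the opposite. The search returns the delay at which the sum is smallest.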
Perceptual-based Optimization Buffer Algorithm
For every packet i received, calculate network delay ni
  if mode == SPIKE then
    if ni ≤ tail*old_d then
      mode = NORMAL
    endif
  elseif ni > head*di then
    mode = SPIKE
    old_d = di
  else
    update delay records for the past W packets
  endif
At the beginning of a talkspurt
  if mode == SPIKE then
    di = ni
  else
    obtain (µ, α, γ) for the Weibull distribution for the past W packets
    search for the playout delay d which meets the minimum Im criterion
  endif
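The control flow of the algorithm can be sketched in Python as below. This is an illustrative sketch, not the authors' implementation. To stay self-contained it swaps the Weibull fit for an empirical estimate of buffer loss (the fraction of recorded delays that exceed a candidate playout delay). The `head` and `tail` values, the guard that skips spike detection until a playout delay has been set, and all class and method names are assumptions; the codec parameters are the AMR (12.2 kb/s) values from the slides.

```python
import math
from collections import deque

A, B, C = 16.68, 0.3011, 14.96  # AMR (12.2 kb/s) parameters from the slides


def impairment(d, loss_percent):
    """Im = Id + Ie for playout delay d (ms) and total loss (%)."""
    i_d = 0.024 * d + (0.11 * (d - 177.3) if d >= 177.3 else 0.0)
    return i_d + A * math.log(1.0 + B * loss_percent) + C


class PlayoutBuffer:
    def __init__(self, window=50, head=4.0, tail=2.0, net_loss=0.0):
        self.head, self.tail = head, tail   # hypothetical spike thresholds
        self.net_loss = net_loss            # network loss in percent
        self.delays = deque(maxlen=window)  # delay records, past W packets
        self.mode = "NORMAL"
        self.old_d = 0.0
        self.playout = 0.0                  # current playout delay d_i
        self.last = 0.0                     # most recent network delay n_i

    def on_packet(self, n_i):
        """Per-packet update: spike detection or delay record update."""
        self.last = n_i
        if self.mode == "SPIKE":
            if n_i <= self.tail * self.old_d:
                self.mode = "NORMAL"
        elif self.playout > 0 and n_i > self.head * self.playout:
            # Assumed guard: no spike detection before a playout delay exists.
            self.mode = "SPIKE"
            self.old_d = self.playout
        else:
            self.delays.append(n_i)

    def _buffer_loss(self, d):
        """Empirical fraction of recorded delays above d (not a Weibull fit)."""
        late = sum(1 for x in self.delays if x > d)
        return late / len(self.delays)

    def on_talkspurt_start(self):
        """Choose the playout delay for the next talkspurt."""
        if self.mode == "SPIKE":
            self.playout = self.last
        elif self.delays:
            def cost(d):
                rho = self.net_loss + (100.0 - self.net_loss) * self._buffer_loss(d)
                return impairment(d, rho)
            # Candidates are the distinct delays seen in the window.
            self.playout = min(sorted(set(self.delays)), key=cost)
        return self.playout


if __name__ == "__main__":
    buf = PlayoutBuffer(window=10)
    for n in [50, 52, 55, 51, 53, 60, 54, 52, 51, 50]:
        buf.on_packet(n)
    print(buf.on_talkspurt_start(), buf.mode)
```

Restricting candidates to observed delays keeps the search cheap; a fitted Weibull tail, as on the slide, would allow candidates beyond the largest delay in the window.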
Performance Analysis and Comparison (1)
Trace   Delay (ms)   Jitter (ms)   Loss (%)
1       153          16.2          1.1
2       46           0.8           0.3
3       186          19.5          14.3
4       16           0.7           4.4
5       150          0.2           0.2
Selected five traces from UoP to CU (USA), DUT
(Germany), BUPT (China), and NC (China).
Performance Analysis and Comparison (2)
“p-optimum” algorithm achieves the optimum voice
quality for all traces.
“adaptive” algorithm achieves sub-optimum quality with
low complexity.
[Figure: Performance comparison for buffer algorithms. MOS (y-axis) for traces 1 to 5 (x-axis); algorithms: exp-avg, fast-exp, min-delay, spk-delay, adaptive, p-optimum.]
Conclusions and Future Work
Conclusions
Developed a new methodology and regression models to
predict voice quality non-intrusively.
Demonstrated the application of the new non-intrusive voice quality
prediction models to perceptual-based optimization of playout buffer algorithms.
Future Work
To consider buffer adaptation during a talkspurt in order to achieve
the best trade-off between delay, loss and end-to-end jitter.
To extend the work to improve the performance of multimedia