Addendum
Page 9, Table 2.1: Quality scale applies to MOS, Impairment scale to Degradation Category Rating (DCR) test [106].
Page 19, Section 2.4.1, Line 12: ‘Residual Pulse Excitation’ should read ‘Regular Pulse Excitation’.
Page 20, Line 20: ‘Viterbi coding’ should read ‘Viterbi decoding’.
Chapter Three, Major Notation Summary:
$Q(\cdot)$ Quantizer Function
$g_k$ ADPCM Quantizer Gain
$Y_k$ Quantizer Output Level
$s_k$ Input Speech Samples
$\hat{s}_{k|k-1}$ One-step-ahead Prediction
$\hat{s}_{k|k}$ Reconstructed Speech Samples
$z^{-1}$ Unit Delay Backward Shift Operator
$A(z^{-1})$ Polynomial in $z^{-1}$ for Predictor Poles
Polynomial in $z^{-1}$ for Predictor Zeros
Equations (4.3), (4.10), (4.11), (4.16), (6.1), and (7.4): $\gamma_1$ and $\gamma_2$ are replaced by $\gamma_1^{-1}$ and $\gamma_2^{-1}$ respectively.
Page 109, Figures 5.7 and 5.8: ‘Quantization Noise Level’ should read ‘Signal-to-Quantization Noise Ratio’.
Page 217, Appendix A: Title should read ‘A Brief Overview of Information Theory and Entropy Coding’.
New Techniques in
Signal Coding
Craig Robert Watkins
BSc. BE. (QLD) PEng
December 1994
A thesis submitted for the degree of Doctor of Philosophy
of the Australian National University
Department of Systems Engineering
Research School of Information Sciences and Engineering
Paul,
You leave behind many in grief, both family and friends. However, we can all take
comfort in the certainty that our lives are enriched through the times that each of us
shared with you. Your characteristic vitality and flair has left its mark on all those
you touched. In years your life has been tragically short, but your achievements are
testimony to a full and solid innings. May peace be with you.
Declaration
These doctoral studies were conducted with supervision from Dr. Salvatore (Sam) Crisafulli and Dr. Robert (Bob) Bitmead of the Australian National University, and Dr. Donald McLean of the CSIRO Division of Radiophysics.
The work contained in this thesis, except where explicitly stated, is original research performed by the author under the guidance of Sam, Bob, and Don. This work has not been submitted for a degree at any other university or institution.
Chapter 8 of this thesis concerns research work undertaken while at AT&T Bell Laboratories, Murray Hill, New Jersey. This work was performed under the supervision of Dr. Juin-Hwey (Raymond) Chen in the Speech Coding Department headed by Dr. Rich Cox. The work forms the basis of AT&T patent applications and an AT&T contribution to the ITU-T (International Telecommunication Union - Telecommunications Standardization Sector).
A significant proportion of the research performed for this thesis is contained in patent applications, has been published, or has been submitted to conferences and journals, as listed below.
Standard Contributions and Patent Applications:
[S1] AT&T, “G.728 Decoder Modifications for Frame Erasure Concealment”, Contribution to ITU-T SG XV/Q.5, March 1994.
[P1] J.-H. Chen and C. R. Watkins, “Linear Prediction Coefficient Generation During Frame Erasure Or Packet Loss”, patent application filed on 14th March 1994.
[P2] R. R. Bitmead, S. Crisafulli and C. R. Watkins, “ADPCM Signal Encoding/Decoding System and Method”, two provisional patents lodged on 15th April 1994.
[P4] J.-H. Chen and C. R. Watkins, “Frame Erasure Or Packet Loss Compensation Method”, patent application filed on 14th October 1994.
Journal Papers:
[J2] C. R. Watkins, S. Crisafulli and R. R. Bitmead, “Practical Kalman Filtering for Speech Coding Applications”, submitted to IEEE Transactions on Speech and Audio Processing.
[J3] C. R. Watkins, S. Crisafulli and R. R. Bitmead, “An Entropy Coded ADPCM Speech Coding System for Variable Bit Rate Applications”, submitted to IEEE Transactions on Speech and Audio Processing.
[J4] C. R. Watkins and J.-H. Chen, “Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure and Packet Loss”, in preparation for journal submission.
Conference Papers:
[C1] C. R. Watkins, S. Crisafulli and R. R. Bitmead, “Reduced Complexity Kalman Filtering for Signal Coding”, International Workshop on Intelligent Signal Processing and Communication Systems, Sendai, October 1993.
[C2] C. R. Watkins, S. Crisafulli, R. R. Bitmead and R. J. Orsi, “Variable Bit Rate ADPCM via Arithmetic Coding”, IEEE International Conference on Acoustics, Speech and Signal Processing, Adelaide, April 1994.
[C3] C. R. Watkins and J.-H. Chen, “Improving 16 kb/s G.728 LD-CELP Speech Coder for Frame Erasure Channels”, to appear at IEEE International Conference on Acoustics, Speech and Signal Processing, Detroit, May 1995.
[C4] C. R. Watkins, S. Crisafulli, and R. R. Bitmead, “A Variable Bit Rate Entropy Coded ADPCM System”, in preparation for submission to 1995 IEEE Speech Coding Workshop, Annapolis, September 1995.
Canberra, December 1994.
Acknowledgements
I would sincerely like to thank my supervisor, Dr. Bob Bitmead, for his patience and input on both technical and non-technical issues. Dr. Sam Crisafulli, as an advisor and co-supervisor, deserves special thanks for being willing to listen, and provide feedback, to all my crazy ideas over the last three years. Dr. Don McLean has also provided useful input, and I thank him for his more recent attempts at commercialization of some of the work contained in this thesis. On the issue of commercialization, I must not forget the efforts of a number of engineers involved with developmental issues, Dr. Johnson Agbinya, Dr. Rong-Yu Chao, and Haley Jones. Thanks also go to two undergraduate engineering vacation experience students, Robert Orsi, and Raymond Chan, for their work, and the supervision experience they provided.
Dr. Raymond Chen, and Dr. Rich Cox from AT&T Bell Laboratories, Murray Hill, NJ, also deserve many thanks. I am grateful for the opportunity to work at Murray Hill for a 12 week period, and am even more grateful for the opportunity to work on such an interesting, and relevant project. Thanks must also go to Dr. Peter Kroon from Bell Labs, for the interesting discussions we had on occasion.
The three month stay at Bell Labs was part of a six month overseas trip to visit several members of the speech coding community in both university, and industrial research groups. The groups and people visited are far too numerous to mention here. However, I am grateful to everyone I visited for being willing to host me. The funding for these visits came from the International Business Unit of Telstra (OTC Australia), to whom I am most certainly grateful. I believe that the time I spent visiting other researchers was some of the most productive time I have spent during my PhD studies. I was both able to obtain input specific to my speech coding PhD research, and broad engineering knowledge on speech coding, and telecommunications systems. It is this broad perspective that is extremely difficult to obtain through normal university study, whilst being of vital importance to the selection and pursuit of research areas for maximal return. I sincerely hope that even in tough economic times Telstra remains convinced of the long term benefits of these awards, both to the company and to the country as a whole.
I acknowledge the Australian Telecommunications and Electronics Research Board for their postgraduate scholarships, and the Australian Government for their funding of the Cooperative Research Centre (CRC) program and the provision of APRA (Australian Postgraduate Research Award) scholarships. During my time as a PhD student I have benefited through involvement with the Cooperative Research Centre for Robust and Adaptive Systems. The CRC program is designed to ensure that the research being performed at universities in Australia has every chance of bringing benefits to the Australian economy. As engineering research is concerned with solutions to practical problems, the CRC program provides significant advantages to engineering PhD students and incentives for academics to refocus research towards goals of some benefit to the country. The aim of having academics moving from the sheltered university world into the real world is not without its controversy. However, there are significant advantages for both the universities, and industries involved. Also, academics cannot justly claim to be engineers without producing work that might eventually be useful.
Engineering involves wealth creation, rather than wealth shuffling, and support and promotion of engineering should take a high profile. The CRC program is a start, but as with most government programs, criticism can easily be raised that the public sector is not doing enough, and it is too late and ineffective. The private sector also carries part of the burden of blame, as management boards, largely comprised of people with law and economics backgrounds, are slow to accept the need for input from engineering and science (particularly in Australia). A trend towards science and engineering people on company boards is emerging slowly, and those companies that correctly take the initiative will be well placed for future years.
Moving away from the political issues, there are many other people to whom I am grateful. I have had occasion throughout my PhD studies to meet and discuss various issues with a large number of academics, research engineers, and students. From all interactions I was inspired and able to learn. Although no one interaction deserves special mention, the sum total of these contacts cannot be underestimated.
Last, but not least, I would like to thank my family for their support, moral and financial. Without both of these I would never even have considered PhD study. With family support the sacrifices necessary for PhD study can appear worthwhile.
Abstract
Speech Coding, or the digital representation of speech for communications purposes, consists of many different approaches and applications. Within this thesis some analysis of the very popular ADPCM (Adaptive Differential Pulse Code Modulation) and CELP (Code Excited Linear Prediction) speech coding systems is undertaken. These are traditionally viewed as schemes operating in different regions of the speech coding spectrum. We attempt to show the boundaries between the two different approaches are not black and white, and with techniques such as Kalman Filtering, and variable bit rate coding, the schemes lose a significant proportion of their individual identities.
There are four general issues that need to be addressed by any speech coding system:
• Output speech performance.
• Coder output bit rate.
• Computational complexity.
• Delay introduced to overall system.
Other issues that are very important are: robustness to bit and frame transmission errors; robustness to background noise such as car and babble noise; adequate coding performance for other signals such as music and voice-band data; and performance for multiple talkers and conference calls.
Within this thesis, a number of topics are covered. The first of these considers a theoretical analysis of error recovery and stability trade-offs. We then consider a variable bit rate ADPCM coder which is able to provide good speech quality performance for moderate computational complexity. Kalman filtering techniques are observed to provide significant subjective performance improvement to the variable rate ADPCM system introduced, and reduced complexity approaches to obtaining this benefit are important.
The topic of robustness to frame erasure errors in LD-CELP is also considered within this thesis. Here it is found that minor modifications to the LD-CELP decoder are capable of providing very good frame erasure performance at up to 3% error rates. Some further encoder modifications are useful for providing good performance at very high frame erasure rates such as 10%.
Topics covered in this thesis range from analytical, and quite theoretical, to those more practical, and applications oriented. However, the overall philosophy of the thesis is to obtain a better understanding of speech coding techniques in the spectrum between ADPCM and CELP.
Contents
Declaration ii
Acknowledgements iv
Abstract vi
1 Introduction 1
1.1 Thesis Motivation ... 1
1.2 Research Philosophy ... 2
1.3 Thesis Overview ... 2
1.4 Summary of Original Contributions ... 5
2 Speech Coding Systems 7
2.1 Chapter Motivation ... 7
2.2 Speech Quality Measurement ... 8
2.3 Toll Quality Speech Coding ... 10
2.3.1 PCM Coding ... 10
2.3.2 ADPCM ... 11
2.3.3 LD-CELP ... 13
2.3.4 8 kbps Standardization ... 15
2.3.5 4 kbps Standardization ... 17
2.4 Mobile Communications ... 18
2.4.1 GSM (RPE-LTP) ... 19
2.4.2 US and Japanese VSELP Standards ... 20
2.4.3 PSI-CELP ... 23
2.4.4 QCELP ... 24
2.5 Low Rate Communications Quality Voice ... 27
2.5.1 FS 1016 4.8 kbps CELP ... 27
2.5.2 2 to 4 kbps Speech Coding ... 29
2.5.3 Very Low Rate Speech Coding ... 29
2.6 PCS Standardization ... 29
2.7 Chapter Summary ... 31
2.8 Thesis Overview ... 32
3 ADPCM Stability 35
3.1 Chapter Motivation ... 35
3.2 Introduction ... 35
3.3 Stability of ADPCM Systems ... 38
3.3.1 Definition of Stability ... 38
3.3.2 A Framework for Stability Analysis ... 39
3.4 Stability of ADPCM with Adaptive Quantization ... 42
3.4.1 The Passivity Theorem ... 42
3.4.2 Stability without Adaptive Quantization ... 43
3.4.3 Quantizer Adaptation ... 45
3.4.4 Multiplier Theory ... 46
3.4.5 Stability Theorem ... 47
3.4.6 Proof of Stability Theorem ... 48
3.5 Discussion ... 50
3.6 Stability Conclusions ... 51
4 Arithmetic Coding and ADPCM 53
4.1 Chapter Motivation ... 53
4.2 ADPCM Destabilization Effects ... 54
4.3 Entropy coding and SNR ... 56
4.4 Arithmetic Coding ADPCM - System Concept ... 57
4.5 AC-ADPCM System Improvements ... 63
4.5.1 Quantization Dither ... 63
4.5.2 Perceptual Weighting ... 66
4.5.3 Pitch Prediction ... 71
4.5.4 Adaptive Postfiltering ... 74
4.5.5 Quantizer Step-Size Updates ... 76
4.5.6 Kalman Filter Application ... 76
4.6 Design Flexibility and Computation Issues ... 77
4.6.1 Prediction Order ... 78
4.6.2 Predictor Update Frequency ... 79
4.6.3 Arithmetic Coding Implementation ... 82
4.7 Further AC-ADPCM Considerations ... 85
4.8 Delayed Decision Coding ... 87
4.9 Chapter Conclusion ... 88
5 Practical Kalman Filtering in Signal Coding 91
5.1 Chapter Motivation ... 91
5.2 Introduction ... 91
5.3 The All-pole Signal Model ... 92
5.4 The Kalman Filter ... 96
5.5 Input Noise Filtering and Speech Enhancement ... 99
5.5.1 Measurement Noise Variance Parameter Estimation ... 101
5.5.2 Effect of Noise Input Level ... 103
5.5.3 Waveform Plots ... 104
5.5.4 Model Order for Kalman Filtering ... 105
5.6 Kalman Prediction ... 107
5.7 Reconstructed Output Smoothing ... 112
5.8 Coloured Noise and the Kalman Filter ... 113
5.9 Complexity Reduction using Smoothing Properties ... 116
5.10 RDE Update Frequency and Steady State Solutions ... 121
5.11 Coder Input Noise Filtering ... 123
5.12 Downsampling, Embedded Coding, and Error Bursts ... 124
5.13 Chapter Conclusion ... 127
6 AC-ADPCM with Kalman Filtering 129
6.1 Chapter Motivation ... 129
6.2 Kalman Filter Application ... 129
6.3 Further Computation Trade-Offs ... 131
6.4 A ‘Practical’ AC-ADPCM System ... 133
6.5 Perceptual Weighting ... 134
6.6 Postfiltering ... 139
6.7 Bark Spectral Distortion ... 141
6.8 Output Bit Rate Profile ... 145
6.9 Further Work ... 147
6.10 Chapter Conclusion ... 148
7 Kalman Filter use in CELP 151
7.1 Chapter Motivation ... 151
7.2 Quantization Noise Filtering in LD-CELP ... 151
7.3 Kalman Filtering and LD-CELP Transmission Errors ... 153
7.4 FS 1016 4.8 kbps CELP ... 155
7.5 Deterministic Codebook FS 1016 ... 156
7.6 Chapter Conclusion ... 157
8 Frame Erasures in LD-CELP 159
8.1 Chapter Motivation ... 159
8.2 Introduction ... 160
8.3 LD-CELP Overview ... 161
8.4 Frame Erasures in Mobile Communications ... 164
8.5 G.728 and Frame Errors ... 166
8.6 Waveform Substitution using Excitation Redundancy ... 169
8.6.1 Voicing Classification ... 170
8.6.2 Voiced Speech ... 171
8.6.3 Unvoiced Speech ... 172
8.6.4 Predictor Updates ... 172
8.7 Kalman Filter Application ... 174
8.8 Relaxation of Compatibility Constraints ... 176
8.9 Simulation Results ... 178
8.9.1 SNR Results and Discussions ... 179
8.9.2 Decoder Waveform Comparisons ... 182
8.9.3 Mean Opinion Score Results ... 186
8.9.4 Frame Size Effects ... 187
8.10 Further Work ... 188
8.11 Chapter Conclusion ... 190
9 AC-ADPCM Resynchronization and Combined Source/Channel Coding 193
9.1 Chapter Motivation ... 193
9.2 Introduction ... 194
9.3 Separation Principle ... 194
9.4 Frame Based Resynchronization ... 196
9.5 Uniform Redundancy Distribution ... 198
9.6 Resynchronization without Additional Redundancy ... 199
9.7 Chapter Conclusion ... 200
10 AC-ADPCM Applications 201
10.1 Chapter Overview ... 201
10.2 Speech Storage ... 202
10.3 Digital Answering Machines ... 203
10.4 Voice Mail/Packet Voice ... 204
10.5 ATM Voice Compression ... 205
10.6 Security Applications ... 205
10.7 Wideband Speech and Audio ... 206
10.8 IC Music Storage ... 207
10.9 Mobile Communications ... 207
10.10 Chapter Summary ... 208
11 Conclusions and Further Research 211
11.1 Summary of Contributions ... 211
11.2 Further Research ... 213
11.3 Conclusions ... 214
11.4 Coda ... 216
Appendices 217
A Information Theory and Entropy Coding 217
A.1 Fixed Rate Codes ... 218
A.2 Variable Rate Codes - Huffman Coding ... 219
A.3 Block Huffman Coding ... 220
A.4 Practical Entropy Coding - Arithmetic Coding ... 221
A.5 Quasi-Arithmetic Coding ... 222
B Kalman Prediction for Zero Measurement Noise 224
C Reduced Order Kalman Filter Justification 228
D Information Theory and Channel Coding 233
D.1 Information Theory ... 233
D.2 Channel Coding ... 234
D.2.1 BCH Coding ... 235
D.2.2 Reed-Solomon Coding ... 236
D.2.3 The Viterbi Algorithm ... 236
D.3 Bit Errors and Distributions ... 237
D.4 Combined Source and Channel Coding ... 238
D.5 Spread Spectrum Communications ... 239
E Constant Step Size Uniform Quantizer Justification 242
E.1 The Uniform Distribution ... 242
E.2 The Laplacian Distribution ... 244
F Frame Dropping in the LD-CELP Encoder for 12 and 8 kbps Coding 247
Bibliography 251
Glossary 268
List of Figures
2.1 PCM Companding Schematic ... 11
2.2 CCITT Recommendation G.721 ADPCM Encoder Schematic ... 12
2.3 CCITT Recommendation G.728 LD-CELP Schematic ... 14
2.4 NTT 8 kbps CS-CELP Schematic ... 16
2.5 RPE-LTP (GSM) Encoder Block Schematic ... 19
2.6 VSELP Encoder Block Schematic ... 21
2.7 PSI-CELP Encoder Block Schematic ... 23
2.8 QCELP Encoder Block Schematic ... 25
2.9 FS1016 Simplified Schematic ... 28
3.1 Basic ADPCM Encoder Block Schematic ... 37
3.2 Expanded ADPCM Encoder Schematic ... 38
3.3 Speech Equivalence Class Signal Model ... 39
3.4 Error System Block Diagram ... 41
3.5 Passivity Theorem Block Diagram ... 44
3.6 ADPCM Passivity Error System Block Schematic ... 45
3.7 Condensed Block Diagram of the Passivity Error System ... 47
3.8 Error System Schematic for Alternative Theorem Proof ... 48
4.1 Fixed Rate ADPCM Encoder Block Schematic ... 55
4.2 Fixed Rate ADPCM Decoder Block Schematic ... 56
4.3 Arithmetic Coding ADPCM Encoder Block Diagram ... 58
4.4 SNR and Segmental SNR versus Average Bit Rate ... 60
4.5 Auto-correlation of White ‘Quantization’ Noise ... 64
4.6 Auto-correlation of Quantization Noise for 16 kbps AC-ADPCM ... 65
4.7 Auto-correlation of Quantization Noise for 8 kbps AC-ADPCM ... 65
4.8 Auto-correlation of Dithered Quantization Noise for 8 kbps AC-ADPCM ... 66
4.9 Partial Schematic of CELP Codebook Search Procedure ... 68
4.10 Sample 100 ms Waveform Segment for Voiced Speech ... 71
4.11 APC or ADPCM Encoder with Pitch Prediction ... 72
4.12 APC or ADPCM Decoder with Pitch Prediction ... 72
4.13 Effect of Predictor Order on AC-ADPCM Entropy ... 78
4.14 Histograms for Variance Bins 1 to 4 ... 82
4.15 Histograms for Variance Bins 5 to 8 ... 83
4.16 Histograms for Variance Bins 9 to 12 ... 83
4.17 Histograms for Variance Bins 13 to 16 ... 84
5.1 Filtered Speech SNR versus Ratio of to ... 101
5.2 Filtered Speech SNR versus Log Ratio of a^k to <r^f c ...102
5.3 Filtered Speech SNR versus Input Noise SNR ... 104
5.4 Kalman Smoother Input Waveform ... 105
5.5 Kalman Smoother Output Waveform ... 106
5.6 Original Waveform ... 106
5.7 Prediction Gain for Kalman Predictor (solid line) and Standard Linear Predictor (dashed line) versus Quantization Noise Level (Relative to Average Signal Level) ... 109
5.8 Prediction Gain for Kalman Predictor (solid line) and Standard Linear Predictor (dashed line) versus Quantization Noise Level (Relative to Input Signal Level) ... 109
5.9 Prediction Gain for Kalman Predictor (solid line) and Standard Linear Predictor (dashed line) versus Quantization Noise Level (Relative to Prediction Error Level) ... 110
5.10 Kalman Filter Smoothed Reconstruction SNR (solid line) versus Quantization Noise Level (Standard Linear Prediction Reconstruction - dashed line) ... 112
5.11 Smoothed Signal SNR (solid line) and Prediction Gain (dashed line) for Reduced Order Kalman Filter Approach ... 119
5.12 SNR (solid line) and Prediction Gain (dashed line) versus Riccati Equation Update Period ... 121
5.13 System Schematic for Input Noise and Prediction Kalman Filtering ... 123
6.1 One Second Speech Input Segment and AC-ADPCM Output Segmental SNR (for 8 kbps Average Rate) ... 145
6.2 One Second Speech Input Segment and Output Bit Rate ... 147
8.1 LD-CELP Decoder Block Schematic ... 162
8.2 RF Channel Digital Transmission Schematic ... 165
8.3 LD-CELP Clear Channel Decoder Waveform ... 183
8.4 LD-CELP Random Excitation Waveform ... 183
8.5 LD-CELP Low Level Random Excitation Waveform ... 183
8.6 LD-CELP Zero Excitation Decoder Waveform ... 184
8.7 LD-CELP Proposed Decoder Modifications Waveform ... 184
8.8 LD-CELP Kalman Filter Decoder Waveform ... 184
8.9 LD-CELP Encoder Modifications Waveform ... 185
9.1 Arithmetic Coding ADPCM Encoder Block Diagram ... 195
List of Tables
2.1 Mean Opinion Score Five Point Quality/Impairment Scale ... 9
2.2 QCELP Bit Allocations for Variable Rate Coding ... 26
4.1 AC-ADPCM SNR Values for Selected Rates ... 61
4.2 SNR values for AC-ADPCM with Perceptual Weighting ... 70
4.3 SNR values for AC-ADPCM with Postfiltering ... 75
4.4 Effect of Predictor Order in AC-ADPCM ... 79
4.5 Effect of Predictor Update Frequency in AC-ADPCM ... 80
4.6 Effect of Predictor Order and Update Frequency ... 80
4.7 SNR Measures for Selected Predictor Orders and Update Frequencies ... 80
4.8 Tabular Approach versus Laplacian Distribution Assumption ... 85
5.1 SNR versus Kalman Filter Order for Various Noise Levels ... 107
5.2 Prediction Gain versus Filter Order for Selected Noise Levels ... 108
5.3 Linear Predictor and Smoothed Kalman Filter SNR for various Noise Levels ... 113
5.4 Reduced Order Approach compared to Reduced Order Predictor Approach ... 120
6.1 Reduced Order Kalman Filter Approach for AC-ADPCM ... 130
6.2 AC-ADPCM Performance with Kalman Filtering ... 131
6.3 Riccati Equation Update Period in AC-ADPCM ... 132
6.4 AC-ADPCM Performance with Kalman Filtering and Five Sample Riccati Equation Update Period ... 133
6.5 AC-ADPCM Performance for Reduced Order Predictor ... 134
6.6 AC-ADPCM Performance with Perceptual Weighting and Kalman Filtering ... 135
6.7 AC-ADPCM with Truncated Impulse Response Perceptual Weighted Kalman Filtering ... 137
6.8 Objective Measure for Postfiltering and Kalman Filtering in AC-ADPCM ... 139
6.9 Objective Measures for the High Complexity AC-ADPCM Version ... 140
6.10 Bark Spectral Distortion (BSD) Measures for (High Complexity) AC-ADPCM ... 142
6.11 Bark Spectral Distortion (BSD) Measures for (Low Complexity) AC-ADPCM ... 143
6.12 Objective Measure Comparisons for Equivalent Subjective Quality Systems ... 143
8.1 SNR and Segmental SNR Measures for Minimal Change Options ... 179
8.2 SNR and Segmental SNR Measures for Major Options Considered ... 181
8.3 AT&T MOS Results from October and November 1993 Tests ... 186
8.4 Frame Size Effects at 3% Frame Erasure Rate ... 187
F.1 SNR Values for LD-CELP Variants ... 249
Chapter 1
Introduction
1.1 Thesis Motivation
This PhD thesis addresses a number of useful techniques in signal coding, focusing on speech coding applications. The topics addressed range from the highly practical (and immediately applicable) to the more theoretical (and general) in nature. The thesis also contains a significant element of ‘blue-sky’ philosophy to complement the basic research results. Except perhaps for the LD-CELP frame erasure work in Chapter 8, we do not claim to present fully functional speech coding systems within this thesis. Rather, general techniques are investigated and simulations performed where beneficial for ‘proof of concept’.
Signal coding is that part of a communications system that represents the incoming signal in a form (often digital) for efficient transmission over a channel (or for storage). The efficiency requirement usually implies high decoded signal quality for a small number of transmitted bits. Other common considerations are implementation complexity (unit expense), and coding delay. Significant engineering trade-offs exist in the design of such systems, with efficient practical solutions being heavily applications dependent. Towards the end of the thesis, we also begin to consider the inclusion of recovery from transmission errors into the ‘efficiency’ requirement.
In order to meet increased user transmission demand on communications networks, the options of both more bandwidth usage (increased network infrastructure), and better signal coding (improved terminal equipment) must be considered. For situations where bandwidth is limited (such as in radio communications), or very expensive to install (such as for some underground cable situations), the use of improved signal coding is the economically superior option.
The remainder of this brief introductory chapter includes a short discussion of research philosophy, a thesis overview, and a summary of original research contributions included in the thesis. Chapter 2 augments this chapter by providing a background overview of speech coding systems, including coverage of a number of the significant speech coding standards.
1.2 Research Philosophy
The pursuit of a PhD is an opportunity to learn how to perform research. As research always involves limited resources to solve problems (engineering research in particular), an extremely important part of learning how to perform research is obtaining as large a background knowledge in the area as possible. This background knowledge assists in problem selection, and some elements of a background discussion nature are included in this thesis for completeness and motivational reasons.
As I firmly believe that a PhD thesis is more than just a collection of technical papers, there is also a certain element of a philosophical nature within this thesis. Largely this is evidenced by the further research discussion sections in some of the chapters, which consider a number of quite ‘blue-sky’ possibilities. Chapter 10, on applications of the variable bit rate ADPCM system introduced in Chapter 4, would also fall into this more philosophical category.
Having said this, however, the first several chapters in this thesis are on fairly concrete signal coding sub-problems. This work has been published, or at least submitted to technical journals and conferences. A number of later chapters discuss research results that do not contain enough depth for separate publication. Chapter 7 includes inconclusive results, and Chapters 9 and 10 contain discussions in preference to concrete research results. However, it is believed that these chapters form a significant and vital part of this thesis, by assisting in tying all the chapters together, and rounding out (and motivating) the work.
1.3 Thesis Overview
Chapter Two:
overview issues but avoiding many of the details.
For those readers quite familiar with speech coding, this review section will not present anything new, and can safely be ignored. However, it does attempt to present a unified view across many different systems, and for some readers may assist with motivational insights to sections of the work performed throughout this thesis. Note that where explicit understanding of any parts of the systems is required, they are explained in detail in the body of the thesis.
Chapter Three:
Chapter three consists of a theoretical analysis of ADPCM stability. This analysis takes account of the effect of quantizer adaptation on the stability of the ADPCM system. A number of assumptions are required for the mathematical analysis, but the theory is able to shed some light on the design of adaptive quantizers in ADPCM systems.
Chapter Four:
Chapter four presents a variable bit rate ADPCM system that overcomes the stability problems associated with fixed rate ADPCM systems, as well as exploiting the bursty information nature of speech to save bits through entropy coding. The variable rate ADPCM system forms a large part of this thesis, and concerns relating to it are found in a number of chapters.
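To illustrate why entropy coding can save bits when quantizer outputs are unevenly distributed, the following minimal Python sketch compares the first-order empirical entropy of a sequence of quantizer indices with the fixed-rate cost of the same quantizer. The 8-level quantizer, the function names, and the index counts are hypothetical values chosen for illustration only; they are not taken from the thesis, and the arithmetic coder itself is not modelled.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """First-order empirical entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fixed_rate_bits(num_levels):
    """Bits per symbol needed to index num_levels outputs at a fixed rate."""
    return math.ceil(math.log2(num_levels))


# Hypothetical index counts for an 8-level quantizer (illustrative only):
# small prediction errors are common, large ones are rare.
example_indices = (
    [0] * 50 + [1] * 20 + [2] * 12 + [3] * 8 + [4] * 4 + [5] * 3 + [6] * 2 + [7] * 1
)

if __name__ == "__main__":
    print("fixed rate:", fixed_rate_bits(8), "bits/sample")
    print("entropy   :", round(entropy_bits(example_indices), 3), "bits/sample")
```

An ideal entropy coder approaches the entropy figure on average, so the more peaked the index distribution, the larger the saving relative to the fixed rate.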
Chapter Five:
The fifth chapter is concerned with the use of the Kalman filter in speech coding applications. The Kalman filter is a tool that is useful for a number of tasks in speech coding. The major uses are investigated, and some techniques proposed to control the computational requirement of Kalman filtering, whilst still gaining significant performance benefit.
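As a rough illustration of the kind of recursion involved, the sketch below runs a scalar Kalman filter on a first-order all-pole (AR(1)) signal observed in additive white noise. The first-order model, the function name `kalman_filter_ar1`, and the parameters `a`, `q` and `r` are assumptions made for illustration; it is not the formulation used in the thesis, which concerns speech signal models.

```python
def kalman_filter_ar1(measurements, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k] = a*x[k-1] + w[k], y[k] = x[k] + v[k].

    q is the variance of the process noise w, r the variance of the
    measurement noise v (both assumed white and zero mean). Returns the
    filtered estimates and the final error variance.
    """
    x_est, p = x0, p0
    estimates = []
    for y in measurements:
        # Time update: propagate the estimate and its error variance.
        x_pred = a * x_est
        p_pred = a * a * p + q
        # Measurement update: blend the prediction with the new measurement.
        gain = p_pred / (p_pred + r)
        x_est = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x_est)
    return estimates, p
```

With `r = 0` the gain is 1 and the filter simply returns the measurements; a larger `r` makes the output lean more heavily on the model prediction. The variance recursion for `p` is the part that must be repeated at every sample in this sketch.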
Chapter Six:
Chapter six presents the results of using Kalman filtering within the proposed variable bit rate ADPCM system from Chapter 4. A ‘practical’ variable rate ADPCM system is presented at the end of this chapter. Judged by informal listening tests, the output speech quality of this variable rate system at an average rate of 12 kbps is equivalent to that of 16 kbps LD-CELP. The computational complexity for this 12 kbps variable rate ADPCM system is significantly less than that of LD-CELP.
Chapter Seven:
Kalman filter is investigated in FS 1016 4.8 kbps CELP and in CCITT Recommendation G.728 16 kbps LD-CELP.
Chapter Eight:
Chapter eight discusses work performed while at AT&T on the frame erasure problem for wireless communications with LD-CELP. Both a version that is bit-stream compatible with the existing LD-CELP standard, where only decoder changes are made, and a version with some minor modifications to the encoder are investigated. With decoder changes only, excellent performance is obtained for a 3% frame erasure rate, and with encoder modifications, good performance is observed at a frame erasure rate as high as 10%.
Chapter Nine:
Chapter nine investigates the resynchronization problem that appears with the use of Arithmetic Coding in the variable bit rate ADPCM system introduced in Chapter 4. For use in practical systems, resynchronization after bit transmission errors is extremely important, and this is a significant hindrance to the application of the variable bit rate ADPCM approach to a wider variety of uses. Requirements for frame based resynchronization are discussed, but due to the largely application-dependent nature of the approaches, no simulation results are presented.
Chapter Ten:
The tenth chapter briefly discusses a number of practical considerations with respect to applications of the variable bit rate ADPCM system introduced in Chapter 4. Some of these are immediate applications, whilst some require the solution of a number of associated research and development problems, and others are very much ‘blue sky’ type applications.
Conclusion:
The thesis finishes with a conclusion chapter in which the major results are summarized, as well as an indication given of possible areas for future research work.
Glossary:
1.4 Summary of Original Contributions
During the course of research for this PhD thesis, a number of original contributions have been made. These are the subject of patent applications, journal, and conference papers, as previously indicated. A brief description of the original work is listed below.
• ADPCM Stability Analysis: A theoretical analysis of ADPCM stability is performed that takes account of quantizer adaptation. The result shows that the rate of quantizer step size decrease is closely linked to stability, and this can be viewed as a theoretical justification for the shape of the multiplier curve for the Jayant ‘One-Word Memory’ adaptive quantizer.
• ADPCM and Arithmetic Coding: A novel variable rate ADPCM speech coding system is proposed that overcomes some ADPCM stability problems, and maximises performance measures such as SNR. Immediate applications are in speech coding for storage purposes, where there are some major advantages of the approach over alternatives.
• Practical Kalman Filtering: A number of ways that Kalman filtering techniques can be applied to advantage in speech coding applications are shown, while paying careful attention to the issue of computational complexity. The application of Kalman filtering techniques to the above variable rate ADPCM system and other CELP systems is also investigated.
Chapter 2
Speech Coding Systems
2.1 Chapter Motivation
This chapter intends to provide a general background to speech coding, primarily for those readers somewhat unfamiliar with the subject area. Many of the well known and emerging standards are covered briefly, with an attempt made to highlight the upper level systems concepts and the key similarities and differences of the various approaches.
Speech coding is the source coding component of a voice communication system such as the wire-based telephone network or mobile communication networks. Other important system components include channel coding, modulation, network access protocols, and frame synchronization. Many other system components exist which affect speech coder design to various extents, and a good general telecommunications engineering background is extremely valuable for speech coding research. Some elements of this telecommunications background are discussed both within this chapter and throughout the thesis as a whole.
Speech coding is a lossy form of source coding, and has four generic performance issues: transmission bit rate; computational complexity; output speech quality; and coding delay. Other issues are also important, such as robustness to transmission errors, multistage encoding/decoding, and accommodation of non-voice signals such as in-band signalling and voiceband modem data.
The requirement for robustness to transmission errors is significant. Theoretically, complete separation of source and channel coding is possible, such that the source decoder receives as input the output of the source encoder without error. Practically, however, this is not possible. To obtain this ‘perfect’ channel coding is too expensive in terms of delay, computation, and transmission overhead to be useful for most practical speech coding applications. Hence the speech decoder will have to accommodate some transmission errors.
Having made this point, it is important to note that the above four issues are the prime concern from the speech source coding perspective. Different speech coding applications dictate various combinations of ranges of these four parameters, and this also provides a convenient way in which to separate speech coding systems into different groups.
Although there are many other possible categories for speech coding, the following break-up has been chosen for the purposes of this chapter: (1) Toll Quality Speech Coding - dealing with telephone quality services, principally over wire based networks; (2) Mobile Communications - digital mobile telephone systems; (3) Low Rate Communications Quality Voice - primarily for military type applications and satellite systems; and (4) PCS (Personal Communications Systems) Standardization - emerging applications for the future.
The next section in this chapter discusses the issue of speech quality measurement, and following this the above four speech coding categories are considered in turn. Towards the end of the chapter a brief summary section is provided with some additional general speech coding references given. Finally the remaining chapters in the thesis are outlined, reiterating and augmenting the thesis overview in Chapter 1.
2.2 Speech Quality Measurement
Transmission bit rate, coder computational complexity, and coding delay are all relatively easy to quantify. However, output speech quality is more difficult to measure and deserves special mention.
Almost invariably speech signals transmitted over telecommunications networks are destined for final processing by the human auditory system. (The most notable exception being that of computer speech recognition algorithms.) Hence it is clear that the final test of speech output quality is how it sounds to the person listening. Unfortunately subjective measures like this are not of much direct benefit to the speech coder. For a subjective measurement to be of any value an obvious requirement is that the subjective evaluation process is formalized in some way.
testing involves a group of test subjects rating samples of coded speech (encoded and decoded) on a discrete five point scale. The quality and impairment levels of the five point scale are shown in Table 2.1, and test subjects are generally recruited ‘randomly’, and given very little instruction on their tasks.
MOS Score    Quality Scale     Impairment Scale
1            Unsatisfactory    Objectionable (very Annoying)
2            Poor              Annoying but not Objectionable
3            Fair              Slightly Annoying
4            Good              Perceptible but not Annoying
5            Excellent         Imperceptible
Table 2.1: Mean Opinion Score Five Point Quality/Impairment Scale
Another formal subjective test strategy is the Diagnostic Rhyme Test (DRT)[92]. This is used primarily for low rate (and lower quality) speech coding systems where intelligibility is the primary concern. Listeners are required to pick which one of a rhyming pair of words was played (both words presented visually). In general only the initial consonants of the words are changed, such that for plosives examples might be: BAM, DAM, PAM, TAM, and KAM[92].
For speech coding, objective measures of quality are required, and standard measurements of SNR (Signal to Noise Ratio) and segmental SNR are often used. However, these must be used with caution, as they do not always give a good indication of the subjective quality. In fact, it is not difficult to obtain a sample of speech which when coded by two different approaches gives a large increase in SNR for one of them, but this is subjectively the inferior of the two. When dealing with similar approaches, such as for tuning a coder parameter, SNR measures are usually a reliable indication of relative performance.
Due to the fact that levels of noise are not perceived equally across the speech spectrum, a more useful objective measure is the perceptually weighted SNR. Another important consideration with such measures is to limit the effect of silence periods between speech utterances on the measure. A threshold on the speech activity level is often used in conjunction with an SNR type of measure.
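The objective measures above can be illustrated with a minimal sketch. The function names, the 160-sample frame length (20 ms at 8 kHz), and the activity threshold value are assumptions chosen for illustration only; no perceptual weighting is applied here.

```python
import numpy as np

def snr_db(ref, coded):
    """Conventional SNR in dB over the whole utterance."""
    ref = np.asarray(ref, dtype=float)
    noise = ref - np.asarray(coded, dtype=float)
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

def segmental_snr_db(ref, coded, frame_len=160, activity_threshold=1e-4):
    """Mean of per-frame SNRs in dB, skipping frames below an activity threshold.

    frame_len and activity_threshold are illustrative values, not taken
    from any standard.
    """
    ref = np.asarray(ref, dtype=float)
    coded = np.asarray(coded, dtype=float)
    frame_snrs = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        r = ref[start:start + frame_len]
        n = r - coded[start:start + frame_len]
        signal_power = np.mean(r ** 2)
        if signal_power < activity_threshold:
            continue  # treat the frame as silence and leave it out of the average
        noise_power = max(np.mean(n ** 2), 1e-12)  # guard against division by zero
        frame_snrs.append(10.0 * np.log10(signal_power / noise_power))
    return float(np.mean(frame_snrs)) if frame_snrs else float("nan")
```

Because the per-frame values are averaged in the dB domain, quiet but active frames count as much as loud ones, which is one reason segmental SNR can rank coders differently from conventional SNR.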
within the speech coders. Variants of SNR and MSE (Mean Square Error) measures are commonly used[175] either explicitly or implicitly within speech coding systems.
2.3 Toll Quality Speech Coding
Toll quality is generally accepted to be the quality equivalent to an ideal analog wired line connection[106]. This is associated with a Mean Opinion Score (MOS) of around 4, indicating good quality output with (just) perceptible but not annoying impairment. The coders in this section are all rated as toll-quality speech coders, but at decreasing bit rates, and correspondingly increasing computational complexities.
Toll quality coders are primarily designed for use with close integration to existing telephony networks. As such, delays introduced in the coding process are a problem due to hybrid echo. In order to eliminate the need for complicated (and expensive) echo cancellers, the coding delay must be tightly controlled. For PCM and ADPCM coding discussed below, there is no significant delay to give concern. However, in order to maintain performance at lower bit rates, it is generally necessary to increase the coding delay.
Remark 2.1 Higher quality speech than that afforded by toll quality systems is also of significant interest within telecommunications networks. The limitation in bandwidth introduced by the analog telephony system produces substantial degradation in speech quality, and 16 kbps (kilobits per second) coders with a 7 kHz speech bandwidth have recently been proposed for some applications[81].
2.3.1 PCM Coding
Based on telephony quality speech having a bandwidth between 300 Hz and 3.4 kHz, an 8 kHz sampling rate is standard. The PCM (Pulse Code Modulation) standard for digital speech representation, CCITT¹ Recommendation G.711, uses a non-uniform quantizer characteristic to account for the large dynamic range of speech. Each sample is quantized using 8 bits, resulting in a bit rate of 64 kbps[106].
The non-uniform quantizer characteristic can be considered to be obtained through the use of a non-linearity in cascade with a uniform quantizer (companding), as shown in Figure 2.1. However, in practice it is often obtained by high rate quantization followed by a digital piece-wise linear approximation of the non-linearity. Companding using both the A-law and μ-law characteristics is discussed in many texts, including the one by Carlson[27].
Figure 2.1: PCM Companding Schematic
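The companding cascade of Figure 2.1 can be sketched as follows. This is a minimal illustration that uses the continuous μ-law characteristic with μ = 255 and an 8 bit uniform quantizer; it is not the piece-wise linear approximation used in G.711 implementations, and the function names and the input range of [-1, 1] are assumptions.

```python
import numpy as np

MU = 255.0  # common mu-law parameter; assumed here for illustration

def mu_law_compress(x, mu=MU):
    """Non-linearity applied before the uniform quantizer (input in [-1, 1])."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Inverse non-linearity applied after the inverse quantizer."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def uniform_quantize(y, bits=8):
    """Map values in [-1, 1] to integer indices 0 .. 2**bits - 1."""
    levels = 2 ** bits
    step = 2.0 / levels
    return np.clip(np.floor((y + 1.0) / step), 0, levels - 1).astype(int)

def uniform_dequantize(idx, bits=8):
    """Map indices back to the mid-points of the quantizer cells."""
    step = 2.0 / (2 ** bits)
    return (np.asarray(idx) + 0.5) * step - 1.0

def pcm_encode(x, bits=8):
    return uniform_quantize(mu_law_compress(x), bits)

def pcm_decode(idx, bits=8):
    return mu_law_expand(uniform_dequantize(idx, bits))
```

Each sample is coded independently of all others, so a corrupted index affects only the corresponding decoded sample.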
PCM speech coding is memoryless, and is naturally robust to random bit transmission errors in the sense that a bit error only affects one decoded speech sample.
It is on this rate of 64 kbps that ISDN (Integrated Services Digital Network) telephone lines are based. This standard has received its share of criticism as being too high a basic rate for speech and too low for other applications, but is continuing to be implemented. With fiber optic backbone networks, standards and simplicity of basic structure are important to enable high speed switching. It appears the efficiency of the standard for various applications is of only secondary importance.
2.3.2 ADPCM
quantization.
There is, however, an ADPCM (Adaptive Differential Pulse Code Modulation) standard, CCITT Recommendation G.721, based on an adaptive predictor to remove redundancy, and an adaptive quantizer to account for the dynamic range of the prediction residual. The G.721 standard uses a quantizer with 4 bits, thus resulting in a coded speech data stream at 32 kbps.
Figure 2.2: CCITT Recommendation G.721 ADPCM Encoder Schematic
Figure 2.2 displays a block diagram of the CCITT Recommendation G.721 32 kbps ADPCM encoder. The quantizer output, $Y_k$, shown in the diagram is assigned a four bit binary codeword for channel transmission to the decoder. At the decoder the $Y_k$ sequence is reconstructed from the binary transmitted data, and the decoded speech output, $\hat{s}_{k|k}$, is obtained in an identical fashion to that in the encoder shown in the figure. Hence a copy of the decoder is inherently contained in the encoder, and the decoder need not be shown explicitly.
The prediction, $\hat{s}_{k|k-1}$, is formed via the two FIR (Finite Impulse Response) filter structures, $A(z^{-1})$ and $B(z^{-1})$, connected to yield an IIR (Infinite Impulse Response) filter. For G.721 ADPCM, $A(z^{-1})$ is of degree two, representing two poles, and $B(z^{-1})$ is of degree six, representing six zeros. Both the quantizer and predictor are adaptive to account for the non-stationary statistics of the input speech. The adaptive quantizer is represented in Figure 2.2 by a fixed quantizer, $Q(\cdot)$, and an adaptive scaling factor, $g_k$, which effectively dictates the quantizer step size.
CCITT Recommendation G.721 ADPCM is robust to random bit errors at a rate of $10^{-3}$, but produces significantly degraded (and perceptually annoying) speech at an error rate of $10^{-2}$. A modified form of the Jayant ‘One-Word Memory’ adaptive quantizer is used in G.721 ADPCM[106]. From the transmission error perspective, the ‘One-Word Memory’ is actually an infinite memory, and a decay factor is introduced to help ‘forget’ previous errors.
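The role of the step size recursion and the decay factor can be illustrated with a much simplified ADPCM sketch. It is not G.721: the predictor here is a fixed first order filter rather than the adaptive pole-zero structure, and the multiplier table, the decay exponent `BETA`, the predictor coefficient, and the step size limits are all assumed values chosen for illustration.

```python
import numpy as np

# Illustrative step size multipliers indexed by the 3 bit code magnitude:
# inner levels shrink the step, outer levels grow it (assumed values).
MULTIPLIERS = np.array([0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4])
BETA = 0.98          # decay exponent that gradually 'forgets' past step sizes
PRED_COEFF = 0.9     # fixed first order predictor (a simplification)
STEP_MIN, STEP_MAX = 1e-4, 1.0

class AdpcmState:
    def __init__(self):
        self.step = 0.01   # current quantizer step size
        self.recon = 0.0   # previous reconstructed sample

def _decode_step(state, code):
    """Reconstruct one sample from a 4 bit code and update the state.

    The same routine runs in the encoder and in the decoder, so the
    encoder inherently contains a copy of the decoder.
    """
    magnitude = code & 0x7
    sign = -1.0 if code & 0x8 else 1.0
    residual_q = sign * (magnitude + 0.5) * state.step
    state.recon = PRED_COEFF * state.recon + residual_q
    # One-word memory adaptation with decay: the next step depends only on
    # the current step and the current code word.
    new_step = state.step ** BETA * MULTIPLIERS[magnitude]
    state.step = float(np.clip(new_step, STEP_MIN, STEP_MAX))
    return state.recon

def adpcm_encode(samples):
    state = AdpcmState()
    codes = []
    for s in samples:
        residual = s - PRED_COEFF * state.recon
        magnitude = min(int(abs(residual) / state.step), 7)
        code = magnitude | (0x8 if residual < 0 else 0)
        codes.append(code)
        _decode_step(state, code)
    return codes

def adpcm_decode(codes):
    state = AdpcmState()
    return [_decode_step(state, c) for c in codes]
```

With `BETA` equal to one, a single corrupted code word would scale every later decoder step size by a fixed factor relative to the encoder. With `BETA` slightly less than one, that mismatch decays over time, which is the sense in which the decay factor helps to forget previous errors.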
The CCITT ADPCM standard is well developed, having been standardized in 1985, and a good general description of ADPCM can be found in the text by Jayant and Noll[106]. An embedded ADPCM approach and separate coders operating at rates from 16 kbps to 40 kbps are included within CCITT standards. Implementation of the G.721 ADPCM standard is now relatively inexpensive due to the existence of ASIC (Application Specific Integrated Circuit) implementations and DSP (Digital Signal Processor) applications notes.
2.3.3 LD-CELP
In 1988 the requirements and objectives for a 16 kbps speech coding standard were approved by the CCITT. The requirements were for a system with effectively the same performance in all areas as 32 kbps ADPCM, and with a very tight bound on the coding delay. At the time many researchers viewed the requirements as practically impossible to meet.
A number of groups persevered with research towards the standard[34, 48, 186], and in May of 1992, the AT&T² floating point version of LD-CELP (Low Delay Code Excited Linear Prediction) was officially adopted by the CCITT as the 16 kbps standard G.728[33]. The coder performance is significantly better than that of G.721 ADPCM in some areas. However, this comes at the cost of an extremely high computational requirement.
Figure 2.3: CCITT Recommendation G.728 LD-CELP Schematic
The CCITT Recommendation G.728 LD-CELP schematic is shown in Figure 2.3. The diagram minus the postfilter section represents the LD-CELP encoder. As with ADPCM, the decoder is inherently contained in the encoder, but for LD-CELP there is the additional postfilter included at the decoder.
The vector length in LD-CELP is five samples, corresponding to a codebook size of 1024 entries at the 16 kbps code rate. The trained codebook is in gain-shape format, with 128 shape vectors, and 8 gain levels (one gain sign bit, and two gain magnitude bits). Although a number of techniques are used to reduce the amount of computation required for the codebook search, the procedure is an exhaustive search, similar to passing each codebook entry through the synthesis filter to obtain a decoded output vector. A perceptually weighted MSE (mean square error) criterion is used to select the codebook index that gives the least output distortion, and this index is transmitted to the decoder.
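The bit rate arithmetic and the exhaustive gain-shape search described above can be sketched as follows. This is an illustration under simplifying assumptions: the error is a plain (unweighted) squared error, the synthesis filter starts from zero state for each vector, and the codebook contents and filter coefficients supplied by the caller are hypothetical rather than the trained G.728 tables.

```python
import numpy as np

VECTOR_LEN = 5      # samples per excitation vector
NUM_SHAPES = 128    # 7 bit shape index
NUM_GAINS = 8       # 3 bit gain index (sign plus two magnitude bits)

# 1024 entries need 10 bits; 10 bits per 5 samples at 8 kHz gives 16 kbps.
BITS_PER_VECTOR = (NUM_SHAPES * NUM_GAINS).bit_length() - 1
BIT_RATE = BITS_PER_VECTOR * 8000 // VECTOR_LEN

def synthesize(excitation, lpc):
    """All-pole synthesis filter with zero initial state.

    lpc = [a_1, ..., a_p] for out[n] = excitation[n] + sum_i a_i * out[n - i].
    """
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                acc += a * out[n - i]
        out[n] = acc
    return out

def search_codebook(target, shapes, gains, lpc):
    """Exhaustive search; returns (shape index, gain index, squared error)."""
    best = (None, None, np.inf)
    for j, shape in enumerate(shapes):
        # The filter is linear, so each shape is filtered once and then scaled.
        filtered = synthesize(shape, lpc)
        for i, g in enumerate(gains):
            err = float(np.sum((target - g * filtered) ** 2))
            if err < best[2]:
                best = (j, i, err)
    return best
```

Filtering each shape once and scaling by the gain afterwards is one simple way the gain-shape structure cuts the search cost from 1024 filtering operations to 128.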
current filter parameters, the update procedure is operated once every 20 samples, resulting in a large computation cost. Also to maintain performance in the absence of a pitch predictor, the linear predictor is of a high order (50th order), mainly for the benefit of female speech (with short pitch periods where some element of pitch redundancy can be exploited by the linear predictor).
Recent work on LD-CELP has been targeted at fixed point implementation[35], and the fixed point contribution to the CCITT G.728 standard was made by AT&T in September 1993. This should assist in increasing the applications base for LD-CELP, but the computational requirements are still substantial. A later chapter (Chapter 8) of this thesis discusses some work on the problem of frame erasure that may assist in making LD-CELP a suitable candidate for future PCS (Personal Communications Systems) or FPLMTS (Future Public Land Mobile Telecommunications Systems).
2.3.4 8 kbps Standardization
The terms of reference for production of a toll quality 8 kbps speech coder were formulated by Study Group 15 of the CCITT in 1990, and revised (relaxed) in November 1991. Two coders were proposed as candidates in November 1992. One coder is the CS-CELP (Conjugate Structure CELP) system from NTT (Nippon Telegraph and Telephone)[110, 111, 112, 113], and the other is an ACELP (Algebraic CELP) system from France Telecom/University of Sherbrooke[155, 156]. Both candidate coders met all the CCITT requirements during the qualification phase of the standardization process.
It is generally recognised that for low delay coding some form of backwards adaptation is needed[164, 191], and tight delay requirements make achieving the required rate difficult. However, the 8 kbps standard delay requirements, originally for an encoding frame length of less than 5 ms, were relaxed by the CCITT in November 1991 to less than 16 ms. The other requirements are principally for high quality in error-free conditions, as well as robustness against channel errors, including frame loss and random bit errors.
Figure 2.4: NTT 8 kbps CS-CELP Schematic
adaptation of the VQ (Vector Quantization) gain.
The CS-CELP frame length of 10 ms consists of two subframes of 40 samples each (standard 8 kHz sampling frequency). Input speech is soft-limited if above a threshold level. LSP parameters are transmitted each frame, and all other parameters are transmitted each subframe. The adaptive and fixed-shape codebooks are examined sequentially, and the excitation vector is selected that minimises the perceptually weighted error criterion, found through the use of an ARMA (Auto Regressive Moving Average) perceptual weighting filter. The decoded speech is post-filtered at the decoder (not shown) to enhance the perceptual quality.
LPC (Linear Predictive Coder) coefficient analysis is performed on the soft limited input speech. The 10th order line spectral pair prediction residual, after first order MA (Moving Average) prediction, is then quantized via a dual stage VQ, with a split structure at the second stage.
In order to reduce the codebook search complexity for the CS-CELP system, a preselection procedure is used. The codebook search distortion calculation basically involves two separate components, but on the basis of only one of these components it is possible to ignore a number of the codevectors for the next component calculation. It is theoretically possible that this procedure will result in the ‘best’ codevector being ignored. However, this is very unlikely, and the computational savings are extremely important.
The conjugate structure fixed shape codebook means that an output vector is formed by summing at least two vectors, each stored in a different codebook. The advantages of this approach are improved robustness, reduced memory requirements, and reduced search complexity in combination with preselection.
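The combination of a conjugate structure codebook with preselection can be sketched as follows. This is an illustration only, and not the CS-CELP procedure itself: the preselection criterion used here (the correlation of each stored vector with the target), the plain squared error, and the function names are all assumptions.

```python
import numpy as np

def full_search(target, cb_a, cb_b):
    """Try every sum a_i + b_j; returns (i, j, squared error)."""
    sums = cb_a[:, None, :] + cb_b[None, :, :]
    dist = np.sum((target - sums) ** 2, axis=2)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j), float(dist[i, j])

def preselected_search(target, cb_a, cb_b, keep):
    """Keep the `keep` best-correlated vectors per codebook, then search pairs.

    Preselecting on correlation alone is an assumed, cheap stand-in for
    'one component' of the distortion calculation.
    """
    cand_a = np.argsort(cb_a @ target)[-keep:]
    cand_b = np.argsort(cb_b @ target)[-keep:]
    i_loc, j_loc, d = full_search(target, cb_a[cand_a], cb_b[cand_b])
    return int(cand_a[i_loc]), int(cand_b[j_loc]), d
```

Two codebooks of N vectors each store only 2N vectors while representing N² possible output vectors, which illustrates the memory saving. Preselection then reduces the pair search from N² to `keep`² candidates, at the risk of occasionally discarding the best pair.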
The ACELP candidate system from France Telecom/University of Sherbrooke is very similar in overall structure to that of the CS-CELP system shown in Figure 2.4. Of course, the details of the two systems differ, and the interested reader should refer to the papers referenced above. The sparse structure of the ACELP codebook allows for efficient searches. In a manner analogous to the preselection used in CS-CELP, the search is performed in a nested fashion, with the requirement that a performance threshold is exceeded on the first stage of the search before proceeding to the next stage. The maximum amount of the codebook search is also limited to 4%, but this results in negligible performance degradation from that of a full search.
Generally both candidate coders obtain the required performance, and although there is always a desire for lower complexity and lower delays, the 8 kbps standard appears unlikely to experience a radical departure from the two systems mentioned above before the final ratification phase. A significant amount of attention is now focussed on the 4 kbps standardization effort.
2.3.5 4 kbps Standardization
It is recognised that there are two major directions that speech coders are likely to take[53]. The first of these is where power consumption requirements are more important than spectral efficiency. This leads to the consideration of low complexity systems in the 8 to 16 kbps range.
The other approach is where spectral efficiency is of the utmost importance. This is mainly due to terminal proliferation, with applications driven from areas such as visual telephony, personal communications, and satellite based personal systems. This leads to the quest for systems in the 2 to 4 kbps range.
Of significance with the preliminary specifications is that there is no requirement for the system to pass voiceband data, which has been a general requirement for higher rate toll quality systems.
Although perhaps optimistic, initial testing is ‘scheduled’ for 1996, and standardization in 1998. The push for low rate speech coding is still intense, with some current major driving forces from the LEO (Low Earth Orbit) satellite applications (although these applications do not appear to require toll quality).
2.4 Mobile Communications
Mobile Communications has seen rapid growth over the past ten years, and is well placed to experience even greater growth in the next decade[55]. Although the industry sector is not without its share of problems, such as those of a political nature, and those of somewhat more worrying health concerns[62], it appears that the desire for tetherless personal communications is enough to fuel rapid technology advances in the area.
Speech coding is an important component of mobile communication systems. Of course, there are many other communications and network components, as well as related technology issues, an example of which is battery design. Other system aspects that are somewhat less the realm of engineering, but equally important, are device ergonomics and marketing approach. Naturally this thesis concentrates on speech coding; however, a level of general knowledge about mobile communications systems is important, and is assumed of the reader. Some general overview papers are [118, 121]. However, many other informative papers exist.
frame erasures at the source coding level.
2.4.1 GSM (RPE-LTP)
The European Conference of Post and Telecommunications (CEPT) set up the Groupe Speciale Mobile (GSM) in 1982 to investigate a European wireless mobile system. The GSM system, now commonly taken to mean ‘Global System for Mobile’, was offered commercially in eight countries at the end of 1992, and in April 1993, 32 operators in 22 countries were committed to the system[7]. The system has also been adopted for use by Telecom (Telstra) Australia, and the two new mobile communications carriers in Australia, Optus and Vodafone.
The performance goal of the GSM system was to obtain an average quality no worse than that of the analog mobile systems. Of course, due to the speech coding distortion, the FM clean channel performance is not matched. However, the system is quite robust to error bursts.
Figure 2.5: RPE-LTP (GSM) Encoder Block Schematic
The GSM system uses a Regular Pulse Excitation coder with Long Term Prediction (RPE-LTP) [120] at 13 kbps. A simplified block schematic of the encoder is presented in Figure 2.5. The RPE-LTP decoder is not shown.