Adaptive Data Compression
THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE COMMUNICATIONS AND INFORMATION TIlEORY
Consulting Editor:
Robert Gallager Other books in the series:
Digital Communication. Edward A. Lee, David G. Messerschmitt ISBN: 0-89838-274-2
An Introduction to Cryptology. Henk C.A. van Tilborg ISBN: 0-89838-271-8
Finite Fields for Computer Scientists and Engineers. Robert J. McEliece ISBN: 0-89838-191-6
An Introduction to Error Correcting Codes With Applications.
Scott A. Vanstone and Paul C. van Oorschot ISBN: 0-7923-9017-2
Source Coding Theory. Robert M. Gray ISBN: 0-7923-9048-2
Switching and TraffIC Theory for Integrated Broadband Networks.
Joseph Y Hui
ISBN: 0-7923-9061-X
ADAPI'IVE DATA COMPRESSION
by
Ross N. WiIliams Australian National
foreword by
Glen G. Langdon, Jr.
University of California, Santa Cruz
....
"
SPRINGER-SCIENCE+BUSINESS MEDIA, B.v.
Library of Congress CataIoging-in-Publication Data Williams, Ross N. (Ross Neil), 1962-
Adaptive data compression / by Ross N. Williams; foreword by Glen C. Langdon, lr.
p. cm. - (l'he Kluwer international series in engineering and computer science. Communications and information theory)
"A reproduction of a a Ph. D. thesis that was submitted to the Dept.
of Computer Science at the University of Adelaide on 30 June 1989"- -Aclrnowl.
Includes bibliographical references (p. ) and index.
ISBN 978-1-4613-6810-6 ISBN 978-1-4615-4046-5 (eBook) DOI 10.1007/978-1-4615-4046-5
1. Data compression (l'elecommunications) 1. Title. II. Series.
TK5102.5.W539 1991
005.74 '6-dc20 90-5059
CIP
Copyright © 1991 by Springer Science+Business Media Dordrecht Origina1ly published by Kluwer Academic Publishers in 1991
Softcover reprint ofthe hardcover Ist edition 1991
AlI rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher,
Springer-Science+Business Media, B.V.
Pr;nted on acid-free pa per.
Contents List of Figures List of Tables Foreword Preface
Acknowledgements Presentation Notes Abstract
Contents
Chapter 1: Introductory Survey 1.1 Introduction . . . .
1.1.1 Compression as Representation l.l.2 Data Compression in Ancient Greece l.1.3 Founding Work by Shannon . . . 1.2 Common Notation and Data Structures 1.3 The Problem of Data Compression
l.3.1 Real vs Quantized Information 1.3.2 Pure Blocking . . . . 1.3.3 Impure Blocking . . . . 1.3.4 A Classification of Algorithms . 1.4 Huffman Coding: Too Good Too Soon
1.4.1 Shannon-Fano Coding and Huffman Coding 1.4.2 Huffman as a Back-End Coder
1.4.3 Multi-Group Huffman Coding 1.4.4 Dynamic Huffman Coding 1.4.5 Recent Results . . . . 1.5 Thirty Years of Ad Hoc Data Compression
1.5.1 Run Length Coding . . . . 1.5.1.1 Binary Run Length Coding 1.5.1.2 Bit Vector Compression 1.5.2 Dictionary Techniques
1.5.2.1 Parsing Strategies
1.5.2.2 Static vs Semi-Adaptive Dictionaries 1.5.2.3 Early Dictionary Techniques . . . .
v
xi
xv
xvii
xix
xxi
xxiii
xxv
1
11
2
4
6
11
11
12
13
13
14
14
16
16
17
18
18
19
19 22
22
23
25
25
VI
Adaptive Data Compression
1.5.2.4 Later Dictionary Techniques 27
1.5.2.5 Compressing the Dictionary 28
1.5.3 Exploiting Specific Knowledge . . 29
1.5.4 Data Base Compression Techniques 29
1.5.5 The Practitioner's View 30
1.6 Adaptive Data Compression 32
1.7 Ziv and Lempel Algorithms 33
1. 7.1 Adaptive Dictionary Compression 33
1. 7.2 The Launch of LZ Compression 34
1. 7.3 LZ77 . . . . 35
1. 7.4 LZ77 Variants 37
1.7.5 LZ78 . . . . 37
1. 7.6 LZ78 Variants 39
1.7.7 Other Dynamic Dictionary Techniques 41
1.8 The Modern Paradigm of Data Compression 41
1.8.1 The Information Market Place 42
1.8.2 Predictions: A Currency of Information 1.8.3 The Modern Paradigm
1.8.4 Modelling . . . 1.8.5 Coding . . . . 1.8.6 Not So Modern A Paradigm 1.9 Arithmetic Coding . . . .
1.9.1 A Description of Arithmetic Coding 1.9.2 The Development of Arithmetic Coiling 1.9.3 The Popularization of Arithmetic Coding 1.9.4 Refinements of Arithmetic Coding . . . 1.10 Markov Models . . . .
1.10.1 Markov Chains, Markov Sources and Markov Models 1.10.2 Constructing Models from Histories
1.10.3 Estimation Techniques
1.10.3.1 Linear Estimation . . . 1.10.3.2 Non-Linear Estimation
1.10.3.3 Linear and Non-Linear Moffat Estimation 1.10.4 Summary of Fixed-Order Markov Methods 1.10.5 Variable-Order Techniques . . . . 1.10.6 An Overview of Markov Algorithms 1.10.6.1 Markov Sources by Shannon 1.10.6.2 DAFC .
1.10.6.3 FBA 1.10.6.4 LOEMA 1.10.6.5 UMC 1.10.6.6 PPM 1.10.6.7 DHPC 1.10.6.8 DMC 1.10.6.9 WORD
1.11 Data Structures For Predictions 1.11.1 Prediction Functionality 1.11.2 Linear Representations 1.11.3 Sparse Representations 1.11.4 Tree Representations
42
43
45
46
47
48
48
53
55
56
57
57
58
60
61
62
63
63
64
65
66 66
67
68
69
70
72
72
76
77
77
79
80
81
Contents
1.11.5 Heap Representations . . 1.11.6 Stochastic Representations 1.11.7 The Adaptive Q-Coder
1.11.8 A Comparison of Representations 1.11.9 Summary . . . . 1.12 Data Structures for Markov Trees
1.12.1 Hashing Representations 1.12.2 Backwards and Forwards Trees 1.13 Dictionary Methods vs Markov Methods 1.14 Signal Compression . . . . 1.15 Measures of Compression . . . .
1.16 Error Correction, Data Compression and Cryptography 1.17 Summary . . . .
Chapter 2: The DHPC Algorithm 2.1 Introduction
2.2 Tree Growth 2.3 Estimation .
2.4 The Algorithm in Detail 2.4.1 Main Program 2.4.2 Prediction Phase 2.4.3 Update Phase . 2.5 Example Execution of DHPC
2.6 A Comparison of DHPC with Other Algorithms 2.7 An Analysis of Tree Growth . . . . . 2.8 Summary . . . . Chapter 3: A Classification of Adaptivity
3.1 Introduction . . . . 3.2 Previous Definitions of Adaptivity 3.3 A Classification of Adaptivity 3.4 Adaptivity of Previous Algorithms 3.5 A Classification of Sources
3.6 A Comparison of Kinds of Adaptivity 3.7 Mechanisms for Adaptivity
3.7.1 Context Adaptivity 3.7.2 Structural Adaptivity 3.7.3 Metrics
OIlTree Structures 3.7.4 The Supply Problem . . . 3.7.5 The Demand Problem
3.7.6 Connecting Supply and Demand 3.8 Implementing Asymptotic Adaptivity 3.9 Implementing Local Adaptivity 3.10 Summary . . . .
Chapter 4: An Experimental Adaptive Algorithm 4.1 Introduction . . . .
4.2 Overview of the SAKDC Algorithm 4.3 Parameters of the SAKDC Algorithm
4.3.1 Tree Growth 4.3.2 The Supply System 4.3.3 Decaying . . .
VB
84
85
86
87
87
88
88 88
93
95
99
100
104
107
107 108
108
109
110
111
112
112
120
121
124
125
125
125
127
131
132 135
136
137
138
139
139
140
140
141
142
143
145
145
146
146
147
149
150
V111
Adaptive Data Compression
4.3.4 Windowed Local Adaptivity 151
4.3.5 Instance Management 152
4.3.6 Improving Efficiency . 152
4.3.7 Estimation . . . . . 153
4.4 Representation of Predictions 153
4.5 Representation of the History Buffer 156
4.6 Representation of the Tree 157
4.7 Maintaining LRU Information 160
4.7.1 The Two-Colour LRU Maintenance Problem 160 4.7.2 A Heap Implementation . . . 162 4.7.3 Implementing a Heap using Dynamic Data Structures 163
4.7.4 Boundary Problems 164
4.8 Deep Updating . . . . . 165
4.9 Saving Instances of Deleted Nodes 167
4.10 Shortcut Pointers 168
4.11 Deletion of Non-Leaf Nodes 173
4.12 Credibility Thresholds 174
4.13 Windowed Local Adaptivity 175
4.14 A Sketch of How an Instance is Processed 177
4.15 Controlling Implementation Complexity 179
4.16 Optimizations by Other Researchers 180
4.17 Experiments . . . . 182
4.17.1 SAKDC Implementation 182
4.17.2 Description of the Test Data 182
4.17.3 Presentation of Experiments 183
4.17.4 Experiment 1: Initial Benchmarks (DHPC, PPMA, PPMC') 186
4.17.5 Experiment 2: Estimation
(estim) 1894.17.6 Experiment 3: Credibility
(estim. threshold)196 4.17.7 Experiment 4: Deep Updating
(deeponly) 1994.17.8 Experiment 5: Depth of Tree
(maxdepth) 2014.17.9 Experiment 6: Memory Size
(maxnodes) .204 4.17.10 Experiment 7: Windowed Local Adaptivity
(local) 2074.17.11 Experiment 8: Decaying
(decay) . . . .210 4.17.12 Experiment 9: Threshold Growth
(grow) 2134.17.13 Experiment 10: Probabilistic Growth
(grow.probext)218 4.17.14 Experiment 11: Node Movement
(move)221 4.17.15 Experiment 12: Four Growth Regimes
(grow,move) 2254.17.16 Experiment 13: Inheriting Instances
(addback) 2304.17.17 Experiment 14: LRU Heap Management
(Iruparent)231 4.17.18 Experiment 15: Tree Reconstruction
(phoenix)233 4.17.19 Experiment 16: Shortcut Pointers
(shortcut.!) 2364.17.20 Experiment 17: Final Optimized Benchmark 239
4.17.21 Discussion . . . 242
4.18 Summary . . . . 243
Chapter 5: A Multimodal Algorithm 245
5.1 Introduction 245
5.2 Combining Models 245
5.3 MuItimodai Sources 247
5.4 Multimodal Models 248
5.5 Multi-Modal Algorithm Design Issues 248
Contents
5.5.1 Model Arbitration 5.5.2 Model Creation 5.5.3 Memory Allocation 5.6 The MMDC Algorithm
5.6.1 Overview . . . . 5.6.2 Parameters
5.6.3 A Formal Description 5.7 Experiments . . . .
5.7.1 Experiment 18: Artificial Data
5.7.2 Experiment 19: Real Files Concatenated 5.7.3 Experiment 20: Real Files Interleaved 5.7.4 Experiment 21: Effect of Memory 5.7.5 Discussion . . . . 5.8 Other Issues . . . . 5.8.1 MMDC is Really a Meta-Algorithm 5.8.2 Efficiency . . . . 5.8.3 The Use of Non-Adaptive Models 5.8.4 Heterogeneous MMDC
5.8.5 MMDC as Unifier 5.8.6 A Note on Security 5.9 Summary . . . .
Chapter 6: Applications to User Interfaces 6.1 Introduction . . . . . .
6.2 A New Application . . . 6.3 A Paradigm of User Prediction 6.4 Examples . . . .
6.4.1 Debugger Sessions . 6.4.2 Typist . . . . 6.4.3 Indenting a Program
6.5 Review of Interactive Environments 6.5.1 Inner and Outer Environments 6.5.2 General Design of Interactive Systems 6.5.3 Programmer's Helpers . . . . 6.5.4 Research into Command Languages 6.5.5 Modelling the User at the Keystroke Level 6.5.6 Modelling the User at the Command Level 6.5.7 Summary of Interactive Environment Work 6.6 Multi-Character Predictions
6.7 The Prediction Interface 6.7.1 Presenting Predictions 6.7.2 Confirming Predictions 6.8 Tuning Models . . . . . . 6.9 Human Factors . . . .
6.10 Work by Witten, Cleary and Darragh 6.11 Summary . . . . .
Chapter 7: Conclusions 7.1 Primary Contributions 7.2 Secondary Contributions 7.3 About Communication .
IX
248
249
250
250
251
252
253
258
260
265
269
272
273
276
277
277
278
278
279
279
280
283
283
283
284
285
286
286
287
287
287
289
290
292
292
293
293
294
296
296
297
298
299
300
303
305
305
306
307
x Adaptive Data Compression
7.4 Towards a More General Theory 7.5 Thesis Perspective . . . 7.6 Summary . . . . .
Appendix A: Estimation Formula Calculation Appendix B: The Word "Adapt" And Its Forms Appendix C: User Input Experiments . .
C.I Introduction . . . . C.2 Experiment 22: Redundancy of User Input C.3 Experiment 23: Timing Between Keystrokes Appendix D: Further Research
D.l The SAl-mC Algorithm D.2 The MMDC Algorithm D.3 User Interfaces . . . .
Appendix E: Summary of Notation References
Index
308 309 309 311 315 319 319 320 323 337 337 340
342
343
345
359
List of Figures
Figure 1: Shannon's model of communication . . . . Figure 2: A history buffer . . . . Figure 3: Backwards and forwards digital search trees Figure 4: Four kinds of blocking . . . .
Figure 5: Parsing problem mapped onto shortest path problem Figure 6: The LZ77 algorithm . . . . Figure 7: The LZ78 algorithm . . . . Figure 8: The LZ78 algorithm implemented using a tree Figure 9: The modern paradigm of data compression Figure 10: Shannon's compression paradigm
Figure 11: The coder's job is to fill output buckets Figure 12: Restricting ranges is the same as filling bits Figure 13: Addition of a new bucket extends the range Figure 14: Arithmetic coding using fixed precision registers Figure 15: The PPM estimation algorithm
Figure 16: DMC starting configuration Figure 17: DMC cloning operation
Figure 18: Cloning a node that points to itself
Figure 19: Erroneous tutorial example of DMC cloning
Figure 20: Problem with maintaining frequency sum information Figure 21: Prediction tree structure containing subtree sums Figure 22: Backwards and forwards digital search trees Figure 23: Performance of dictionary and Markov techniques Figure 24: Shannon's model of cryptography
Figure 25: Encryption without data compression Figure 26: Encryption with data compression Figure 27: DHPC main program
Figure 28: DHPC prediction function Figure 29: DHPC update procedure .
Figure 30: DHPC tree structure after 42 instances
Figure 31: DHPC minimum and maximum growth rates against
0Figure 32: Use of the history by four kinds of adaptivity
Figure 33: Example adaptive weighting curve . . . . Figure 34: A snapshot of DAFC in execution . . . . Figure 35: Tracking a moving source through simple source space Figure 36: Some interesting source trajectories . . . .
5
8
9
14
24
35
38
39
44 47
49
49
51
53
71
73
74
75
75
79
81
89
95
102
103
104
111 110
113119
124
127
129
131
134
135
xii Adaptive Data Compression
Figure 37: Summary of growth parameters 149
Figure 38: Semantics of decaying 151
Figure 39: The LAZY estimation algorithm 154
Figure 40: Direct implementation of a history buffer 156
Figure 41: Cyclic array implementation of a history buffer 157
Figure 42: A branching tree structure . . . . . 158
Figure 43: Branching using an array . . . . 159
Figure 44: Branching using an explicit branch tree 159
Figure 45: Branching using a branch tree built into the main tree 160
Figure 46: The structure of the heap . . . 163
Figure 47: Effect of deeponly updating on the matching branch 166
Figure 48: Effect of deeponly updating on node zones 167
Figure 49: Shortcut pointers in a solid tree structure 169
Figure 50: Shortcut pointers in a non-solid tree structure 170
Figure 51: Incremental shortcut pointer optimization 171
Figure 52: Incarnation numbers detect invalid shortcut pointers 172
Figure 53: Mechanism for a windowed local tree . . . 176
Figure 54: Steps of an SAKDC Update . . . . . 178
Figure 55: Experiment 2: Prop. Rem. vs ,\ for paperJ (wide angle) 190
Figure 56: Experiment 2: Prop.Rem. vs ,\ for paper1 (close up) 190
Figure 57: Experiment 2: Prop. Rem. vs ,\ for progc . . . . . 191
Figure 58: Experiment 2: Prop.Rem. vs ,\ for obj1 . . . 192
Figure 59: Experiment 2: Prop.Rem. vs ,\ for paperJ with local.period=500 193
Figure 60: Experiment 2: Prop.Rem. vs ,\ for paperJ with maxdepth=1 193
Figure 61: Experiment 2: Prop.Rem. vs ,\ for obj1 with maxdepth=1 194
Figure 62: Experiment 3: Prop. Rem. vs e3tim.thre3hold.thre3h for paper1 197
Figure 63: Experiment 3: Prop.Rem. vs estim.threshold.thresh for objt 198
Figure 64: Experiment 5: Prop. Rem. vs maxdepth for files . . . . . 202
Figure 65: Experiment 6: Prop.Rem. vs maxnode3 for files (close up) 205
Figure 66: Experiment 6: Prop.Rem. vs maxnodes for files (wide angle) 206
Figure 67: Experiment 7: Prop.Rem. vs local.period for some files . . . 208
Figure 68: Experiment 8: Prop.Rem. vs decay. threshold. thresh (wide angle) 211
Figure 69: Experiment 8: Prop.Rem. vs decay.threshold.thresh (close up) 212
Figure 70: Experiment 9: Prop.Rem. vs grow.threshold.thresh(sum), paperJ 214
Figure 71: Experiment 9: Prop.Rem. vs grow.threshold.thresh(max), paperJ 215
Figure 72: Experiment 9: Prop. Rem. vs grow. threshold. thresh for obj1 216
Figure 73: Experiment 9: Prop. Rem. vs grow. threshold. thresh for
tran,~217
Figure 74: Experiment 10: Prop.Rem. vs grow.probext for paperJ 219
Figure 75: Experiment 10: Prop.Rem. vs grow.probext for objt 220
Figure 76: Experiment 10: Prop. Rem. vs grow.probext for trans 220
Figure 77: Experiment 11: Prop.Rem. vs move.thre3hold.thresh for paperJ 222
Figure 78: Experiment 11: Prop.Rem. vs move. threshold. thresh for objl 223
Figure 79: Experiment 11: Prop.Rem. vs move. threshold. thresh for objpap 224
Figure 80: Experiment 12: Prop.Rem. vs maxnodes for paperJ (3 regimes) 226
Figure 81: Experiment 12: Prop.Rem. vs maxnodes for objl (3 regimes) 227
Figure 82: Experiment 12: Prop.Rem. vs maxnodes for objpap (3 regimes) 228
Figure 83: Experiment 12: Prop.Rem. vs maxnode3 for book1 [1..100000] 229
Figure 84: Experiment 15; Prop.Rem. vs maxnodes for objt with phoenix 234
Figure 85: Experiment 15: Prop.Rem. vs maxnodes for oblpap with phoenix 235
Figure 86: Experiment 15: Prop.Rem. vs maxnodes. bookl [1..100000]( +ph) 235
List of Figures
XlllFigure 87: Combining models . . . 246
Figure 88: Some interesting source trajectories 247
Figure 89: MMDC main program 254
Figure 90: MMDC model management 255
Figure 91: MMDC code scraps 256
Figure 92: State diagram for the MMDC algorithm 256
Figure 93: Prop. Rem. vs time for MMDC on artificial data (multi)
261Figure 94: Prop.Rem. vs time for MMDC on concatenated files (concat)
266Figure 95: Prop.Rem. vs time for MMDC on real interleaved data (inter)
270Figure 96: Performance of MMDC and PPMC' on the multimodal data 273
Figure 97: The tree zone explanation of MMDC's performance 275
Figure 98: The modern paradigm modified for user interfaces 284
Figure 99: The modern paradigm of data compression . . 285
Figure 100: Inner and outer environments . . . . 288
Figure 101: Cost of typing and programming vs repetition 302
Figure 102: Set of all substrings in a message . . . . 320
Figure 103: Experiment 22: Percentage of lengths in raw input 321
Figure 104: Experiment 22: Percentage of lengths in filtered input 322
Figure 105: Experiment 22: Percentage of lengths in random letters 322
Figure 106: Experiment 23: Timing tokenizer algorithm . . . . . 327
List of Tables
Table 1: Examples of everyday data compression Table 2: Example dictionary . . . . . . . Table 3: Tree mappings and their names Table 4: Compression performance measures
Table 5: Proposed nomenclature for performance measures Table 6: Parameters used in the DHPC example
Table 7: Summary of SAKDC parameters
Table 8: The corpus of data files used to test SAKDC Table 9: Experiment 1: Benchmark parameters for DHPC Table 10: Experiment 1: Benchmark parameters for PPMA Table 11: Experiment 1: Benchmark parameters for PPMC'
Table 12: Experiment 1: Benchmarks for DHPC, PPMA and PPMC' Table 13: Experiment 2: Estimation base parameters
Table 14: Experiment 2: Symbol and instance densities Table 15: Experiment 3: Credibility base parameters Table 16: Experiment 4: Deep updating base parameters Table 17: Experiment 4: Deep updating results
Table 18: Experiment 5: Depth base parameters
Table 19: Experiment 5: Rate of node growth for different depths Table 20: Experiment 5: Memory run out times for different depths Table 21: Experiment 6: Memory base parameters . . . Table 22: Experiment 7: Windowed adaptivity base parameters Table 23: Experiment 8: Decay base parameters . . . . Table 24: Experiment 9: Threshold growth base parameters Table 25: Experiment 10: Probabilistic growth base parameters Table 26: Experiment 11: Node movement base parameters Table 27: Experiment 12: Growth regimes base parameters . . Table 28: Experiment 12: Four growth regimes . . . . Table 29: Experiment 13: Inheriting instances base parameters Table 30: Experiment 13: Addback results . . . . Table 31: Experiment 14: Heap management base parameters Table 32: Experiment 14: Heap management results . . . . Table 33: Experiment 15: Tree reconstruction base parameters Table 34: Experiment 16: Shortcut base parameters
Table 35: Experiment 16: Shortcuts results . Table 36: Experiment 16: Prediction statistics
. 4
23
91
100
101
113
147
184
186
186
186
187
189
194
196
199
200
201
202
203
204
207
210
213
218
221
225
225
230
230
231
232
233
236
237 238
XVI
Adaptive Data Compression
Table 37: Experiment 17: SAKDC(Optl) parameters . . . 239
Table 38: Experiment 17: Benchmark parameters for PPMC' . . . . 240
Table 39: Experiment 17: Benchmarks for SAKDC(Optl) and PPMC' 241
Table 40: Multimodal parameters used in the multimodal experiments 258
Table 41: SAKDC parameters used for the local model . . . . 259
Table 42: SAKDC parameters used for the ordinary models 259
Table 43: Performance of MMDC on the artificial source (multi) 264
Table 44: Mode detection performance for multi . . . 264
Table 45: Mode detection performance for concat . . . . 267
Table 46: Performance of MMDC on concatenated files (concat) 268
Table 47: Mode detection performance for mter . . . . 269
Table 48: Performance of MMDC on real interleaved data (inter) 271
Table 49: Performance of MMDC with increased memory 272
Table 50: The various forms of the word "adapt" 317
Table 51: Forms of the word "adapt" arranged in grammatical slots 318
Foreword
Following an exchange of correspondence, I met Ross in Adelaide in June 1988. I was approached by the University of Adelaide about being an external examiner for this dissertation and willingly agreed. Upon receiving a copy of this work, what struck me most was the scholarship with which Ross approaches and advances this relatively new field of adaptive data compression. This scholarship, coupled with the ability to express himself clearly using figures, tables, and incisive prose, demanded that Ross's dissertation be given a wider audience. And so this thesis was brought to the attention of Kluwer.
The modern data compression paradigm furthered by this work is based upon the separation of adaptive context modelling, adaptive statistics, and arithmetic coding. This work offers the most complete bibliography on this subject I am aware of. It provides an excellent and lucid review of the field, and should be equally as beneficial to newcomers as to those of us already in the field.
A paper "Modeling and Coding" by Rissanen and Langdon lays some theoretical groundwork to adaptivity, and Ross extends this work by further categorizing the notion of causal adaptation. In so doing, he provides insights into possible implementations. Ross's introduction also comments on terminology problems. For example, the term "compres- sion ratio" has no standard meaning between authors. Instead of arguing what compression ratio "really means", Ross proposes new terms that imply how the compression performance is calculated. For example, if value x is identified as the "proportion remaining", the reader immedi- ately knows that the number of bytes in the compressed file is x times the number of bytes in the uncompressed file.
The capability to adapt to the behaviour of a zero-order Markov
source opens many doors, since a variable-order Markov source is a col-
lection of independent zero-order sources. The arithmetic coding compo-
nent of the modern compression paradigm employs relative frequencies
xviii Adaptive Data Compression directly as opposed to older techniques that must first map the frequen- cies onto sets of integer-length code words possessing the prefix property.
Ross points out the historical interest of an algorithm called DAFC (Dou- ble Adaptive File Compression) that first adapts to Markov contexts and then adapts a probability distribution for each context. Ross's algorithm of Chapter 2, DHPC (Dynamic History Predictive Compression), does the kind of double adaptation that DAFC would have liked to have done.
DHPC, Ross's opening contribution and the basis for the three following chapters, is a valuable original contribution in its own right. A backwards tree of maximum depth m represents the more frequent variable-order contexts up to Markov order m. The algorithm description is generic, and can be used for experimentation. Designer-selected parameters de- termine if a node in the tree has enough activity to serve as a context, or enough activity to spawn deeper nodes.
The DHPC algorithm leads to the Swiss Army Knife Data Compres- sion (SAKDC) algorithm. SAKDC accommodates, through parameter selection, the DHPC algorithm as a special case. Ross defines differ- ent forms and mechanisms for adaptivity of variable-order models, and SAKDC provides an implementation environment for experimenting with them. The experiments performed provide a page of conclusions regard- ing what seems to work and what doesn't work in variable-order Markov models.
A very interesting aspect of the work is the identification and investigation of multimodal sources. A cylinder of a disk or a magnetic tape of data may contain Pascal programs, hex dumps, and text information. These three types of data each correspond to a source mode. For each source mode, the compression algorithm should employ a model appropriate for that mode. Multi-Modal algorithms switch between models as the source jumps from one mode to another. The MMDC (Multi-Modal Data Compression) algorithm is based upon the above approach, and an implementation satisfactorily switches between models. Ross observes that MMDC illustrates science building upon itself; MMDC unifies many concepts from the modern paradigm of data compression that were developed without multi-modal compression in mind.
Glen G. Langdon, Jr.
Computer Engineering Department
University of California, Santa Cruz
June 1990
Preface
This book is identical to my Ph.D. thesis (submitted to the University of Adelaide on 30 June 1989) except that it has been re-typeset and a few minor corrections have been made. These are listed in the presentation notes.
The thesis reviews the field of text data compression (Chapter 1) and then addresses the problem of compressing rapidly changing data streams (Chapter 2-7). To compress such streams, the compressor must continuously revise its model of the data (Chapter 2-4), and sometimes even discard its model in favour of a new one (Chapter 5).
Although the techniques involved in data compression are often complicated, the ideas are simple, and throughout the thesis I have endeavoured to keep them so. The reader should be able to tackle the thesis with no knowledge of data compression.
Ross N. Williams June 1990
Adelaide, Australia
Acknow ledgements
This monograph is a reproduction of a Ph.D. thesis that was submitted to the Department of Computer Science at the University of Adelaide on 30 June 1989 and I am grateful to all those who supported me during my candidature. In particular I am very grateful to my supervisor, Dr. William Beaumont for his guidance and support. I am also grateful to my temporary supervisor Dr. Tao Li for his help while Dr. Beaumont was on study leave in 1987.
The research that led to the thesis was supported in part by a Common- wealth of Australia Postgraduate Research Award.
Thanks is recorded to those whose informal communications were quoted
in the thesis, as all have given additional permission to be quoted in this
book.
Presentation Notes
References: All references are cited in the form [<jirstauthor> < year>].
Citations are set in bold face the first time they appear and ordinary face thereafter. All references cited in the text appear in the reference list and the index. Citations of references in the same year by the same author appear identical.
Special terms: New or important terminology has been set in bold face and appears in the index.
Tables and Figures: Unless otherwise stated, all tables and diagrams appearing in this thesis were created specifically for the thesis and are original to the thesis. All references to tables and figures in the text are set in bold face so that the text describing any particular table or figure can be found quickly.
Captions: Every table and figure in the thesis has a title which summarizes the diagram and appears in the table of contents. In addition, some tables and figures have a caption (a paragraph of descriptive text). Captions have been included mainly to aid the casual reader.
Intra-document references: References to entities within this thesis appear with the first letter in upper case (e.g. "Section 78.5"). References to entities in other documents appear with the first letter in lower case (e.g. "section 78.5").
Italics: Italics is used both for emphasis and for program and algorithm identifiers.
Notation: Section 1.2 describes much of the notation and terminology
of the thesis. Appendix E contains a summary of the mathematical
notation used in this thesis.
xxiv Adaptive Data Compression Typesetting: This book was typeset using the TEX[Knuth79][Kn- uth84] typesetting system and was reproduced photographically by Kluwer Academic Press.
Graphics: Most of the diagrams of this thesis were generated on a Macintosh
lcomputer using the program MacDraw.
2Histograms were produced on a Macintosh using the program Excel.
3The tree diagrams of Chapter 2 were plotted from PostScript
4code generated directly by a DHPC program. The graphs in Chapter 4 and Chapter 5 were generated by a plotting program developed by the author. The "range frame" style of these graphs was inspired by the graphic design book [Tufte83] (esp.
p.132).
Corrections
This book is identical to my Ph.D. thesis except that it has been re-typeset and corrected. The corrections are:
Front matter: The front matter was reworked to suit the form of a book, and the foreword and preface were added.
Chapter 1: A note was appended to footnote 18 (Section 1.3.1) and the word "bits" in the same footnote was set in boldface and entered into the index. A copyright notice was added to Figure 10.
Chapter 4: The title of Table 39 was corrected.
Conclusions: A closing parenthesis was inserted in the fifth primary contribution paragraph.
Appendix A: Reference to Jones in the second paragraph was made less ambiguous. Typographical errors in the second and last equations of the proof were corrected.
Appendix C: "two pages" was changed to "few pages".
Appendix E: The definition of M was altered.
Detected remaining errors: The word "wooloomooloo" used through- out the thesis as an example message is a misspelling of the famous suburb of Sydney, Australia. The correct spelling is "Woolloomooloo".
Enthusiastic readers way wish, as an exercise, to re-work all the examples using the correct spelling.
1
Macintosh is a trademark of Apple Computer, Inc.
2
MacDraw is a trademark of Claris Corporation.
3 Excel is a trademark of Microsoft Corporation.