Adaptive Data Compression

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE

COMMUNICATIONS AND INFORMATION THEORY

Consulting Editor:

Robert Gallager

Other books in the series:

Digital Communication. Edward A. Lee, David G. Messerschmitt ISBN: 0-89838-274-2

An Introduction to Cryptology. Henk C.A. van Tilborg ISBN: 0-89838-271-8

Finite Fields for Computer Scientists and Engineers. Robert J. McEliece ISBN: 0-89838-191-6

An Introduction to Error Correcting Codes With Applications. Scott A. Vanstone and Paul C. van Oorschot ISBN: 0-7923-9017-2

Source Coding Theory. Robert M. Gray ISBN: 0-7923-9048-2

Switching and Traffic Theory for Integrated Broadband Networks. Joseph Y Hui ISBN: 0-7923-9061-X

ADAPTIVE DATA COMPRESSION

by

Ross N. Williams

Australian National

foreword by

Glen G. Langdon, Jr.

University of California, Santa Cruz

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

Library of Congress Cataloging-in-Publication Data

Williams, Ross N. (Ross Neil), 1962-

Adaptive data compression / by Ross N. Williams; foreword by Glen G. Langdon, Jr.

p. cm. - (The Kluwer international series in engineering and computer science. Communications and information theory)

"A reproduction of a Ph.D. thesis that was submitted to the Dept. of Computer Science at the University of Adelaide on 30 June 1989"--Acknowl.

Includes bibliographical references (p. ) and index.

ISBN 978-1-4613-6810-6 ISBN 978-1-4615-4046-5 (eBook) DOI 10.1007/978-1-4615-4046-5

1. Data compression (Telecommunications) I. Title. II. Series.

TK5102.5.W539 1991

005.74'6-dc20 90-5059

CIP

Copyright © 1991 by Springer Science+Business Media Dordrecht

Originally published by Kluwer Academic Publishers in 1991

Softcover reprint of the hardcover 1st edition 1991

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Springer-Science+Business Media, B.V.

Printed on acid-free paper.

Contents

List of Figures
List of Tables
Foreword
Preface
Acknowledgements
Presentation Notes
Abstract

Chapter 1: Introductory Survey
  1.1 Introduction
    1.1.1 Compression as Representation
    1.1.2 Data Compression in Ancient Greece
    1.1.3 Founding Work by Shannon
  1.2 Common Notation and Data Structures
  1.3 The Problem of Data Compression
    1.3.1 Real vs Quantized Information
    1.3.2 Pure Blocking
    1.3.3 Impure Blocking
    1.3.4 A Classification of Algorithms
  1.4 Huffman Coding: Too Good Too Soon
    1.4.1 Shannon-Fano Coding and Huffman Coding
    1.4.2 Huffman as a Back-End Coder
    1.4.3 Multi-Group Huffman Coding
    1.4.4 Dynamic Huffman Coding
    1.4.5 Recent Results
  1.5 Thirty Years of Ad Hoc Data Compression
    1.5.1 Run Length Coding
      1.5.1.1 Binary Run Length Coding
      1.5.1.2 Bit Vector Compression
    1.5.2 Dictionary Techniques
      1.5.2.1 Parsing Strategies
      1.5.2.2 Static vs Semi-Adaptive Dictionaries
      1.5.2.3 Early Dictionary Techniques
      1.5.2.4 Later Dictionary Techniques
      1.5.2.5 Compressing the Dictionary
    1.5.3 Exploiting Specific Knowledge
    1.5.4 Data Base Compression Techniques
    1.5.5 The Practitioner's View
  1.6 Adaptive Data Compression
  1.7 Ziv and Lempel Algorithms
    1.7.1 Adaptive Dictionary Compression
    1.7.2 The Launch of LZ Compression
    1.7.3 LZ77
    1.7.4 LZ77 Variants
    1.7.5 LZ78
    1.7.6 LZ78 Variants
    1.7.7 Other Dynamic Dictionary Techniques
  1.8 The Modern Paradigm of Data Compression
    1.8.1 The Information Market Place
    1.8.2 Predictions: A Currency of Information
    1.8.3 The Modern Paradigm
    1.8.4 Modelling
    1.8.5 Coding
    1.8.6 Not So Modern A Paradigm
  1.9 Arithmetic Coding
    1.9.1 A Description of Arithmetic Coding
    1.9.2 The Development of Arithmetic Coding
    1.9.3 The Popularization of Arithmetic Coding
    1.9.4 Refinements of Arithmetic Coding
  1.10 Markov Models
    1.10.1 Markov Chains, Markov Sources and Markov Models
    1.10.2 Constructing Models from Histories
    1.10.3 Estimation Techniques
      1.10.3.1 Linear Estimation
      1.10.3.2 Non-Linear Estimation
      1.10.3.3 Linear and Non-Linear Moffat Estimation
    1.10.4 Summary of Fixed-Order Markov Methods
    1.10.5 Variable-Order Techniques
    1.10.6 An Overview of Markov Algorithms
      1.10.6.1 Markov Sources by Shannon
      1.10.6.2 DAFC
      1.10.6.3 FBA
      1.10.6.4 LOEMA
      1.10.6.5 UMC
      1.10.6.6 PPM
      1.10.6.7 DHPC
      1.10.6.8 DMC
      1.10.6.9 WORD
  1.11 Data Structures For Predictions
    1.11.1 Prediction Functionality
    1.11.2 Linear Representations
    1.11.3 Sparse Representations
    1.11.4 Tree Representations
    1.11.5 Heap Representations
    1.11.6 Stochastic Representations
    1.11.7 The Adaptive Q-Coder
    1.11.8 A Comparison of Representations
    1.11.9 Summary
  1.12 Data Structures for Markov Trees
    1.12.1 Hashing Representations
    1.12.2 Backwards and Forwards Trees
  1.13 Dictionary Methods vs Markov Methods
  1.14 Signal Compression
  1.15 Measures of Compression
  1.16 Error Correction, Data Compression and Cryptography
  1.17 Summary

Chapter 2: The DHPC Algorithm
  2.1 Introduction
  2.2 Tree Growth
  2.3 Estimation
  2.4 The Algorithm in Detail
    2.4.1 Main Program
    2.4.2 Prediction Phase
    2.4.3 Update Phase
  2.5 Example Execution of DHPC
  2.6 A Comparison of DHPC with Other Algorithms
  2.7 An Analysis of Tree Growth
  2.8 Summary

Chapter 3: A Classification of Adaptivity
  3.1 Introduction
  3.2 Previous Definitions of Adaptivity
  3.3 A Classification of Adaptivity
  3.4 Adaptivity of Previous Algorithms
  3.5 A Classification of Sources
  3.6 A Comparison of Kinds of Adaptivity
  3.7 Mechanisms for Adaptivity
    3.7.1 Context Adaptivity
    3.7.2 Structural Adaptivity
    3.7.3 Metrics on Tree Structures
    3.7.4 The Supply Problem
    3.7.5 The Demand Problem
    3.7.6 Connecting Supply and Demand
  3.8 Implementing Asymptotic Adaptivity
  3.9 Implementing Local Adaptivity
  3.10 Summary

Chapter 4: An Experimental Adaptive Algorithm
  4.1 Introduction
  4.2 Overview of the SAKDC Algorithm
  4.3 Parameters of the SAKDC Algorithm
    4.3.1 Tree Growth
    4.3.2 The Supply System
    4.3.3 Decaying
    4.3.4 Windowed Local Adaptivity
    4.3.5 Instance Management
    4.3.6 Improving Efficiency
    4.3.7 Estimation
  4.4 Representation of Predictions
  4.5 Representation of the History Buffer
  4.6 Representation of the Tree
  4.7 Maintaining LRU Information
    4.7.1 The Two-Colour LRU Maintenance Problem
    4.7.2 A Heap Implementation
    4.7.3 Implementing a Heap using Dynamic Data Structures
    4.7.4 Boundary Problems
  4.8 Deep Updating
  4.9 Saving Instances of Deleted Nodes
  4.10 Shortcut Pointers
  4.11 Deletion of Non-Leaf Nodes
  4.12 Credibility Thresholds
  4.13 Windowed Local Adaptivity
  4.14 A Sketch of How an Instance is Processed
  4.15 Controlling Implementation Complexity
  4.16 Optimizations by Other Researchers
  4.17 Experiments
    4.17.1 SAKDC Implementation
    4.17.2 Description of the Test Data
    4.17.3 Presentation of Experiments
    4.17.4 Experiment 1: Initial Benchmarks (DHPC, PPMA, PPMC')
    4.17.5 Experiment 2: Estimation (estim)
    4.17.6 Experiment 3: Credibility (estim.threshold)
    4.17.7 Experiment 4: Deep Updating (deeponly)
    4.17.8 Experiment 5: Depth of Tree (maxdepth)
    4.17.9 Experiment 6: Memory Size (maxnodes)
    4.17.10 Experiment 7: Windowed Local Adaptivity (local)
    4.17.11 Experiment 8: Decaying (decay)
    4.17.12 Experiment 9: Threshold Growth (grow)
    4.17.13 Experiment 10: Probabilistic Growth (grow.probext)
    4.17.14 Experiment 11: Node Movement (move)
    4.17.15 Experiment 12: Four Growth Regimes (grow,move)
    4.17.16 Experiment 13: Inheriting Instances (addback)
    4.17.17 Experiment 14: LRU Heap Management (lruparent)
    4.17.18 Experiment 15: Tree Reconstruction (phoenix)
    4.17.19 Experiment 16: Shortcut Pointers (shortcut.!)
    4.17.20 Experiment 17: Final Optimized Benchmark
    4.17.21 Discussion
  4.18 Summary

Chapter 5: A Multimodal Algorithm
  5.1 Introduction
  5.2 Combining Models
  5.3 Multimodal Sources
  5.4 Multimodal Models
  5.5 Multi-Modal Algorithm Design Issues
    5.5.1 Model Arbitration
    5.5.2 Model Creation
    5.5.3 Memory Allocation
  5.6 The MMDC Algorithm
    5.6.1 Overview
    5.6.2 Parameters
    5.6.3 A Formal Description
  5.7 Experiments
    5.7.1 Experiment 18: Artificial Data
    5.7.2 Experiment 19: Real Files Concatenated
    5.7.3 Experiment 20: Real Files Interleaved
    5.7.4 Experiment 21: Effect of Memory
    5.7.5 Discussion
  5.8 Other Issues
    5.8.1 MMDC is Really a Meta-Algorithm
    5.8.2 Efficiency
    5.8.3 The Use of Non-Adaptive Models
    5.8.4 Heterogeneous MMDC
    5.8.5 MMDC as Unifier
    5.8.6 A Note on Security
  5.9 Summary

Chapter 6: Applications to User Interfaces
  6.1 Introduction
  6.2 A New Application
  6.3 A Paradigm of User Prediction
  6.4 Examples
    6.4.1 Debugger Sessions
    6.4.2 Typist
    6.4.3 Indenting a Program
  6.5 Review of Interactive Environments
    6.5.1 Inner and Outer Environments
    6.5.2 General Design of Interactive Systems
    6.5.3 Programmer's Helpers
    6.5.4 Research into Command Languages
    6.5.5 Modelling the User at the Keystroke Level
    6.5.6 Modelling the User at the Command Level
    6.5.7 Summary of Interactive Environment Work
  6.6 Multi-Character Predictions
  6.7 The Prediction Interface
    6.7.1 Presenting Predictions
    6.7.2 Confirming Predictions
  6.8 Tuning Models
  6.9 Human Factors
  6.10 Work by Witten, Cleary and Darragh
  6.11 Summary

Chapter 7: Conclusions
  7.1 Primary Contributions
  7.2 Secondary Contributions
  7.3 About Communication
  7.4 Towards a More General Theory
  7.5 Thesis Perspective
  7.6 Summary

Appendix A: Estimation Formula Calculation
Appendix B: The Word "Adapt" And Its Forms
Appendix C: User Input Experiments
  C.1 Introduction
  C.2 Experiment 22: Redundancy of User Input
  C.3 Experiment 23: Timing Between Keystrokes
Appendix D: Further Research
  D.1 The SAKDC Algorithm
  D.2 The MMDC Algorithm
  D.3 User Interfaces
Appendix E: Summary of Notation
References
Index

List of Figures

Figure 1: Shannon's model of communication
Figure 2: A history buffer
Figure 3: Backwards and forwards digital search trees
Figure 4: Four kinds of blocking
Figure 5: Parsing problem mapped onto shortest path problem
Figure 6: The LZ77 algorithm
Figure 7: The LZ78 algorithm
Figure 8: The LZ78 algorithm implemented using a tree
Figure 9: The modern paradigm of data compression
Figure 10: Shannon's compression paradigm
Figure 11: The coder's job is to fill output buckets
Figure 12: Restricting ranges is the same as filling bits
Figure 13: Addition of a new bucket extends the range
Figure 14: Arithmetic coding using fixed precision registers
Figure 15: The PPM estimation algorithm
Figure 16: DMC starting configuration
Figure 17: DMC cloning operation
Figure 18: Cloning a node that points to itself
Figure 19: Erroneous tutorial example of DMC cloning
Figure 20: Problem with maintaining frequency sum information
Figure 21: Prediction tree structure containing subtree sums
Figure 22: Backwards and forwards digital search trees
Figure 23: Performance of dictionary and Markov techniques
Figure 24: Shannon's model of cryptography
Figure 25: Encryption without data compression
Figure 26: Encryption with data compression
Figure 27: DHPC main program
Figure 28: DHPC prediction function
Figure 29: DHPC update procedure
Figure 30: DHPC tree structure after 42 instances
Figure 31: DHPC minimum and maximum growth rates against 0
Figure 32: Use of the history by four kinds of adaptivity
Figure 33: Example adaptive weighting curve
Figure 34: A snapshot of DAFC in execution
Figure 35: Tracking a moving source through simple source space
Figure 36: Some interesting source trajectories
Figure 37: Summary of growth parameters
Figure 38: Semantics of decaying
Figure 39: The LAZY estimation algorithm
Figure 40: Direct implementation of a history buffer
Figure 41: Cyclic array implementation of a history buffer
Figure 42: A branching tree structure
Figure 43: Branching using an array
Figure 44: Branching using an explicit branch tree
Figure 45: Branching using a branch tree built into the main tree
Figure 46: The structure of the heap
Figure 47: Effect of deeponly updating on the matching branch
Figure 48: Effect of deeponly updating on node zones
Figure 49: Shortcut pointers in a solid tree structure
Figure 50: Shortcut pointers in a non-solid tree structure
Figure 51: Incremental shortcut pointer optimization
Figure 52: Incarnation numbers detect invalid shortcut pointers
Figure 53: Mechanism for a windowed local tree
Figure 54: Steps of an SAKDC Update
Figure 55: Experiment 2: Prop.Rem. vs λ for paper1 (wide angle)
Figure 56: Experiment 2: Prop.Rem. vs λ for paper1 (close up)
Figure 57: Experiment 2: Prop.Rem. vs λ for progc
Figure 58: Experiment 2: Prop.Rem. vs λ for obj1
Figure 59: Experiment 2: Prop.Rem. vs λ for paper1 with local.period=500
Figure 60: Experiment 2: Prop.Rem. vs λ for paper1 with maxdepth=1
Figure 61: Experiment 2: Prop.Rem. vs λ for obj1 with maxdepth=1
Figure 62: Experiment 3: Prop.Rem. vs estim.threshold.thresh for paper1
Figure 63: Experiment 3: Prop.Rem. vs estim.threshold.thresh for obj1
Figure 64: Experiment 5: Prop.Rem. vs maxdepth for files
Figure 65: Experiment 6: Prop.Rem. vs maxnodes for files (close up)
Figure 66: Experiment 6: Prop.Rem. vs maxnodes for files (wide angle)
Figure 67: Experiment 7: Prop.Rem. vs local.period for some files
Figure 68: Experiment 8: Prop.Rem. vs decay.threshold.thresh (wide angle)
Figure 69: Experiment 8: Prop.Rem. vs decay.threshold.thresh (close up)
Figure 70: Experiment 9: Prop.Rem. vs grow.threshold.thresh(sum), paper1
Figure 71: Experiment 9: Prop.Rem. vs grow.threshold.thresh(max), paper1
Figure 72: Experiment 9: Prop.Rem. vs grow.threshold.thresh for obj1
Figure 73: Experiment 9: Prop.Rem. vs grow.threshold.thresh for trans
Figure 74: Experiment 10: Prop.Rem. vs grow.probext for paper1
Figure 75: Experiment 10: Prop.Rem. vs grow.probext for obj1
Figure 76: Experiment 10: Prop.Rem. vs grow.probext for trans
Figure 77: Experiment 11: Prop.Rem. vs move.threshold.thresh for paper1
Figure 78: Experiment 11: Prop.Rem. vs move.threshold.thresh for obj1
Figure 79: Experiment 11: Prop.Rem. vs move.threshold.thresh for objpap
Figure 80: Experiment 12: Prop.Rem. vs maxnodes for paper1 (3 regimes)
Figure 81: Experiment 12: Prop.Rem. vs maxnodes for obj1 (3 regimes)
Figure 82: Experiment 12: Prop.Rem. vs maxnodes for objpap (3 regimes)
Figure 83: Experiment 12: Prop.Rem. vs maxnodes for book1 [1..100000]
Figure 84: Experiment 15: Prop.Rem. vs maxnodes for obj1 with phoenix
Figure 85: Experiment 15: Prop.Rem. vs maxnodes for objpap with phoenix
Figure 86: Experiment 15: Prop.Rem. vs maxnodes, book1 [1..100000] (+ph)
Figure 87: Combining models
Figure 88: Some interesting source trajectories
Figure 89: MMDC main program
Figure 90: MMDC model management
Figure 91: MMDC code scraps
Figure 92: State diagram for the MMDC algorithm
Figure 93: Prop.Rem. vs time for MMDC on artificial data (multi)
Figure 94: Prop.Rem. vs time for MMDC on concatenated files (concat)
Figure 95: Prop.Rem. vs time for MMDC on real interleaved data (inter)
Figure 96: Performance of MMDC and PPMC' on the multimodal data
Figure 97: The tree zone explanation of MMDC's performance
Figure 98: The modern paradigm modified for user interfaces
Figure 99: The modern paradigm of data compression
Figure 100: Inner and outer environments
Figure 101: Cost of typing and programming vs repetition
Figure 102: Set of all substrings in a message
Figure 103: Experiment 22: Percentage of lengths in raw input
Figure 104: Experiment 22: Percentage of lengths in filtered input
Figure 105: Experiment 22: Percentage of lengths in random letters
Figure 106: Experiment 23: Timing tokenizer algorithm

List of Tables

Table 1: Examples of everyday data compression
Table 2: Example dictionary
Table 3: Tree mappings and their names
Table 4: Compression performance measures
Table 5: Proposed nomenclature for performance measures
Table 6: Parameters used in the DHPC example
Table 7: Summary of SAKDC parameters
Table 8: The corpus of data files used to test SAKDC
Table 9: Experiment 1: Benchmark parameters for DHPC
Table 10: Experiment 1: Benchmark parameters for PPMA
Table 11: Experiment 1: Benchmark parameters for PPMC'
Table 12: Experiment 1: Benchmarks for DHPC, PPMA and PPMC'
Table 13: Experiment 2: Estimation base parameters
Table 14: Experiment 2: Symbol and instance densities
Table 15: Experiment 3: Credibility base parameters
Table 16: Experiment 4: Deep updating base parameters
Table 17: Experiment 4: Deep updating results
Table 18: Experiment 5: Depth base parameters
Table 19: Experiment 5: Rate of node growth for different depths
Table 20: Experiment 5: Memory run out times for different depths
Table 21: Experiment 6: Memory base parameters
Table 22: Experiment 7: Windowed adaptivity base parameters
Table 23: Experiment 8: Decay base parameters
Table 24: Experiment 9: Threshold growth base parameters
Table 25: Experiment 10: Probabilistic growth base parameters
Table 26: Experiment 11: Node movement base parameters
Table 27: Experiment 12: Growth regimes base parameters
Table 28: Experiment 12: Four growth regimes
Table 29: Experiment 13: Inheriting instances base parameters
Table 30: Experiment 13: Addback results
Table 31: Experiment 14: Heap management base parameters
Table 32: Experiment 14: Heap management results
Table 33: Experiment 15: Tree reconstruction base parameters
Table 34: Experiment 16: Shortcut base parameters
Table 35: Experiment 16: Shortcuts results
Table 36: Experiment 16: Prediction statistics
Table 37: Experiment 17: SAKDC(Opt1) parameters
Table 38: Experiment 17: Benchmark parameters for PPMC'
Table 39: Experiment 17: Benchmarks for SAKDC(Opt1) and PPMC'
Table 40: Multimodal parameters used in the multimodal experiments
Table 41: SAKDC parameters used for the local model
Table 42: SAKDC parameters used for the ordinary models
Table 43: Performance of MMDC on the artificial source (multi)
Table 44: Mode detection performance for multi
Table 45: Mode detection performance for concat
Table 46: Performance of MMDC on concatenated files (concat)
Table 47: Mode detection performance for inter
Table 48: Performance of MMDC on real interleaved data (inter)
Table 49: Performance of MMDC with increased memory
Table 50: The various forms of the word "adapt"
Table 51: Forms of the word "adapt" arranged in grammatical slots

Foreword

Following an exchange of correspondence, I met Ross in Adelaide in June 1988. I was approached by the University of Adelaide about being an external examiner for this dissertation and willingly agreed. Upon receiving a copy of this work, what struck me most was the scholarship with which Ross approaches and advances this relatively new field of adaptive data compression. This scholarship, coupled with the ability to express himself clearly using figures, tables, and incisive prose, demanded that Ross's dissertation be given a wider audience. And so this thesis was brought to the attention of Kluwer.

The modern data compression paradigm furthered by this work is based upon the separation of adaptive context modelling, adaptive statistics, and arithmetic coding. This work offers the most complete bibliography on this subject I am aware of. It provides an excellent and lucid review of the field, and should be equally as beneficial to newcomers as to those of us already in the field.

A paper "Modeling and Coding" by Rissanen and Langdon lays some theoretical groundwork to adaptivity, and Ross extends this work by further categorizing the notion of causal adaptation. In so doing, he provides insights into possible implementations. Ross's introduction also comments on terminology problems. For example, the term "compression ratio" has no standard meaning between authors. Instead of arguing what compression ratio "really means", Ross proposes new terms that imply how the compression performance is calculated. For example, if value x is identified as the "proportion remaining", the reader immediately knows that the number of bytes in the compressed file is x times the number of bytes in the uncompressed file.
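As a small illustration of the "proportion remaining" measure described above, the sketch below computes it from two byte counts. The function name and the example sizes are hypothetical, chosen only for this illustration.

```python
def proportion_remaining(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Return x such that compressed size = x * uncompressed size."""
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return compressed_bytes / uncompressed_bytes


# A hypothetical 1000-byte file that compresses to 250 bytes.
x = proportion_remaining(1000, 250)
print(x)  # 0.25
```

A smaller value means better compression, and a value above 1 means the "compressed" file is larger than the original.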

The capability to adapt to the behaviour of a zero-order Markov source opens many doors, since a variable-order Markov source is a collection of independent zero-order sources. The arithmetic coding component of the modern compression paradigm employs relative frequencies directly as opposed to older techniques that must first map the frequencies onto sets of integer-length code words possessing the prefix property.
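To make the contrast with integer-length code words concrete, the following minimal sketch sums the ideal cost, -log2(p), of each symbol under an adaptive zero-order model. This is roughly the cost that a coder driven directly by relative frequencies can approach. The initial count of one per symbol and the function name are assumptions made for this illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def adaptive_zero_order_bits(message, alphabet):
    """Total ideal code length in bits under an adaptive zero-order model."""
    # Assumption: every symbol of the alphabet starts with a count of 1.
    counts = Counter({symbol: 1 for symbol in alphabet})
    total_bits = 0.0
    for symbol in message:
        # Relative frequency of the symbol given everything seen so far.
        p = counts[symbol] / sum(counts.values())
        total_bits += -math.log2(p)
        # Adapt: update the statistics only after the symbol has been costed.
        counts[symbol] += 1
    return total_bits


bits = adaptive_zero_order_bits("aaaa", "ab")
```

For the message "aaaa" over a two-symbol alphabet the sketch charges log2(5), about 2.32 bits, whereas any prefix code must spend at least one whole bit per symbol, or 4 bits in total.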

Ross points out the historical interest of an algorithm called DAFC (Double Adaptive File Compression) that first adapts to Markov contexts and then adapts a probability distribution for each context. Ross's algorithm of Chapter 2, DHPC (Dynamic History Predictive Compression), does the kind of double adaptation that DAFC would have liked to have done.

DHPC, Ross's opening contribution and the basis for the three following chapters, is a valuable original contribution in its own right. A backwards tree of maximum depth m represents the more frequent variable-order contexts up to Markov order m. The algorithm description is generic, and can be used for experimentation. Designer-selected parameters determine if a node in the tree has enough activity to serve as a context, or enough activity to spawn deeper nodes.
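The idea of a variable-order model as a collection of zero-order sources, one per context, can be sketched as a table of adaptive frequency counts keyed by the most recent symbols. This is not the DHPC algorithm itself, which is specified in Chapter 2. The class name, the `min_count` activity threshold and the fallback to the empty context are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


class ContextModel:
    """Adaptive frequency tables keyed by recent context (illustrative only)."""

    def __init__(self, max_depth, min_count):
        self.max_depth = max_depth  # m: longest context length considered
        self.min_count = min_count  # hypothetical activity threshold
        self.tables = defaultdict(Counter)  # context -> symbol counts

    def _contexts(self, history):
        """Yield suffixes of the history from longest (up to m) down to ''."""
        longest = min(self.max_depth, len(history))
        for depth in range(longest, -1, -1):
            yield history[len(history) - depth:]

    def choose_context(self, history):
        """Pick the longest context that has been seen often enough."""
        for ctx in self._contexts(history):
            seen = sum(self.tables.get(ctx, Counter()).values())
            if ctx == "" or seen >= self.min_count:
                return ctx

    def predict(self, history):
        """Relative frequencies of the symbols seen in the chosen context."""
        counts = self.tables.get(self.choose_context(history), Counter())
        total = sum(counts.values())
        return {s: c / total for s, c in counts.items()} if total else {}

    def update(self, history, symbol):
        """Record the symbol in every context of depth 0..m."""
        for ctx in self._contexts(history):
            self.tables[ctx][symbol] += 1


model = ContextModel(max_depth=2, min_count=2)
message = "abababab"
for i, symbol in enumerate(message):
    model.update(message[:i], symbol)
```

After training on "abababab", the context "ab" has only ever been followed by "a", so `model.predict(message)` assigns "a" a relative frequency of 1.0, while the empty context still sees "a" and "b" equally often.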

The DHPC algorithm leads to the Swiss Army Knife Data Compression (SAKDC) algorithm. SAKDC accommodates, through parameter selection, the DHPC algorithm as a special case. Ross defines different forms and mechanisms for adaptivity of variable-order models, and SAKDC provides an implementation environment for experimenting with them. The experiments performed provide a page of conclusions regarding what seems to work and what doesn't work in variable-order Markov models.

A very interesting aspect of the work is the identification and investigation of multimodal sources. A cylinder of a disk or a magnetic tape of data may contain Pascal programs, hex dumps, and text information. These three types of data each correspond to a source mode. For each source mode, the compression algorithm should employ a model appropriate for that mode. Multi-Modal algorithms switch between models as the source jumps from one mode to another. The MMDC (Multi-Modal Data Compression) algorithm is based upon the above approach, and an implementation satisfactorily switches between models. Ross observes that MMDC illustrates science building upon itself; MMDC unifies many concepts from the modern paradigm of data compression that were developed without multi-modal compression in mind.

Glen G. Langdon, Jr.

Computer Engineering Department

University of California, Santa Cruz

June 1990

Preface

This book is identical to my Ph.D. thesis (submitted to the University of Adelaide on 30 June 1989) except that it has been re-typeset and a few minor corrections have been made. These are listed in the presentation notes.

The thesis reviews the field of text data compression (Chapter 1) and then addresses the problem of compressing rapidly changing data streams (Chapters 2-7). To compress such streams, the compressor must continuously revise its model of the data (Chapters 2-4), and sometimes even discard its model in favour of a new one (Chapter 5).

Although the techniques involved in data compression are often complicated, the ideas are simple, and throughout the thesis I have endeavoured to keep them so. The reader should be able to tackle the thesis with no knowledge of data compression.

Ross N. Williams June 1990

Adelaide, Australia

Acknowledgements

This monograph is a reproduction of a Ph.D. thesis that was submitted to the Department of Computer Science at the University of Adelaide on 30 June 1989 and I am grateful to all those who supported me during my candidature. In particular I am very grateful to my supervisor, Dr. William Beaumont for his guidance and support. I am also grateful to my temporary supervisor Dr. Tao Li for his help while Dr. Beaumont was on study leave in 1987.

The research that led to the thesis was supported in part by a Commonwealth of Australia Postgraduate Research Award.

Thanks is recorded to those whose informal communications were quoted in the thesis, as all have given additional permission to be quoted in this book.

Presentation Notes

References: All references are cited in the form [<firstauthor><year>]. Citations are set in bold face the first time they appear and ordinary face thereafter. All references cited in the text appear in the reference list and the index. Citations of references in the same year by the same author appear identical.

Special terms: New or important terminology has been set in bold face and appears in the index.

Tables and Figures: Unless otherwise stated, all tables and diagrams appearing in this thesis were created specifically for the thesis and are original to the thesis. All references to tables and figures in the text are set in bold face so that the text describing any particular table or figure can be found quickly.

Captions: Every table and figure in the thesis has a title which summarizes the diagram and appears in the table of contents. In addition, some tables and figures have a caption (a paragraph of descriptive text). Captions have been included mainly to aid the casual reader.

Intra-document references: References to entities within this thesis appear with the first letter in upper case (e.g. "Section 78.5"). References to entities in other documents appear with the first letter in lower case (e.g. "section 78.5").

Italics: Italics is used both for emphasis and for program and algorithm identifiers.

Notation: Section 1.2 describes much of the notation and terminology of the thesis. Appendix E contains a summary of the mathematical notation used in this thesis.

Typesetting: This book was typeset using the TeX [Knuth79][Knuth84] typesetting system and was reproduced photographically by Kluwer Academic Press.

Graphics: Most of the diagrams of this thesis were generated on a Macintosh¹ computer using the program MacDraw². Histograms were produced on a Macintosh using the program Excel³. The tree diagrams of Chapter 2 were plotted from PostScript⁴ code generated directly by a DHPC program. The graphs in Chapter 4 and Chapter 5 were generated by a plotting program developed by the author. The "range frame" style of these graphs was inspired by the graphic design book [Tufte83] (esp. p.132).

Corrections

This book is identical to my Ph.D. thesis except that it has been re-typeset and corrected. The corrections are:

Front matter: The front matter was reworked to suit the form of a book, and the foreword and preface were added.

Chapter 1: A note was appended to footnote 18 (Section 1.3.1) and the word "bits" in the same footnote was set in boldface and entered into the index. A copyright notice was added to Figure 10.

Chapter 4: The title of Table 39 was corrected.

Conclusions: A closing parenthesis was inserted in the fifth primary contribution paragraph.

Appendix A: Reference to Jones in the second paragraph was made less ambiguous. Typographical errors in the second and last equations of the proof were corrected.

Appendix C: "two pages" was changed to "few pages".

Appendix E: The definition of M was altered.

Detected remaining errors: The word "wooloomooloo" used throughout the thesis as an example message is a misspelling of the famous suburb of Sydney, Australia. The correct spelling is "Woolloomooloo". Enthusiastic readers may wish, as an exercise, to re-work all the examples using the correct spelling.

1 Macintosh is a trademark of Apple Computer, Inc.

2 MacDraw is a trademark of Claris Corporation.

3 Excel is a trademark of Microsoft Corporation.

4 PostScript is a trademark of Adobe Systems Incorporated.

Abstract

The class of Markov data compression algorithms provides the best known compression for ordinary symbol data. Of these, the PPM (Prediction by Partial Matching) algorithm by Cleary and Witten (1984) is the most successful. This thesis reviews the class of Markov algorithms and introduces algorithms similar to PPM which are used as a vehicle for examining adaptivity in data compression algorithms in general.

Despite the amount of research into adaptive data compression algorithms, the term "adaptive" is not well defined. The term can be clarified by viewing the problem of adapting as that of tracking a source through a source space using the data the source is generating as a trail. Adaptive algorithms achieve this by making assumptions about source trajectories. These assumptions can be used to classify data compression algorithms into four groups with respect to adaptivity: not adaptive, initially adaptive, asymptotically adaptive and locally adaptive. These groups correspond roughly to classes of source trajectory. A new class of source called multimodal sources is introduced, members of which jump among a finite number of distinct points in the source space.

After common source trajectories have been identified, modifications to Markov algorithms are described that enable them to track each kind of source. Some of the modifications involve complex algorithms which are discussed in detail. Experimental results show that these modifications can substantially improve compression performance.

Of particular interest are sources with a multimodal trajectory. Such sources can be compressed well using a combination of locally adaptive and asymptotically adaptive models. The result is a multimodal algorithm which is independent of any particular sub-model. Experimental results support this approach.