CODESIGN FOR
REAL-TIME VIDEO APPLICATIONS
CODESIGN FOR
REAL-TIME VIDEO APPLICATIONS
by
JORG WILBERG
GMD-SET
SPRINGER-SCIENCE+BUSINESS MEDIA, B.v.
A C.I.P. Catalogue record for this book Is avallable from the Library of Congress
ISBN 978-1-4613-7786-3 ISBN 978-1-4615-6081-4 (eBook) DOI 10.1007/978-1-4615-6081-4
Printed an acid-free paper
AII Rights Reserved
© 1997 by Springer Science+Business Media Oordrecht Originally published by Kluwer Academic Publishers in 1997 Softcover reprint of the hardcover 1 st edition 1997
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means. electronic or mechanical.
including photocopying. recording or by any information storage and retrieval system. without written permission from the copyright owner.
Printed an acid-free paper
AII Rights Reserved
© 1997 by Springer Science+Business Media Oordrecht Originally published by Kluwer Academic Publishers in 1997 Softcover reprint of the hardcover 1 st edition 1997
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means. electronic or mechanical.
including photocopying. recording or by any information storage and retrieval system. without written permission from the copyright owner.
Contents
List of Figures vii
list of Tables xi
Acknowledgments xiii
l. INTRODUCTION 1
Motivation and Related Work 4
2. DESIGN PROJECT: VIDEO COMPRESSION SYSTEM 11
Rea 1-Time Condition 12
Video Compression Algorithms 16
Embedded Video Compression System 22
VlIW Processor Architecture 27
3. DESIGN METHODOLOGY 33
Challenges of System-level Codesign 33
Designer Centered Approach 36
Design Flow 37
4. QUANTITATIVE ANALYSIS 43
Requirement Analysis 48
Design Exploration 62
Data Transfer Analysis 84
5. DESIGN TOOLS 93
Analysis Tools 95
Cosynthesis Tools 98
6. HTMl-BASED CODESIGN FRAMEWORK 103
Organizing by Hypertext 107
v
vi CODESIGN FOR REAL-TIME VIDEO APPLICATIONS
7. RESULTS 117
Quantitative Analysis of JPEG, H.261, and MPEG 118
Cosynthesis Example: IDCT 151
Lessons Learned and Future Work 162
8. CONCLUSIONS 167
References 171
Index 189
List of Figures
1.1 Design domains that are combined in embedded system de- sign. The designer can control hardware, software, and algo-
rithms. 3
2.1 Basic transmission scenario for compressed data (ECC: Error
Correcting Code). 13
2.2 The principal structure of an H.261 or MPEG codec, where DCT (Discrete Cosine Transform), Q (Quantization), EC (En- tropy Coding), Q-l (inverse Quantization), mCT {Inverse
DCT)ED (Entropy Decoding). 18
2.3 Partitioning of tasks for the embedded video compression sys-
tem and the host system. 23
2.4 Structure of the embedded video compression system. 24 2.5 Different types of instruction level parallelism. IF (Instruction
Fetch), m (Instruction Decode), EX (EXecute), WB (Write
Back). 27
2.6 Structure of a VLIW processor and typical instruction word. 29 3.1 A designer centered approach. The tools are used to analyze
possible system variants. Next, the designer decides on the appropriate system structure which the tools translate into
the final system specification. 36
3.2 Design flow for the video compression system. 37 4.1 Bounding costs and achievable performance by a stepwise re-
duction of the design space. 44
vii
viii CODESIGN FOR REAL-TIME VIDEO APPLICATIONS
4.2 Realization of the mCT by simple matrix multiplication.
OCTSIZE is 8, only the first matrix multiplication of the two- dimensional transform is depicted. Inserting a temporary vari- able temp in the dctDis function leads to a reduction of the
store operations by a factor of eight. 47
4.3 Cache analysis data for the mpeg_play program and the ten-
nis. mpg data sequence. 60
4.4 Results of a disjoint exploration for a simple application. The application contains 3 additions which depend on a multipli- cation. Each operation takes one cycle to execute. 72
4.5 The exploration example of the PVRG-jpeg with just 12 dif-
ferent processors. 74
4.6 Unit views on the design space of the jpeg example. 76 4.7 Adder view when restricting the design space to 2 and 3 load
units. 78
4.8 Derivation of the summary view from the different unit views.
The complete unit view is projected into a 6-tuple of the sum- mary view. The mpeg_play program and the moglie.mpg data
sequence were explored. 80
4.9 A typical example of a register file exploration. The mpeg_play
program and the moglie. mpg sequence were explored. 83 4.10 Imaginary processor for the ideal forwarding situation. 88
5.1 Principal structure of the analysis tools. 96
5.2 Structure of the cosynthesis environment. 100 6.1 Principal HTML framework scenario. The design exploration
is used to exemplify the idea. 110
6.2 The client/server structure of the World-Wide Web. The client can start helper applications to display multimedia mes-
sages. 112
7.1 Distribution of macroblock types in different MPEG data se- quences. RedsNightmare.mpg and anim.mpg use the highest
number of B macroblocks. 122
7.2 Variation of the total operation count for different MPEG se- quences when decoded by the mpeg_play program. 123 7.3 Instruction mix of the different video compression programs
in Table 7.1. 125
7.4 The results of the disjoint exploration for the mpeg_play pro- gram. The average over 13 different image sequences is taken. 130
LIST OF FIGURES ix 7.5 The performance variation when using different image sequences
a.'l data input. The results of the mpeg_play program and the adder unit are shown. The minimum, average, and maximum cycle count is shown for an ensemble of 13 image sequences. 131 7.6 The disjoint exploration results of the j_rev_dct function. 133
7.7 Disjoint exploration results of the jpeg program. Only the exploration of the DecodeHuJJman function is shown. 134
7.8 Disjoint exploration results of the jpeg program. The cycle counts for the ChenIDct function are shown. 134
7.9 Selection of the processor structure in the combined explo- ration. The data for the mpeg_play program and the flower. mpg
data sequence are shown. 136
7.10 Register exploration of the complete mpeg_play program. The average over 13 different MPEG sequences is depicted. 140
7.11 Register exploration of the j_rev_dct function only. 140
7.12 Accumulated data transfers from the different operation types (PVRG-p64 program when processing the stennis.p64 sequence). 144
7.13 Accumulated data transfers to the different operation types (PVRG-p64 program when processing the stennis.p64 sequence). 145 7.14 The data transfers from the addition to the different operation
types (PVRG-p64 program when processing the stennis.p64
sequence). 146
7.15 The data transfers to the comparator from the different op- eration types (PVRG-p64 program when processing the sten-
nis.p64 sequence). 146
7.16 The bus exploration data ofthe mpeg_play program. The data for 13 different image sequences are shown. 149
7.17 The bus exploration data when using a small register file (1 write
port, 2 read ports, 32 words). 149
7.18 The gate count of different processors from Table 7.12. 157
7.19 Performance of the different meT functions (Table 7.9) on the five different processors (Table 7.12). The processing time is normalized w.r.t. the slowest implementation. 161
List of Tables
2.1 Commonly used image formats and number of macroblocks (16 x 16 pixel luminance and 8 x 8 pixel for the chrominances). 14 2.2 DCT based standard compression algorithms. 20 2.3 Different public domain video compression programs
(available from ftp: j jhavefun .stanford .edu jpub). 21 2.4 Alternative realizations of the compression tasks. 22 4.1 Most important functions and instruction mix of the mpeg_play
program when simulating with the moglie. mpg sequence as
data input. 49
4.2 Throughput reductions for the example processor design space. 69 7.1 Most demanding functions in the video compression programs
of Table 2.3. 120
7.2 Most demanding functions in the mpeg_play program. In- tra coded macroblocks(MBs) are denoted by I-MB, predictive coded (either forward or backward) MBs by P-MB, and bidi-
rectional coded MBs by B-:MB. 121
7.3 Most important functions of the JPEG decoder when differ- ent compression ratios are used. lena25.jpg uses the highest
compression and lena.jpg the lowest. 124
7.4 Instruction mix of the JPEG decoder when data sequences with different compression ratios are decoded. 126 7.5 The branch statistics of the different video compression pro-
grams when decoding the data sequences of Table 7.1. 127 7.6 The data transfer statistics of the PVRG-p64 program when
processing the stennis.p64 data sequence. 142 xi
xii CODESIGN FOR REAL-TIME VIDEO APPLICATIONS
7.7 The percentage of data transfer from different operation types to an addition. The results for different data sequences and
different programs are shown. 147
7.8 The data transfers for the Loop Filter function. (PVRG-p64 program when processing the stennis.p64 sequence.) 148 7.9 Instruction mix data for different versions of the IDCT. The
flower.mpg sequence was used as data input. 154 7.10 Synthesis results of different library components. 155 7.11 Synthesis results of arithmetic library components. 155 7.12 Synthesis results for different VLIW processors. The synthesis
was performed with the Synopsys Design Compiler using the LSIlOk library. Abbreviations: RF: Register File, w: words,
in: input ports, out: output ports. 157
7.13 Processing times per macroblock for executing the different IDCT functions on the five different processors. The flower. mpg
sequence was used as data input. 160
Acknowledgments
The book is a summary of a PhD thesis from the Brnndenburgische Technical University of Cottbus, GERMANY.
The thesis was prepared in the Sydis / CASTLE project at GMD-SET.
The work was sponsored by the BMBF (Bundesministerium fUr Bildung, Wis- senschaft, Forschung und Technologie) under research contract OlM2897 A Sydis.
I would like to thank my supervisors (in alphabetical order): Prof. Dr. R.
Camposano, Prof. Dr. W. Rosenstiel, and Prof. Dr. H.-T. Vierhaus for all the support and many helpful discussions.
This work would have been impossible without the support and friendship of the Sydis / CASTLE team (in alphabetical order): S. Gehlen, Dr. A. Gnedina, Dr. H. Guenther, H.-U. Kobialka, A. Kuth, Dr. M. Langevin, G. Pfeiffer, P.
PlOger, J. Schaaf, U. Steinhausen, Dr. P. Stravers, M. Thei6inger, Dr. H. Veit, U. Westerholz.
Thanks to all the people at G MD-SET for providing such a splendid working environment.
I would like to acknowledge the support from the MOVE group at the Delft Technical University, in particular, Dr. H. Corporaaland Dr. J. Hoogerbrugge.
GNU and other public domain software was used extensively for the program development and the preparation of the thesis. Thanks to all the people who contributed to these good programs. The text of the thesis was prepared with the GNU EMACS editor and the auctex package. The typesetting was done with the JnEX system. The figures were prepared with gnuplot and xfig·
Last but not least, I would like to thank my friends and family for all the encouragement and sympathy that helped me throughout the preparation of the thesis.
xiii