• No results found

CODESIGN FOR REAL-TIME VIDEO APPLICATIONS

N/A
N/A
Protected

Academic year: 2021

Share "CODESIGN FOR REAL-TIME VIDEO APPLICATIONS"

Copied!
11
0
0

Loading.... (view fulltext now)

Full text

(1)

CODESIGN FOR

REAL-TIME VIDEO APPLICATIONS

(2)

CODESIGN FOR

REAL-TIME VIDEO APPLICATIONS

by

JORG WILBERG

GMD-SET

SPRINGER-SCIENCE+BUSINESS MEDIA, B.v.

(3)

A C.I.P. Catalogue record for this book Is avallable from the Library of Congress

ISBN 978-1-4613-7786-3 ISBN 978-1-4615-6081-4 (eBook) DOI 10.1007/978-1-4615-6081-4

Printed an acid-free paper

AII Rights Reserved

© 1997 by Springer Science+Business Media Oordrecht Originally published by Kluwer Academic Publishers in 1997 Softcover reprint of the hardcover 1 st edition 1997

No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means. electronic or mechanical.

including photocopying. recording or by any information storage and retrieval system. without written permission from the copyright owner.

Printed an acid-free paper

AII Rights Reserved

© 1997 by Springer Science+Business Media Oordrecht Originally published by Kluwer Academic Publishers in 1997 Softcover reprint of the hardcover 1 st edition 1997

No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means. electronic or mechanical.

including photocopying. recording or by any information storage and retrieval system. without written permission from the copyright owner.

(4)

Contents

List of Figures vii

list of Tables xi

Acknowledgments xiii

l. INTRODUCTION 1

Motivation and Related Work 4

2. DESIGN PROJECT: VIDEO COMPRESSION SYSTEM 11

Rea 1-Time Condition 12

Video Compression Algorithms 16

Embedded Video Compression System 22

VlIW Processor Architecture 27

3. DESIGN METHODOLOGY 33

Challenges of System-level Codesign 33

Designer Centered Approach 36

Design Flow 37

4. QUANTITATIVE ANALYSIS 43

Requirement Analysis 48

Design Exploration 62

Data Transfer Analysis 84

5. DESIGN TOOLS 93

Analysis Tools 95

Cosynthesis Tools 98

6. HTMl-BASED CODESIGN FRAMEWORK 103

Organizing by Hypertext 107

v

(5)

vi CODESIGN FOR REAL-TIME VIDEO APPLICATIONS

7. RESULTS 117

Quantitative Analysis of JPEG, H.261, and MPEG 118

Cosynthesis Example: IDCT 151

Lessons Learned and Future Work 162

8. CONCLUSIONS 167

References 171

Index 189

(6)

List of Figures

1.1 Design domains that are combined in embedded system de- sign. The designer can control hardware, software, and algo-

rithms. 3

2.1 Basic transmission scenario for compressed data (ECC: Error

Correcting Code). 13

2.2 The principal structure of an H.261 or MPEG codec, where DCT (Discrete Cosine Transform), Q (Quantization), EC (En- tropy Coding), Q-l (inverse Quantization), mCT {Inverse

DCT)ED (Entropy Decoding). 18

2.3 Partitioning of tasks for the embedded video compression sys-

tem and the host system. 23

2.4 Structure of the embedded video compression system. 24 2.5 Different types of instruction level parallelism. IF (Instruction

Fetch), m (Instruction Decode), EX (EXecute), WB (Write

Back). 27

2.6 Structure of a VLIW processor and typical instruction word. 29 3.1 A designer centered approach. The tools are used to analyze

possible system variants. Next, the designer decides on the appropriate system structure which the tools translate into

the final system specification. 36

3.2 Design flow for the video compression system. 37 4.1 Bounding costs and achievable performance by a stepwise re-

duction of the design space. 44

vii

(7)

viii CODESIGN FOR REAL-TIME VIDEO APPLICATIONS

4.2 Realization of the mCT by simple matrix multiplication.

OCTSIZE is 8, only the first matrix multiplication of the two- dimensional transform is depicted. Inserting a temporary vari- able temp in the dctDis function leads to a reduction of the

store operations by a factor of eight. 47

4.3 Cache analysis data for the mpeg_play program and the ten-

nis. mpg data sequence. 60

4.4 Results of a disjoint exploration for a simple application. The application contains 3 additions which depend on a multipli- cation. Each operation takes one cycle to execute. 72

4.5 The exploration example of the PVRG-jpeg with just 12 dif-

ferent processors. 74

4.6 Unit views on the design space of the jpeg example. 76 4.7 Adder view when restricting the design space to 2 and 3 load

units. 78

4.8 Derivation of the summary view from the different unit views.

The complete unit view is projected into a 6-tuple of the sum- mary view. The mpeg_play program and the moglie.mpg data

sequence were explored. 80

4.9 A typical example of a register file exploration. The mpeg_play

program and the moglie. mpg sequence were explored. 83 4.10 Imaginary processor for the ideal forwarding situation. 88

5.1 Principal structure of the analysis tools. 96

5.2 Structure of the cosynthesis environment. 100 6.1 Principal HTML framework scenario. The design exploration

is used to exemplify the idea. 110

6.2 The client/server structure of the World-Wide Web. The client can start helper applications to display multimedia mes-

sages. 112

7.1 Distribution of macroblock types in different MPEG data se- quences. RedsNightmare.mpg and anim.mpg use the highest

number of B macroblocks. 122

7.2 Variation of the total operation count for different MPEG se- quences when decoded by the mpeg_play program. 123 7.3 Instruction mix of the different video compression programs

in Table 7.1. 125

7.4 The results of the disjoint exploration for the mpeg_play pro- gram. The average over 13 different image sequences is taken. 130

(8)

LIST OF FIGURES ix 7.5 The performance variation when using different image sequences

a.'l data input. The results of the mpeg_play program and the adder unit are shown. The minimum, average, and maximum cycle count is shown for an ensemble of 13 image sequences. 131 7.6 The disjoint exploration results of the j_rev_dct function. 133

7.7 Disjoint exploration results of the jpeg program. Only the exploration of the DecodeHuJJman function is shown. 134

7.8 Disjoint exploration results of the jpeg program. The cycle counts for the ChenIDct function are shown. 134

7.9 Selection of the processor structure in the combined explo- ration. The data for the mpeg_play program and the flower. mpg

data sequence are shown. 136

7.10 Register exploration of the complete mpeg_play program. The average over 13 different MPEG sequences is depicted. 140

7.11 Register exploration of the j_rev_dct function only. 140

7.12 Accumulated data transfers from the different operation types (PVRG-p64 program when processing the stennis.p64 sequence). 144

7.13 Accumulated data transfers to the different operation types (PVRG-p64 program when processing the stennis.p64 sequence). 145 7.14 The data transfers from the addition to the different operation

types (PVRG-p64 program when processing the stennis.p64

sequence). 146

7.15 The data transfers to the comparator from the different op- eration types (PVRG-p64 program when processing the sten-

nis.p64 sequence). 146

7.16 The bus exploration data ofthe mpeg_play program. The data for 13 different image sequences are shown. 149

7.17 The bus exploration data when using a small register file (1 write

port, 2 read ports, 32 words). 149

7.18 The gate count of different processors from Table 7.12. 157

7.19 Performance of the different meT functions (Table 7.9) on the five different processors (Table 7.12). The processing time is normalized w.r.t. the slowest implementation. 161

(9)

List of Tables

2.1 Commonly used image formats and number of macroblocks (16 x 16 pixel luminance and 8 x 8 pixel for the chrominances). 14 2.2 DCT based standard compression algorithms. 20 2.3 Different public domain video compression programs

(available from ftp: j jhavefun .stanford .edu jpub). 21 2.4 Alternative realizations of the compression tasks. 22 4.1 Most important functions and instruction mix of the mpeg_play

program when simulating with the moglie. mpg sequence as

data input. 49

4.2 Throughput reductions for the example processor design space. 69 7.1 Most demanding functions in the video compression programs

of Table 2.3. 120

7.2 Most demanding functions in the mpeg_play program. In- tra coded macroblocks(MBs) are denoted by I-MB, predictive coded (either forward or backward) MBs by P-MB, and bidi-

rectional coded MBs by B-:MB. 121

7.3 Most important functions of the JPEG decoder when differ- ent compression ratios are used. lena25.jpg uses the highest

compression and lena.jpg the lowest. 124

7.4 Instruction mix of the JPEG decoder when data sequences with different compression ratios are decoded. 126 7.5 The branch statistics of the different video compression pro-

grams when decoding the data sequences of Table 7.1. 127 7.6 The data transfer statistics of the PVRG-p64 program when

processing the stennis.p64 data sequence. 142 xi

(10)

xii CODESIGN FOR REAL-TIME VIDEO APPLICATIONS

7.7 The percentage of data transfer from different operation types to an addition. The results for different data sequences and

different programs are shown. 147

7.8 The data transfers for the Loop Filter function. (PVRG-p64 program when processing the stennis.p64 sequence.) 148 7.9 Instruction mix data for different versions of the IDCT. The

flower.mpg sequence was used as data input. 154 7.10 Synthesis results of different library components. 155 7.11 Synthesis results of arithmetic library components. 155 7.12 Synthesis results for different VLIW processors. The synthesis

was performed with the Synopsys Design Compiler using the LSIlOk library. Abbreviations: RF: Register File, w: words,

in: input ports, out: output ports. 157

7.13 Processing times per macroblock for executing the different IDCT functions on the five different processors. The flower. mpg

sequence was used as data input. 160

(11)

Acknowledgments

The book is a summary of a PhD thesis from the Brnndenburgische Technical University of Cottbus, GERMANY.

The thesis was prepared in the Sydis / CASTLE project at GMD-SET.

The work was sponsored by the BMBF (Bundesministerium fUr Bildung, Wis- senschaft, Forschung und Technologie) under research contract OlM2897 A Sydis.

I would like to thank my supervisors (in alphabetical order): Prof. Dr. R.

Camposano, Prof. Dr. W. Rosenstiel, and Prof. Dr. H.-T. Vierhaus for all the support and many helpful discussions.

This work would have been impossible without the support and friendship of the Sydis / CASTLE team (in alphabetical order): S. Gehlen, Dr. A. Gnedina, Dr. H. Guenther, H.-U. Kobialka, A. Kuth, Dr. M. Langevin, G. Pfeiffer, P.

PlOger, J. Schaaf, U. Steinhausen, Dr. P. Stravers, M. Thei6inger, Dr. H. Veit, U. Westerholz.

Thanks to all the people at G MD-SET for providing such a splendid working environment.

I would like to acknowledge the support from the MOVE group at the Delft Technical University, in particular, Dr. H. Corporaaland Dr. J. Hoogerbrugge.

GNU and other public domain software was used extensively for the program development and the preparation of the thesis. Thanks to all the people who contributed to these good programs. The text of the thesis was prepared with the GNU EMACS editor and the auctex package. The typesetting was done with the JnEX system. The figures were prepared with gnuplot and xfig·

Last but not least, I would like to thank my friends and family for all the encouragement and sympathy that helped me throughout the preparation of the thesis.

xiii

References

Related documents

This report serves as both reference and users manuals for WRAP features providing capabilities for simulation of flood control reservoir system operations, simulation of pulse flow

Content and data published for all (general use) is available to all registered users of the Intranet that comprises of Peace Corps employees – users, managers, system

Pressing the Function Button five times displays the output current to the grid.... Press the Function Button six times displays the current

Public awareness campaigns of nonnative fish impacts should target high school educated, canal bank anglers while mercury advisories should be directed at canal bank anglers,

Table 5.3 shows statistics for the number of distinct primary OSDs of all files in the workload that are larger than 4 MiB, because the number of OSDs per file is always 1 for

  Introduction & Summary  Laurie, Sam  All  Background   Chapter Introduction  Danny  Sam  Background   1.1 Rail Transportation Overview  Danny 

Using information learned from rending about Ancient Greece, have students write about the following prompt:.. "Think about the jobs Ancient Greeks had and how much time