INDEX DATA STRUCTURES IN
OBJECT-ORIENTED DATABASES
The Kluwer International Series on ADVANCES IN DATABASE SYSTEMS
Series Editor
Ahmed K. Elmagarmid
Purdue University, West Lafayette, IN 47907
Other books in the Series:
DATABASE CONCURRENCY CONTROL: Methods, Performance, and Analysis by Alexander Thomasian
ISBN: 0-7923-9741-X
TIME-CONSTRAINED TRANSACTION MANAGEMENT: Real-Time Constraints in Database Transaction Systems by Nandit R. Soparkar, Henry F. Korth, Abraham Silberschatz
ISBN: 0-7923-9752-5
SEARCHING MULTIMEDIA DATABASES BY CONTENT by Christos Faloutsos
ISBN: 0-7923-9777-0
REPLICATION TECHNIQUES IN DISTRIBUTED SYSTEMS by Abdelsalam A. Helal, Abdelsalam A. Heddaya, Bharat B. Bhargava
ISBN: 0-7923-9800-9
VIDEO DATABASE SYSTEMS: Issues, Products, and Applications by Ahmed K. Elmagarmid, Haitao Jiang, Abdelsalam A. Helal, Anupam Joshi, Magdy Ahmed
ISBN: 0-7923-9872-6
DATABASE ISSUES IN GEOGRAPHIC INFORMATION SYSTEMS by Nabil R. Adam and Aryya Gangopadhyay
ISBN: 0-7923-9924-2
The Kluwer International Series on Advances in Database Systems addresses the following goals:
• To publish thorough and cohesive overviews of advanced topics in database systems.
• To publish works which are larger in scope than survey articles, and which will contain more detailed background information.
• To provide a single point coverage of advanced and timely topics.
• To provide a forum for a topic of study by many researchers that may not yet have reached a stage of maturity to warrant a comprehensive textbook.
INDEX DATA STRUCTURES IN OBJECT-ORIENTED DATABASES
by
Thomas A. MUECK Martin L. POLASCHEK
Universität Wien, Vienna, Austria
SPRINGER SCIENCE+BUSINESS MEDIA, LLC
ISBN 978-1-4613-7849-5 ISBN 978-1-4615-6213-9 (eBook) DOI 10.1007/978-1-4615-6213-9
Library of Congress Cataloging-in-Publication Data A C.I.P. Catalogue record for this book is available from the Library of Congress.
Copyright © 1997 by Springer Science+Business Media New York
Originally published by Kluwer Academic Publishers, New York in 1997
Softcover reprint of the hardcover 1st edition 1997
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Springer Science+Business Media, LLC.
Printed on acid-free paper.
CONTENTS
Preface vii
1 INTRODUCTION 1
1.1 Object-oriented databases and indexing 2
1.2 Application aspects 5
2 DATABASE MODEL 7
2.1 Object Model 9
2.2 Query language issues 23
2.3 Bibliography 28
3 DATA STRUCTURES AND INDEXING 29
3.1 Basics 29
3.2 A systematic approach 52
3.3 One-dimensional search data structures 61
3.4 Multi-dimensional Search Data Structures 71
3.5 Bibliography 83
4 TYPE HIERARCHY INDEXING 85
4.1 Problem description 85
4.2 Type grouping 89
4.3 Key grouping 96
4.4 Multikey type index 106
4.5 Bibliography 120
5 AGGREGATION PATH INDEXING 123
5.1 Problem description 123
5.2 Path decomposition schemes 129
5.3 Bibliography 139
6 COLLECTION OPERATIONS 141
6.1 Problem description 141
6.2 Signature files for indexing multi-valued properties 146
6.3 Bibliography 150
7 PERFORMANCE ANALYSIS - AN EXAMPLE 151
7.1 Storage space requirements 152
7.2 Query performance 159
REFERENCES 165
INDEX 175
PREFACE
Object-oriented database management systems (OODBMS) are used to implement and maintain large object databases on persistent storage. Regardless of whether the underlying database model follows the object-oriented, the relational or the object-relational paradigm, a key feature of any DBMS product is content-based access to data sets. On the one hand, this feature provides user-friendly query interfaces based on predicates to describe the desired data. On the other hand, it poses challenging questions regarding DBMS design and implementation as well as the application development process on top of the DBMS.
The reason for the latter is that the actual query performance depends on a technically meaningful use of access support mechanisms. In particular, if chosen and applied properly, such a mechanism speeds up the execution of predicate-based queries. In the object-oriented world, such queries may involve arbitrarily complex terms referring to inheritance hierarchies and aggregation paths. These features are attractive at the application level; however, they increase the complexity of appropriate access support mechanisms, which are known to be technically non-trivial even in the relational world.
In the field of databases and database management systems, such an access support mechanism for improved query performance relies on one or more underlying search data structures and is usually called an index. Informally, the central idea behind this kind of data structure application is to find the identifiers of all objects fulfilling a given query predicate without reading the objects from disk. The practical benefit of indexing large persistent object sets is therefore a significant reduction in the number of disk I/O operations, thus yielding a performance gain.
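The central idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the book: the names `ObjectStore`, `SortedIndex` and `scan_query` are hypothetical, a Python dictionary stands in for persistent storage, and a sorted list searched by bisection stands in for a real search data structure. The counter `reads` simulates disk I/O operations.

```python
from bisect import bisect_left, bisect_right


class ObjectStore:
    """Hypothetical stand-in for persistent storage; counts simulated disk reads."""

    def __init__(self, objects):
        self._objects = dict(objects)  # object identifier -> attribute dictionary
        self.reads = 0

    def read(self, oid):
        self.reads += 1  # every object access costs one simulated disk read
        return self._objects[oid]

    def oids(self):
        return list(self._objects)


class SortedIndex:
    """Sorted (key, oid) pairs over one attribute, built once in advance."""

    def __init__(self, store, attribute):
        self._entries = sorted(
            (store.read(oid)[attribute], oid) for oid in store.oids()
        )
        self._keys = [key for key, _ in self._entries]

    def range_query(self, low, high):
        # Bisection finds the matching slice without touching the object store.
        start = bisect_left(self._keys, low)
        stop = bisect_right(self._keys, high)
        return [oid for _, oid in self._entries[start:stop]]


def scan_query(store, attribute, low, high):
    """Unindexed evaluation: every object is read to test the predicate."""
    return [
        oid for oid in store.oids()
        if low <= store.read(oid)[attribute] <= high
    ]


if __name__ == "__main__":
    store = ObjectStore(
        {1: {"age": 31}, 2: {"age": 45}, 3: {"age": 27}, 4: {"age": 45}}
    )
    index = SortedIndex(store, "age")

    store.reads = 0  # ignore the one-time cost of building the index
    print(index.range_query(30, 45), store.reads)

    store.reads = 0
    print(sorted(scan_query(store, "age", 30, 45)), store.reads)
```

In this sketch both evaluations return the same object identifiers, but only the scan reads objects from the store; the index answers the predicate from its own entries.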
The purpose of this book is to provide technical information about current and future issues of search data structures used to index large object-oriented databases. The intended audience of this book includes all kinds of practitioners involved in OODBMS product selection, application-dependent database performance tuning and application development on top of object databases, as well as researchers and students interested in the technical issues of object-oriented databases. The only prerequisite for understanding the material presented in this book is a working knowledge of object-oriented modeling and programming concepts and a minimum knowledge of algebraic concepts such as sets.
After the introduction, two preparatory chapters follow: the first presents the underlying database model as outlined in the ODMG-93 proposal [Cat96], and the second elaborates on the technical issues of search data structures and their use for indexing large data sets. The three subsequent chapters deal with major indexing topics in object-oriented databases, in particular, type hierarchy indexing, aggregation path indexing, and speedup of collection operations. The presentation is concluded with a performance analysis example in the field of type hierarchy indexing.
A related issue not covered in this book is physical object clustering or, in other words, the mapping of object identifiers to physical storage addresses. Decoupling support for content-based access from physical object management and, therefore, the indexing component from an OODBMS's persistent object store provides a high degree of flexibility for both application programmers and system developers. Therefore, the issues in the context of object clustering form a separate research domain beyond the scope of this book.
Details about the indexing components of particular OODBMS products have been omitted from this book for two reasons. First, it is hardly possible to get detailed technical information on the indexing components from vendors, and second, even if this kind of information could be obtained, it is quickly dated. So it seems more appropriate to describe the technical issues and solutions in this field and, in this way, help the reader decide about products in the presence of timely and hopefully detailed information.
Acknowledgments
We thank our colleagues at the Abteilung für Data Engineering, Universität Wien, for hours of fruitful discussions and in particular our former roommate Erich Schikuta for introducing us to the versatile field of search data structures in the early days.
Also, we would like to thank Professor Ahmed K. Elmagarmid for supporting this book project.
This book would not exist without the continuing encouragement by the people at Kluwer Academic Publishers, especially by Scott Delman and his staff.
Special thanks for being patient.