The UB-Tree
4.2 UB-Tree Implementations
Several implementations of the UB-Tree have been made during the MISTRAL-project.
Their differences are presented in this section. As most measurements have only been conducted with one specific implementation, it is important to be aware of effects caused by it.
attribute name data type description
address bit-string upper bound of the Z-region
count integer number of tuples stored on the UB-page region varchar content of a Z-region
Table 4.1: UBAPI: DBMS-relation storing a UB-Tree
4.2.1 UBAPI
The UBAPI (UB-Tree application interface) is the first implementation of the UB-Tree.
It is a middle-ware between the application and a DBMS implemented as an ANSI C application-library. It re-implements the basic data types, i.e., attribute, tuple, relation, and page. A page of the UBAPI is represented by a tuple (Table 4.1) indexed by a B+ -Tree. Thus the DBMS is used only as a storage manager supporting B+-Tree-lookups. Own meta-data tables store the structural information about UB-Trees. The relation storing a UB-Tree is shown in Table 4.1 with a clustering B+-Tree index on the Z-address. The data-types of the attributes differ slightly depending on the types available by the used DBMS.
The UBAPI was easily created on top of an DBMS allowing a proof of concept, but it has several drawbacks:
standard techniques of the DBMS are not reused, i.e., data structures and functions for attribute, tuple, relations, page, etc. are re-implemented.
it has no standard interface and it is only partially complete, i.e., some functions are not fully implemented; one has to use special tools for creation, loading, and querying.
there is no query language and no optimizer.
the interprocess communication between client and DBMS is increased. As query processing is done within the UBAPI-library on the client side, not only the result tuples are transferred to the client but all candidates, i.e., whole pages.
Nils Frielinghaus and Volker Markl have created the initial version of the library on top of Transbase. Various students have ported it to other DBMSs and extended its functionality. The following RDBMS are supported: Transbase, DB2, Informix, Oracle, and Microsoft SQL-Server.
Its superiority for multidimensional problems in comparison to the standard indexes, i.e., secondary indexes, compound indexes, bitmap-indexes, provided by these systems has been shown in various technical reports and master theses [Fri97, Pie98, Bau98b, Mer99]
and publications [Bau98a, FMB99, MZB99, Mar99, MRB99].
4.2.2 Transbase ® Hypercube
In an EU-funded project the UB-Tree technology, i.e., the basic algorithms for Z-address calculation, split address calculation, and range query handling, have been integrated into the kernel of the RDBMS Transbase [TAS00, RMF+00]. Furthermore, extensions of the data definition language (DDL) and the optimizer have been implemented in order to enable the UB-Tree for generic query-processing.
The new version of Transbase featuring the UB-Tree is called “Transbase Hypercube”
and is available as a commercial product. It is already successfully running at several companies.
In comparison to the UBAPI, using a UB-Tree as index now requires only modifi-cations of the DDL statement creating the index, no other application code is affected.
Thus the UB-Tree can be used transparently like any other index provided by the DBMS.
Furthermore, the implementation requires, fewer LOCs (lines of code), i.e., only the core components making up a UB-Tree have have been implemented. Also the performance improves due to reduced interprocess communication, i.e., only result tuples are transfered to the application.
However, the code base of Transbase is rather complex as it is a commercial product addressing all aspects of a relational DBMS and it is not freely available for research due to commercial interests.
4.2.3 RFDBMS
Further research on new indexing methods, e.g., variants of the UB-Tree, and optimizations were not possible with the previous implementations due to the following reasons:
UBAPI: the UBAPI was not offering access to the implementation of methods of the underlying DBMS, i.e., the storage manager, optimizer, and the B+-Tree.
Transbase Hypercube: Transbase Code is protected due to commercial interests and is not available for research.
Before considering an implementation from scratch we inspected the following existing projects as an alternative starting point for the new UB-Tree implementation: Commer-cial DBMSs: Function-Based B-Trees, interfaces to extend indexes and free DBMSs and code: GiST, Shore, Predator, PostgreSQL, java.XXL. However, all of them have se-vere drawbacks and thus we decided to start from scratch and built a new implementation satisfying our requirements while remaining small and easy to maintain. In the following we discuss the drawbacks of the existing code bases and why they were no option for us.
Function-Based B-Trees are using a function F to compute the key for tuples.
Commercial implementations are: Oracle8i [Ora99], IBM DB2 [CCF+99], and MS SQL Server 2000 (computed columns [Mic00]). However, they do not allow for integrating of new split and query algorithms (e.g., UB-Tree range query). Therefore, implementing the UB-Tree with F = Z-value will not lead to the expected performance.
Interfaces to Extent Indexes: Some commercial DBMSs provide enhanced indexing interfaces allowing for arbitrary index structures in external modules: Extensible Indexing API [Ora99] and Datablade API [Inf99]. The user has to provide a set of functions that are used by the database server to access the index either stored inside the database (i.e., mapped to a relation in Oracle) or in external files. However, neglecting the cost of ownership for a commercial DBMS, only non-clustering indexes are supported inside the DBMS and internal kernel code cannot be reused for external files, leading to a significant coding effort for re-implementing required primitives.
The General Search Tree (GiST) approach: GiST [HNP95] provides a single framework for any tree-based index structure with the basic functionality for trees, e.g., insertion, deletion, splitting, search, etc. The individual semantics of the index are provided by the user with a class only implementing some basic functions. The UB-Tree fits into GiST, but an efficient implementation requires more user control for the search algorithm and page splitting as also suggested by [Aok98]. Still, the code does not provide a generic buffer manager and appropriate tools for managing, inspecting, and querying data.
Other Options: Shore [Tea00] offers the buffer management we were missing in GiST, but it consists of too much source code. For similar reasons we voted against an integration in PostgreSQL [Tea03a], Predator [Tea03b] or XXL [jT03] as these systems were providing functionality (e.g., locking, SQL interface, Client/Server architecture) irrelevant for our goal of comparing indexes.
Due to these reasons, a completely new implementation was founded by the author and a master student, Oliver Nickel. It was called RFDBMS (research f ocused DBMS) and it was presented as a software demo at EDBT’04 [Wid04].
It reassembles a simplified DBMS in the C++ programming language. Included is a storage manager, basic data structures (Attribute, Relation, Tuple) and various indexes (B+-Tree, R∗-Tree, UB-Tree, BUB-Tree, etc.). Thus it was possible to modify and optimize all aspects of query processing. Due to its component based approach it was also possible to easily extend existing indexes and add new indexing techniques. As the same components are used by all indexes, their comparison is fair as it only shows the difference between indexes and not independent implementations. Thus it eliminates the overhead of the UBAPI implementation, i.e., reduced page capacity due to additional management data, additional interprocess communication and re-implementation of concepts like page and tuple causing additional CPU overhead.
Table 4.2 shows the number of LOCs (Line of Code) and the inheritance graph for some of the implemented indexes. It shows that the UB-Tree inherits most code from the B+-Tree. In comparisons with the R∗-Tree this requires substantially fewer and easier code and in fact the sum of the number of LOCs for the B+-Tree and UB-Tree are approximately the same as for the R∗-Tree.
4.2.4 Summary
All three implementations have their advantages and drawbacks. The UBAPI has been a proof of concept, while the integration of the UB-Tree in Transbase has enabled commercial
Index Heap
UB−Tree
BUB−Tree R*−Tree
1837 240 1163 917 763 209 LOCs
B −Tree+
Table 4.2: Comparison of Index Implementations in RFDBMS
System LOCs Pros & Cons
UBAPI ≈ 85000 + proof of UB-Tree concept for all major RDBMSs
− re-implementation of DBMS primitives
− no standard interface and query language
− incomplete
− implementation overhead: interprocess communication, tuple processing
− no access to DBMS specific code, e.g., DBMS primitives, storage manager, B+-Tree
Transbase HyperCube
(only UB-Tree
specific code)
≈ 10000 + full featured and stable commercial product with support + “no modifications” of application code except DDL
statements
+ implements only UB-Tree relevant code, all other primi-tives and techniques are reused
+ optimal performance due to tuple and predicate processing within in the kernel
− source is protected by commercial laws and rather complex RFDBMS ≈ 20000 + full access to source code including DBMS primitives and
storage manager as well as index implementations + adding new indexes is easy
+ limited to the essential code necessary for a meaningful index comparison
− only for research purposes, i.e., it is only partially complete and not suitable for commercial applications
Table 4.3: Summary of UB-Tree Implementations
applications. RFDBMS has enabled further research on UB-Trees. Table 4.3 shows a summary of the implementations, listing their average number of LOCs, and the advantages and disadvantages of each.