
6.4 Suffix tree applications

6.4.5 Others

Suffix trees can also be used for data compression, for example in the Burrows-Wheeler transform and in variants of LZW compression such as LZSS [3].

6.5 Notes and short summary

The suffix tree was first introduced by Weiner in 1973 [2]. In 1976, McCreight greatly simplified the construction algorithm; his method constructs the suffix tree from right to left. In 1995, Ukkonen gave the first on-line construction algorithm, which works from left to right. All three algorithms run in linear time, O(n), and later research has shown how the three algorithms are related [7].

Bibliography

[1] Esko Ukkonen. “On-line construction of suffix trees”. Algorithmica 14 (3): 249–260. doi:10.1007/BF01206331. http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

[2] Weiner, P. “Linear pattern matching algorithms”, 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1-11, doi:10.1109/SWAT.1973.13

[3] Suffix Tree, Wikipedia. http://en.wikipedia.org/wiki/Suffix_tree

[4] Esko Ukkonen. “Suffix tree and suffix array techniques for pattern analysis in strings”. http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt

[5] Trie, Wikipedia. http://en.wikipedia.org/wiki/Trie

[6] Suffix Tree (Java). http://en.literateprograms.org/Suffix_tree_(Java)

[7] Robert Giegerich and Stefan Kurtz. “From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction”. Algorithmica 19 (3): 331–353. doi:10.1007/PL00009177. www.zbh.uni-hamburg.de/pubs/pdf/GieKur1997.pdf

[8] Robert Giegerich and Stefan Kurtz. “A Comparison of Imperative and Purely Functional Suffix Tree Constructions”. Science of Computer Programming 25(2-3):187-218, 1995. http://citeseer.ist.psu.edu/giegerich95comparison.html

[9] Bryan O’Sullivan. “suffixtree: Efficient, lazy suffix tree implementation”. http://hackage.haskell.org/package/suffixtree

[10] Danny. http://hkn.eecs.berkeley.edu/~dyoo/plt/suffixtree/

[11] Zhang Shaojie. “Lecture of Suffix Trees”. http://www.cs.ucf.edu/~shzhang/Combio09/lec3.pdf

[12] Lloyd Allison. “Suffix Trees”. http://www.allisons.org/ll/AlgDS/Tree/Suffix/

[13] Esko Ukkonen. “Suffix tree and suffix array techniques for pattern analysis in strings”. http://www.cs.helsinki.fi/u/ukkonen/Erice2005.ppt

[14] Esko Ukkonen “Approximate string-matching over suffix trees”. Proc. CPM 93. Lecture Notes in Computer Science 684, pp. 228-242, Springer 1993. http://www.cs.helsinki.fi/u/ukkonen/cpm931.ps

Chapter 7

B-Trees

7.1 Introduction

The B-tree is an important data structure. It is widely used in modern file systems, some of which are based on the B+ tree, an extension of the B-tree. B-trees are also widely used in database systems.

Some textbooks introduce the B-tree through the problem of accessing large blocks of data on magnetic disks or other secondary storage devices [2]. It also helps to understand the B-tree as a generalization of the balanced binary search tree [2].

Figure 7.1 makes the similarities and differences between a B-tree and a binary search tree easy to see.

M

C G P T W

A B D E F H I J K N O Q R S U V X Y Z

Figure 7.1: Example B-Tree

Recall the definition of the binary search tree. A binary search tree is

either an empty node;

or a node that contains 3 parts: a value, a left child, and a right child. Both children are also binary search trees.

The binary search tree satisfies the following constraints:

all the values in the left child are not greater than the value of this node;

the value of this node is not greater than any value in the right child.

For a non-empty binary tree (L, k, R), where L, R and k are the left child, the right child, and the key, and where the function Key(T) accesses the key of tree T, the constraint can be expressed as follows.

$$\forall x \in L, \forall y \in R \Rightarrow Key(x) \le k \le Key(y) \tag{7.1}$$

If we extend this definition to allow multiple keys and children, we get the B-tree definition.

A B-tree

is either empty;

or contains n keys and n + 1 children, each of which is also a B-tree. We denote these keys and children as $k_1, k_2, \ldots, k_n$ and $c_1, c_2, \ldots, c_n, c_{n+1}$.

Figure 7.2 illustrates a B-tree node.

C[1] K[1] C[2] K[2] ... C[n] K[n] C[n+1]

Figure 7.2: A B-Tree node

The keys and children in a node satisfy the following order constraints.

Keys are stored in non-decreasing order, that is, $k_1 \le k_2 \le \ldots \le k_n$;

for each $k_i$, all elements stored in child $c_i$ are not greater than $k_i$, while $k_i$ is not greater than any value stored in child $c_{i+1}$.

These constraints can also be expressed as equation (7.2).

$$\forall x_i \in c_i, i = 1, 2, \ldots, n+1 \Rightarrow x_1 \le k_1 \le x_2 \le k_2 \le \ldots \le x_n \le k_n \le x_{n+1} \tag{7.2}$$
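The order constraints above can be checked mechanically. The following is a minimal sketch, not the chapter's own code: it assumes a node class whose constructor also accepts keys and children (an extension made here for illustration), and the helper names `to_list`, `is_ordered` and `leaf` are hypothetical. It relies on the observation that visiting $c_1, k_1, c_2, k_2, \ldots, k_n, c_{n+1}$ recursively must yield a non-decreasing sequence exactly when the constraints hold. The example tree uses the keys of Figure 7.1, grouped into nodes in the way the ordering implies; the grouping and the choice t = 3 are assumptions.

```python
class BTree:
    """Minimal node following the chapter's definition (no satellite data)."""
    def __init__(self, t, keys=None, children=None):
        self.t = t
        self.keys = list(keys or [])
        self.children = list(children or [])


def to_list(tr):
    """Visit c1, k1, c2, k2, ..., kn, c(n+1) and collect every key."""
    if not tr.children:                    # a leaf: just its keys
        return list(tr.keys)
    assert len(tr.children) == len(tr.keys) + 1
    out = []
    for child, key in zip(tr.children, tr.keys):
        out += to_list(child)
        out.append(key)
    out += to_list(tr.children[-1])
    return out


def is_ordered(tr):
    """True when the in-order key sequence is non-decreasing."""
    xs = to_list(tr)
    return all(a <= b for a, b in zip(xs, xs[1:]))


def leaf(s):
    """Hypothetical helper: a leaf holding the characters of s as keys."""
    return BTree(3, s)


# Keys of Figure 7.1; the node grouping and t = 3 are assumptions.
left = BTree(3, "CG", [leaf("AB"), leaf("DEF"), leaf("HIJK")])
right = BTree(3, "PTW", [leaf("NO"), leaf("QRS"), leaf("UV"), leaf("XYZ")])
root = BTree(3, "M", [left, right])

# The same left subtree with C and G swapped violates the constraints.
bad = BTree(3, "GC", [leaf("AB"), leaf("DEF"), leaf("HIJK")])

print(is_ordered(root), is_ordered(bad))   # True False
```

Flattening the whole tree is simple but takes extra space; a check that passes lower and upper bounds down the recursion would avoid building the list.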

Finally, after adding some constraints to make the tree balanced, we get the complete B-tree definition.

All leaves have the same depth;

we define an integer t as the minimum degree of the B-tree:

each node can have at most 2t − 1 keys;

each node has at least t − 1 keys, except the root.
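The balance constraints listed above can also be turned into a small checker. This is a sketch under assumptions: the node constructor takes keys and children directly, a node with an empty `children` list is treated as a leaf, and the function names are hypothetical. The example trees are made up for illustration, with t = 2.

```python
class BTree:
    """Minimal node; a node with no children is treated as a leaf (assumption)."""
    def __init__(self, t, keys=None, children=None):
        self.t = t
        self.keys = list(keys or [])
        self.children = list(children or [])


def leaf_depths(tr, depth=0):
    """Set of depths at which leaves occur."""
    if not tr.children:
        return {depth}
    depths = set()
    for child in tr.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def key_counts_ok(tr, is_root=True):
    """Every node has at most 2t - 1 keys; non-root nodes have at least t - 1."""
    n, t = len(tr.keys), tr.t
    if n > 2 * t - 1:
        return False
    if not is_root and n < t - 1:
        return False
    if tr.children and len(tr.children) != n + 1:
        return False
    return all(key_counts_ok(child, False) for child in tr.children)


def is_balanced(tr):
    """All leaves at the same depth and all key counts within bounds."""
    return key_counts_ok(tr) and len(leaf_depths(tr)) == 1


# Hypothetical trees with t = 2.
good = BTree(2, [20], [BTree(2, [10]), BTree(2, [30, 40])])
uneven = BTree(2, [20], [BTree(2, [10]),
                         BTree(2, [30], [BTree(2, [25]), BTree(2, [35])])])
overfull = BTree(2, [20], [BTree(2, [10]), BTree(2, [30, 40, 50, 60])])

print(is_balanced(good), is_balanced(uneven), is_balanced(overfull))
# True False False
```

The checker deliberately ignores key ordering, so it verifies only the balance part of the definition.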

Consider a B-tree that holds n keys, with minimum degree t ≥ 2 and height h. All nodes have at least t − 1 keys except the root, which contains at least 1 key. There are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at least $2t^2$ nodes at depth 3, ..., and finally at least $2t^{h-1}$ nodes at depth h. Multiplying the number of nodes at every depth below the root by t − 1, the total number of keys satisfies the following inequality.

$$n \ge 1 + (t-1)(2 + 2t + 2t^2 + \ldots + 2t^{h-1}) = 1 + 2(t-1)\sum_{k=0}^{h-1} t^k = 1 + 2(t-1)\frac{t^h - 1}{t - 1} = 2t^h - 1 \tag{7.3}$$

Thus we have the following inequality between the height and the number of keys.

$$h \le \log_t \frac{n+1}{2} \tag{7.4}$$
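As a quick numeric check of this bound, the sketch below counts the keys of the sparsest possible tree level by level and compares the result with the closed form $2t^h - 1$. The function names `min_keys` and `max_height` are hypothetical. The height bound is computed with integer arithmetic rather than a floating-point logarithm, to avoid rounding problems at exact powers of t.

```python
def min_keys(t, h):
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    total = 1                                # the root holds at least one key
    for depth in range(1, h + 1):
        nodes = 2 * t ** (depth - 1)         # at least 2t^(depth-1) nodes here
        total += nodes * (t - 1)             # each holds at least t - 1 keys
    return total


def max_height(t, n):
    """Largest h with 2t^h - 1 <= n, that is floor(log_t((n + 1) / 2)); n >= 1."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


print(min_keys(2, 3))                # 15, which equals 2 * 2**3 - 1
print(max_height(2, 15))             # 3
print(max_height(1000, 10 ** 9))     # 2
```

The last line illustrates why B-trees suit secondary storage: with a minimum degree of 1000, a billion keys fit in a tree of height at most 2.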

This is the reason why the B-tree is balanced. The simplest B-tree is the so-called 2-3-4 tree, where t = 2, so that every node contains 1, 2, or 3 keys, and every internal node accordingly has 2, 3, or 4 children. A red-black tree can essentially be mapped to a 2-3-4 tree.

The following Python code shows an example B-tree definition. It explicitly passes t when creating a node.

class BTree:
    def __init__(self, t):
        self.t = t          # minimum degree
        self.keys = []      # keys k1, ..., kn in non-decreasing order
        self.children = []  # children c1, ..., cn+1

B-tree nodes commonly carry satellite data as well. We ignore satellite data for illustration purposes.

In this chapter, we first introduce how to build a B-tree by insertion. Two different methods are explained. One is the classic method as in [2], which splits a node before insertion if it is full; the other is the modify-fix approach, which is quite similar to the red-black tree solution [3] [2]. We then explain how to delete a key from a B-tree and how to look up a key.

7.2 Insertion

A B-tree can be created by inserting keys repeatedly. The basic idea is similar to that of the binary search tree. To insert a key x, we start from the root and examine the keys in the node to find a position where all the keys on the left are less than x, while all the keys on the right are greater than x.¹ If the current node is a leaf and it is not full (there are fewer than 2t − 1 keys in the node), x is inserted at this position. Otherwise, the position points to a child node, and we recursively insert x into it.

Figure 7.3 shows an example. The B-tree illustrated is a 2-3-4 tree. To insert the key x = 22, because it is greater than the key in the root, the right child, which contains keys 26, 38, and 45, is examined next. Since 22 < 26, the first child, which contains keys 21 and 25, is examined. This is a leaf node and it is not full, so key 22 is inserted into this node.
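The descent described above might be sketched as follows. This is an illustration under assumptions, not the chapter's implementation: the constructor accepting keys and children, and the names `is_full` and `insert`, are hypothetical, and the function simply raises an error when the target leaf is full. The example tree only borrows the node (26, 38, 45) and the leaf (21, 25) from the text; the root key 20 and all other nodes are made up. The standard-library `bisect` module locates the position in each node.

```python
import bisect


class BTree:
    """Minimal node; a node with no children is treated as a leaf (assumption)."""
    def __init__(self, t, keys=None, children=None):
        self.t = t
        self.keys = list(keys or [])
        self.children = list(children or [])


def is_full(node):
    """A node is full when it already holds 2t - 1 keys."""
    return len(node.keys) >= 2 * node.t - 1


def insert(tr, x):
    """Insert x into the leaf found by descending from tr.

    Only the simple case is handled: the leaf must not be full.
    """
    node = tr
    while node.children:
        # Index i such that keys[:i] <= x < keys[i:]; descend into child i.
        i = bisect.bisect_right(node.keys, x)
        node = node.children[i]
    if is_full(node):
        raise ValueError("leaf is full; it must be handled before inserting")
    bisect.insort(node.keys, x)
    return tr


# Hypothetical 2-3-4 tree (t = 2); only (26, 38, 45) and (21, 25) come from the text.
left = BTree(2, [4, 11], [BTree(2, [1, 2]), BTree(2, [5, 8, 9]),
                          BTree(2, [12, 15, 16])])
right = BTree(2, [26, 38, 45], [BTree(2, [21, 25]), BTree(2, [30, 31]),
                                BTree(2, [40, 42]), BTree(2, [50])])
root = BTree(2, [20], [left, right])

insert(root, 22)
print(right.children[0].keys)    # [21, 22, 25]
```

In this made-up tree, calling `insert(root, 18)` reaches the leaf (12, 15, 16), which already holds 2t − 1 = 3 keys, so the sketch raises `ValueError` instead of inserting.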

However, if there are already 2t − 1 keys in the leaf, the new key x can't be inserted, because the node is 'full'. Trying to insert key 18 into the example B-tree above runs into this problem. There are 2 methods to solve it.

7.2.1 Splitting
