• No results found

An Introduction to Tries

N/A
N/A
Protected

Academic year: 2021

Share "An Introduction to Tries"

Copied!
115
0
0

Loading.... (view fulltext now)

Full text

(1)

21.09.2015

(2)

Introduction CS Background

Given: Words, e.g. in binary code

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19

(3)

Introduction CS Background

Given: Words, e.g. in binary code

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Task: Storage that allows fast search and insert/delete operations

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19

(4)

Introduction CS Background

Given: Words, e.g. in binary code

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Task: Storage that allows fast search and insert/delete operations

→ Use tree-like data structures such as a Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19

(5)

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Task: Storage that allows fast search and insert/delete operations

→ Use tree-like data structures such as a Trie (Information retrieval)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19

(6)

Introduction CS Background

Constructing a Trie

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 3 / 19

(7)

Introduction CS Background

Constructing a Trie

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 3 / 19

(8)

Introduction CS Background

Constructing a Trie

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Ξ1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 3 / 19

(9)

Ξ4 Ξ2 Ξ3 Ξ1 Ξ6 Ξ5

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 3 / 19

(10)

Introduction CS Background

Constructing a Trie

Ξ

1

= 11010 . . . , Ξ

2

= 00011 . . . , Ξ

3

= 01101 . . . , Ξ

4

= 00000 . . . , Ξ

5

= 11111 . . . , Ξ

6

= 11100 . . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 3 / 19

(11)

Introduction CS Background

Searching

Search for Ξ

1

= 11010 . . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(12)

Introduction CS Background

Searching

Search for Ξ

1

= 11010 . . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(13)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(14)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(15)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(16)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

= 3

= Shortest prefix of Ξ

1

not shared by Ξ

2

, . . . , Ξ

6

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(17)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

= 3

= Shortest prefix of Ξ

1

not shared by Ξ

2

, . . . , Ξ

6

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(18)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

= Shortest prefix of Ξ

1

not shared by Ξ

2

, . . . , Ξ

6

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(19)

Introduction CS Background

Searching

Search for Ξ

1

=

11010

. . .

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

= 3

= Shortest prefix of Ξ

1

not shared by Ξ

2

, . . . , Ξ

6

Worst case = Height of the Trie

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(20)

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Searching cost = Depth of Ξ

1

= 3

= Shortest prefix of Ξ

1

not shared by Ξ

2

, . . . , Ξ

6

Worst case = Height of the Trie = 4

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 4 / 19

(21)

Introduction The Probabilistic Model

Input Model

Generate the words Ξ

1

, Ξ

2

, . . . to be stored → Probabilistic Model

The words Ξ

1

, Ξ

2

, . . . are independent and identically distributed

Each word Ξ

i

= ξ

1

ξ

2

ξ

3

ξ

4

. . . consists of letters ξ

1

, ξ

2

, . . . that are:

More general models allow ξ

1

, ξ

2

, . . . to be dependent (e.g. evolving as a Markov chain)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 5 / 19

(22)

Introduction The Probabilistic Model

Input Model

Generate the words Ξ

1

, Ξ

2

, . . . to be stored → Probabilistic Model The words Ξ

1

, Ξ

2

, . . . are independent and identically distributed

Each word Ξ

i

= ξ

1

ξ

2

ξ

3

ξ

4

. . . consists of letters ξ

1

, ξ

2

, . . . that are:

More general models allow ξ

1

, ξ

2

, . . . to be dependent (e.g. evolving as a Markov chain)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 5 / 19

(23)

Introduction The Probabilistic Model

Input Model

Generate the words Ξ

1

, Ξ

2

, . . . to be stored → Probabilistic Model The words Ξ

1

, Ξ

2

, . . . are independent and identically distributed Each word Ξ

i

= ξ

1

ξ

2

ξ

3

ξ

4

. . . consists of letters ξ

1

, ξ

2

, . . . that are:

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 5 / 19

(24)

Introduction The Probabilistic Model

Input Model

Generate the words Ξ

1

, Ξ

2

, . . . to be stored → Probabilistic Model The words Ξ

1

, Ξ

2

, . . . are independent and identically distributed Each word Ξ

i

= ξ

1

ξ

2

ξ

3

ξ

4

. . . consists of letters ξ

1

, ξ

2

, . . . that are:

• independent,

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 5 / 19

(25)

Introduction The Probabilistic Model

Input Model

Generate the words Ξ

1

, Ξ

2

, . . . to be stored → Probabilistic Model The words Ξ

1

, Ξ

2

, . . . are independent and identically distributed Each word Ξ

i

= ξ

1

ξ

2

ξ

3

ξ

4

. . . consists of letters ξ

1

, ξ

2

, . . . that are:

• independent, • P(ξ

j

= 0) = 1 /2 = P(ξ

j

= 1)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 5 / 19

(26)

Generate the words Ξ

1

, Ξ

2

, . . . to be stored → Probabilistic Model The words Ξ

1

, Ξ

2

, . . . are independent and identically distributed Each word Ξ

i

= ξ

1

ξ

2

ξ

3

ξ

4

. . . consists of letters ξ

1

, ξ

2

, . . . that are:

• independent, • P(ξ

j

= 0) = 1 /2 = P(ξ

j

= 1)

More general models allow ξ

1

, ξ

2

, . . . to be dependent (e.g. evolving as a Markov chain)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 5 / 19

(27)

Ξ1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(28)

Ξ1, Ξ2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(29)

Ξ2

Ξ1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(30)

Ξ1, Ξ2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(31)

Ξ2

Ξ1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(32)

Ξ2 Ξ1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(33)

Ξ3

Ξ2 Ξ1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(34)

Ξ2 Ξ1

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(35)

Ξ4

Ξ2 Ξ1

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(36)

Ξ4

Ξ2 Ξ1

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(37)

Ξ2 Ξ1, Ξ4

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(38)

Ξ2 Ξ4

Ξ1

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(39)

Ξ2

Ξ1 Ξ4

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(40)

Ξ5

Ξ2

Ξ1 Ξ4

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(41)

Ξ2

Ξ1 Ξ4

Ξ3, Ξ5

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(42)

Ξ2

Ξ1 Ξ4

Ξ5

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(43)

Ξ2

Ξ1 Ξ4

Ξ3, Ξ5

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(44)

Ξ2

Ξ1 Ξ4

Ξ5

Ξ3

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(45)

Ξ2

Ξ1 Ξ4 Ξ3 Ξ5

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 6 / 19

(46)

Analysis The Depth

Consider n words Ξ

1

, . . . , Ξ

n

. What is the depth of the vertex Ξ

1

?

Recall:

Depth D

n

= Length of the shortest unique prefix of Ξ

1

= ξ

1

ξ

2

ξ

3

. . . P(D

n

≤ k) = P(Ξ

2

, . . . , Ξ

n

do not start with ξ

1

. . . ξ

k

)

Consequence:

P(D

n

≤ α log

2

(n)) = 1 − n

−α



n−1

n→∞

−→

( 1, if α > 1, 0, if α < 1.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 7 / 19

(47)

Analysis The Depth

Consider n words Ξ

1

, . . . , Ξ

n

. What is the depth of the vertex Ξ

1

? Recall:

Depth D

n

= Length of the shortest unique prefix of Ξ

1

= ξ

1

ξ

2

ξ

3

. . .

P(D

n

≤ k) = P(Ξ

2

, . . . , Ξ

n

do not start with ξ

1

. . . ξ

k

)

Consequence:

P(D

n

≤ α log

2

(n)) = 1 − n

−α



n−1

n→∞

−→

( 1, if α > 1, 0, if α < 1.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 7 / 19

(48)

Analysis The Depth

Consider n words Ξ

1

, . . . , Ξ

n

. What is the depth of the vertex Ξ

1

? Recall:

Depth D

n

= Length of the shortest unique prefix of Ξ

1

= ξ

1

ξ

2

ξ

3

. . . P(D

n

≤ k) = P(Ξ

2

, . . . , Ξ

n

do not start with ξ

1

. . . ξ

k

)

= 1 −  1 2



k

!

n−1

Consequence:

P(D

n

≤ α log

2

(n)) = 1 − n

−α



n−1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 7 / 19

(49)

Analysis The Depth

Consider n words Ξ

1

, . . . , Ξ

n

. What is the depth of the vertex Ξ

1

? Recall:

Depth D

n

= Length of the shortest unique prefix of Ξ

1

= ξ

1

ξ

2

ξ

3

. . . P(D

n

≤ k) = P(Ξ

2

, . . . , Ξ

n

do not start with ξ

1

. . . ξ

k

)

= 1 −  1 2



k

!

n−1

Consequence:

P(D

n

≤ α log

2

(n)) = 1 − n

−α



n−1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 7 / 19

(50)

Analysis The Depth

Consider n words Ξ

1

, . . . , Ξ

n

. What is the depth of the vertex Ξ

1

? Recall:

Depth D

n

= Length of the shortest unique prefix of Ξ

1

= ξ

1

ξ

2

ξ

3

. . . P(D

n

≤ k) = P(Ξ

2

, . . . , Ξ

n

do not start with ξ

1

. . . ξ

k

)

= 1 −  1 2



k

!

n−1

Consequence:

P(D

n

≤ α log

2

(n)) = 1 − n

−α



n−1

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 7 / 19

(51)

Depth D

n

= Length of the shortest unique prefix of Ξ

1

= ξ

1

ξ

2

ξ

3

. . . P(D

n

≤ k) = P(Ξ

2

, . . . , Ξ

n

do not start with ξ

1

. . . ξ

k

)

= 1 −  1 2



k

!

n−1

Consequence:

P(D

n

≤ α log

2

(n)) = 1 − n

−α



n−1 n→∞

−→

( 1, if α > 1, 0, if α < 1.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 7 / 19

(52)

Analysis The Depth

Results on D

n

Shown on the previous slide:

D

n

log

2

(n)

−→ 1

P

(n → ∞)

Thm (Knuth ’72): E[D

n

] = log

2

(n) + Ψ(log

2

(n)) + o(1) with periodic function Ψ

Thm (Szpankowski ’86): Var(D

n

) ∼ Φ(log

2

(n)) with periodic function Φ

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 8 / 19

(53)

Analysis The Depth

Results on D

n

Shown on the previous slide:

D

n

log

2

(n)

−→ 1

P

(n → ∞) Considering the previous slide more carefully:

P (D

n

− log

2

(n) < x) ≈



1 − 2

−x

n



n−1

n→∞

−→ e

−2−x

(Limit is a Gumbel distribution known from extreme value theory)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 8 / 19

(54)

Analysis The Depth

Results on D

n

Shown on the previous slide:

D

n

log

2

(n)

−→ 1

P

(n → ∞) Considering the previous slide more carefully:

P (D

n

− log

2

(n) < x) ≈



1 − 2

−x

n



n−1

n→∞

−→ e

−2−x

(Limit is a Gumbel distribution known from extreme value theory) Thm (Knuth ’72): E[D

n

] = log

2

(n) + Ψ(log

2

(n)) + o(1)

with periodic function Ψ

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 8 / 19

(55)

D

n

log

2

(n)

−→ 1

P

(n → ∞) Considering the previous slide more carefully:

P (D

n

− log

2

(n) < x) ≈



1 − 2

−x

n



n−1

n→∞

−→ e

−2−x

(Limit is a Gumbel distribution known from extreme value theory) Thm (Knuth ’72): E[D

n

] = log

2

(n) + Ψ(log

2

(n)) + o(1)

with periodic function Ψ

Thm (Szpankowski ’86): Var(D

n

) ∼ Φ(log

2

(n)) with periodic function Φ

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 8 / 19

(56)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}. The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) =

Consequence: P (H

n

> α log

2

(n)) → 0 for α > 2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(57)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies: P (H

n

> α log

2

(n)) =

Consequence: P (H

n

> α log

2

(n)) → 0 for α > 2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(58)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) =

Consequence: P (H

n

> α log

2

(n)) → 0 for α > 2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(59)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) = P (D

n

i

) > α log

2

(n) for some i ∈ {1, . . . , n})

Consequence: P (H

n

> α log

2

(n)) → 0 for α > 2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(60)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) = P (D

n

i

) > α log

2

(n) for some i ∈ {1, . . . , n})

≤ n · P (D

n

> α log

2

(n))

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(61)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) = P (D

n

i

) > α log

2

(n) for some i ∈ {1, . . . , n})

≤ n · P (D

n

> α log

2

(n))

≤ n · 1 − 1 − n

−α



n



Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(62)

Analysis The Height

Consider n words Ξ

1

, . . . , Ξ

n

. What is the height of the resulting Trie?

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) = P (D

n

i

) > α log

2

(n) for some i ∈ {1, . . . , n})

≤ n · P (D

n

> α log

2

(n))

≤ n · 1 − 1 − n

−α



n



≤ n

2−α

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(63)

Def: Height H

n

= max{D

n

i

) : i = 1, . . . , n}.

The result P(D

n

≤ k) = (1 − 2

−k

)

n−1

implies:

P (H

n

> α log

2

(n)) = P (D

n

i

) > α log

2

(n) for some i ∈ {1, . . . , n})

≤ n · P (D

n

> α log

2

(n))

≤ n · 1 − 1 − n

−α



n



≤ n

2−α

Consequence: P (H

n

> α log

2

(n)) → 0 for α > 2

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 9 / 19

(64)

Analysis The Height

Results on H

n

Partly proven on the previous slide:

H

n

2 log

2

(n)

−→ 1

P

E[H

n

] ∼ 2 log

2

(n) (n → ∞) (Flajolet, Steyaert ’82 → periodic second order term)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 10 / 19

(65)

Analysis The Height

Results on H

n

Partly proven on the previous slide:

H

n

2 log

2

(n)

−→ 1

P

Thm (Devroye ’84):

n→∞

lim P(H

n

− 2 log

2

(n) − 1 ≤ x ) = exp(−2

−x

), x ∈ R

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 10 / 19

(66)

H

n

2 log

2

(n)

−→ 1

P

Thm (Devroye ’84):

n→∞

lim P(H

n

− 2 log

2

(n) − 1 ≤ x ) = exp(−2

−x

), x ∈ R Thm (Regnier ’82):

E[H

n

] ∼ 2 log

2

(n) (n → ∞) (Flajolet, Steyaert ’82 → periodic second order term)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 10 / 19

(67)

Analysis The Height

Summary: Typical depth: log

2

(n), height: 2 log

2

(n).

log2n + O(1)

2 log2n + O(1) 2 log2n + O(1)

(External nodes/Leaves) (Internal nodes)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 11 / 19

(68)

log2log nn + O(1)

log2n + O(1)

2 log2n + O(1)

log2log nn + O(1) log2n + O(1)

2 log2n + O(1)

(External nodes/Leaves) (Internal nodes)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 11 / 19

(69)

Analysis The External Path Length

Consider n words Ξ

1

, . . . , Ξ

n

. External Path Length:

L

n

:=

n

X

i =1

D

n,i

, D

n,i

= D

n

i

).

Example: L

6

= 2 + 3 + 4 · 4 = 21

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 12 / 19

(70)

Analysis The External Path Length

Consider n words Ξ

1

, . . . , Ξ

n

. External Path Length:

L

n

:=

n

X

i =1

D

n,i

, D

n,i

= D

n

i

).

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 12 / 19

(71)

i =1

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

Example: L

6

= 2 + 3 + 4 · 4 = 21

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 12 / 19

(72)

Analysis The External Path Length

A Recursion for L

n

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(73)

Analysis The External Path Length

A Recursion for L

n

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(74)

Analysis The External Path Length

A Recursion for L

n

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(75)

Analysis The External Path Length

A Recursion for L

n

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(76)

Analysis The External Path Length

A Recursion for L

n

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d LKn

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(77)

Analysis The External Path Length

A Recursion for L

n

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d LKn

+

n−Kn

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(78)

Ξ4 Ξ2

Ξ3

Ξ1

Ξ6 Ξ5

K

n

= # words starting with 0

L

n

=

d LKn

+

n−Kn

+ n

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 13 / 19

(79)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n

X =

d

1

2 X + 1

2 X e (1)

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(80)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(81)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(82)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

2. Find the Limits: (A

n,1

, A

n,2

, b

n

) −→ ???

point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(83)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

2. Find the Limits: (A

n,1

, A

n,2

, b

n

) −→ (( √

2)

−1

, ( √

2)

−1

, 0)

point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(84)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

2. Find the Limits: (A

n,1

, A

n,2

, b

n

) −→ (( √

2)

−1

, ( √

2)

−1

, 0) X =

d

1

2 X + 1

2 X e (1)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(85)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

2. Find the Limits: (A

n,1

, A

n,2

, b

n

) −→ (( √

2)

−1

, ( √

2)

−1

, 0) X =

d

1

2 X + 1

2 X e (1)

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(86)

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for L

n

(after rescaling properly) L

n

= L

d Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

2. Find the Limits: (A

n,1

, A

n,2

, b

n

) −→ (( √

2)

−1

, ( √

2)

−1

, 0) X =

d

1

2 X + 1

2 X e (1)

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(87)

L

n

= L

Kn

+ ˜ L

n−Kn

+ n 1. Rescaling: X

n

= (L

n

− E[L

n

])/pVar(L

n

)

X

n

= A

d n,1

X

Kn

+ A

n,2

X e

n−Kn

+ b

n

2. Find the Limits: (A

n,1

, A

n,2

, b

n

) −→ (( √

2)

−1

, ( √

2)

−1

, 0) X =

d

1

2 X + 1

2 X e (1)

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 14 / 19

(88)

Analysis The External Path Length

Results on L

n

Thm (Jacquet, Regnier ’88; Neininger, R¨ uschendorf 2004):

L

n

− E[L

n

] pVar(L

n

)

−→ N (0, 1)

d

From the analysis of D

n

:

E[L

n

] = E

"

n

X

i =1

D

n

i

)

#

Thm (Kirschenhofer, Prodinger ’86):

Var(L

n

) = n e Ψ(log

2

(n)) + O(log

2

(n))

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 15 / 19

(89)

Analysis The External Path Length

Results on L

n

Thm (Jacquet, Regnier ’88; Neininger, R¨ uschendorf 2004):

L

n

− E[L

n

] pVar(L

n

)

−→ N (0, 1)

d

From the analysis of D

n

:

E[L

n

] = E

"

n

X

i =1

D

n

i

)

#

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 15 / 19

(90)

Analysis The External Path Length

Results on L

n

Thm (Jacquet, Regnier ’88; Neininger, R¨ uschendorf 2004):

L

n

− E[L

n

] pVar(L

n

)

−→ N (0, 1)

d

From the analysis of D

n

:

E[L

n

] = E

"

n

X

i =1

D

n

i

)

#

= nE[D

n

]

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 15 / 19

(91)

Analysis The External Path Length

Results on L

n

Thm (Jacquet, Regnier ’88; Neininger, R¨ uschendorf 2004):

L

n

− E[L

n

] pVar(L

n

)

−→ N (0, 1)

d

From the analysis of D

n

:

E[L

n

] = E

"

n

X

i =1

D

n

i

)

#

= nE[D

n

] = n log

2

(n) + nΨ(log

2

(n)) + o(n)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 15 / 19

(92)

L

n

− E[L

n

] pVar(L

n

)

−→ N (0, 1)

d

From the analysis of D

n

:

E[L

n

] = E

"

n

X

i =1

D

n

i

)

#

= nE[D

n

] = n log

2

(n) + nΨ(log

2

(n)) + o(n)

Thm (Kirschenhofer, Prodinger ’86):

Var(L

n

) = n e Ψ(log

2

(n)) + O(log

2

(n))

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 15 / 19

(93)

Summary

Trie:

tree-like data structure to store words

position of a word in the tree ↔ path given by shortest unique prefix Performance:

Consider input: n independent words, each word is a sequence of

’coin tosses’

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 16 / 19

References

Related documents

This study has raised a number of concerns and questions about architectural photographs of India and they fall into two broad groups: the first group focuses on the documentation

The proposed scheme uses a new multi-pixel encoding method called pixel-block aware encoding that encrypts different amount of pixels for each run where the amount of encoded

Per cent of families who attend certain types of large extended family gatherings... Per cent of families who visit relatives, co-workers, neighbors, or other

a drop in public commitment to research was compensated by business research expenditure.. Businesses sought to conjugate the

Based on the geostatistical model, individual point estimate maps and PCMs were devel- oped to demonstrate incidence of snakebite in Sri Lanka (national snakebite incidence rate:

When you open a new account on behalf of a legal entity, the financial institution will ask for information about the legal entity’s beneficial owner(s) , including their

Podle mého názoru, hlavními faktory, které budou ovlivňovat další pokrok směrem k integraci do EU, budou zejména rychlost provádění reforem a plnění závazků, které

The study further found that the regression model testing for the intervening effect (with strategic change as intervening variable and TMT diversity as