Lecture Notes for Chapter 11: Hash Tables

Chapter 11 overview

Many applications require a dynamic set that supports only thedictionary opera-

tionsINSERT, SEARCH, and DELETE. Example: a symbol table in a compiler.

A hash table is effective for implementing a dictionary.

• _{The expected time to search for an element in a hash table is}_O(₁)_{, under some}

reasonable assumptions.

• _{Worst-case search time is}(_n)_{, however.}

A hash table is a generalization of an ordinary array.

• _{With an ordinary array, we store the element whose key is}_k_{in position}_k_{of the}

array.

• _{Given a key}_k_{, we}_Þ_{nd the element whose key is}_k _{by just looking in the}_k_th

position of the array. This is calleddirect addressing.

• _{Direct addressing is applicable when we can afford to allocate an array with}

one position for every possible key.

We use a hash table when we do not want to (or cannot) allocate an array with one position per possible key.

• _{Use a hash table when the number of keys actually stored is small relative to}

the number of possible keys.

• _{A hash table is an array, but it typically uses a size proportional to the number}

of keys to be stored (rather than the number of possible keys).

• _{Given a key}_k_{, don’t just use}_k _{as the index into the array.}

• _{Instead, compute a function of}_k_{, and use that value to index into the array. We}

call this function ahash function. Issues that we’ll explore in hash tables:

• _{How to compute hash functions. We’ll look at the multiplication and division}

methods.

• _{What to do when the hash function maps multiple keys to the same table entry.}

11-2 Lecture Notes for Chapter 11: Hash Tables

Direct-address tables

Scenario:

• _{Maintain a dynamic set.}

• _{Each element has a key drawn from a universe}_U = {₀,₁, . . . ,_m−₁}_where

misn’t too large.

• _{No two elements have the same key.}

Represent by adirect-address table, or array,T[0. . .m−1]:

• _Each_slot_{, or position, corresponds to a key in}_U_.

• _{If there’s an element}_x _{with key}_k_{, then}_T_[_k_{] contains a pointer to}_x_. • _Otherwise,_T_[_k_{] is empty, represented by}_NIL.

T U (universe of keys) K (actual keys) 2 3 5 8 1 9 4 0 7 6 2 3 5 8

key satellite data

2 0 1 3 4 5 6 7 8 9

Dictionary operations are trivial and takeO(1)time each: DIRECT-ADDRESS-SEARCH(T,k) returnT[k] DIRECT-ADDRESS-INSERT(T,x) T[key[x]]←x DIRECT-ADDRESS-DELETE(T,x) T[key[x]]←NIL Hash tables

The problem with direct addressing is if the universeUis large, storing a table of size|U|may be impractical or impossible.

Often, the set K of keys actually stored is small, compared toU, so that most of the space allocated forT is wasted.

Lecture Notes for Chapter 11: Hash Tables 11-3

• _When _K _{is much smaller than}_U_{, a hash table requires much less space than a}

direct-address table.

• _{Can reduce storage requirements to}(|_K|)_.

• _{Can still get}_O(₁)_{search time, but in the}_{average case}_{, not the}_{worst case}_. Idea: Instead of storing an element with keyk in slotk, use a functionhand store the element in sloth(k).

• _{We call}_h_a_{hash function}_.

• _h _:_U → {₀,₁, . . . ,_m−₁}_{, so that}_h(_k)_{is a legal slot number in}_T_. • _{We say that}_k_hashes_{to slot}_h(_k)_.

Collisions: When two or more keys hash to the same slot.

• _{Can happen when there are more possible keys than slots (}|_U|>_m_).

• _{For a given set} _K _{of keys with}|_K| ≤ _m_{, may or may not happen. De}_Þ_nitely

happens if|K|>m.

• _{Therefore, must be prepared to handle collisions in all cases.} • _{Use two methods: chaining and open addressing.}

• _{Chaining is usually better than open addressing. We’ll examine both.} Collision resolution by chaining

Put all elements that hash to the same slot into a linked list.

T U (universe of keys) K (actual keys) k1 k2 k3 k4 k5 k6 k7 k8 k1 k2 k3 k4 k5 k6 k7 k8

[ThisÞgure shows singly linked lists. If we want to delete elements, it’s better to use doubly linked lists.]

• _Slot _j _{contains a pointer to the head of the list of all stored elements that hash}

to j[or to the sentinel if using a circular, doubly linked list with a sentinel] ,

11-4 Lecture Notes for Chapter 11: Hash Tables

How to implement dictionary operations with chaining:

• _Insertion:

CHAINED-HASH-INSERT(T,x)

insert x at the head of listT[h(key[x])]

• _{Worst-case running time is}_O(₁)_.

• _{Assumes that the element being inserted isn’t already in the list.} • _{It would take an additional search to check if it was already inserted.}

• _Search:

CHAINED-HASH-SEARCH(T,k)

search for an element with keykin listT[h(k)]

Running time is proportional to the length of the list of elements in sloth(k).

• _Deletion:

CHAINED-HASH-DELETE(T,x)

deletex from the listT[h(key[x])]

• _{Given pointer} _x _{to the element to delete, so no search is needed to}_Þ_{nd this}

element.

• _{Worst-case running time is}_O(₁)_{time if the lists are doubly linked.}

• _{If the lists are singly linked, then deletion takes as long as searching, be-}

cause we mustÞndx’s predecessor in its list in order to correctly updatenext

pointers.

Analysis of hashing with chaining

Given a key, how long does it take toÞnd an element with that key, or to determine that there is no element with that key?

• _{Analysis is in terms of the}_{load factor}α =_n/_m_: • _n =_{# of elements in the table.}

• _m =_{# of slots in the table}=_{# of (possibly empty) linked lists.} • _{Load factor is average number of elements per linked list.} • _{Can have}α <_1,α=_{1, or}α >_1.

• _{Worst case is when all}_n_{keys hash to the same slot}⇒_{get a single list of length}_n ⇒worst-case time to search is(n), plus time to compute hash function.

• _{Average case depends on how well the hash function distributes the keys among}

the slots.

We focus on average-case performance of hashing with chaining.

• _Assume_{simple uniform hashing}_{: any given element is equally likely to hash}

Lecture Notes for Chapter 11: Hash Tables 11-5

• _For _j = ₀,₁, . . . ,_m − _{1, denote the length of list} _T_[_j_{] by} _n_j_. _Then

n =n0+n1+ · · · +nm−1.

• _{Average value of}_n_j _{is E [}_n_j_]=α =_n/_m_.

• _{Assume that we can compute the hash function in} _O(₁)_{time, so that the time}

required to search for the element with keykdepends on the lengthnh(k)of the

listT[h(k)]. We consider two cases:

• _{If the hash table contains no element with key}_k_{, then the search is unsuccessful.} • _{If the hash table does contain an element with key}_k_{, then the search is success-}

ful.

[In the theorem statements that follow, we omit the assumptions that we’re resolv- ing collisions by chaining and that simple uniform hashing applies.]

Unsuccessful search: Theorem

An unsuccessful search takes expected time(1+α).

Proof Simple uniform hashing⇒any key not already in the table is equally likely to hash to any of themslots.

To search unsuccessfully for any keyk, need to search to the end of the listT[h(k)]. This list has expected length E[nh(k)] = α. Therefore, the expected number of

elements examined in an unsuccessful search isα.

Adding in the time to compute the hash function, the total time required is (1+α).

Successful search:

• _{The expected time for a successful search is also}(₁+α)_.

• _{The circumstances are slightly different from an unsuccessful search.}

• _{The probability that each list is searched is proportional to the number of ele-}

ments it contains.

Theorem

A successful search takes expected time(1+α).

Proof Assume that the element x being searched for is equally likely to be any of thenelements stored in the table.

The number of elements examined during a successful search forx is 1 more than the number of elements that appear before x in x’s list. These are the elements insertedafter x was inserted (because we insert at the head of the list).

So we need to Þnd the average, over then elements x in the table, of how many elements were inserted intox’s list afterx was inserted.

For i = 1,2, . . . ,n, let xi be the ith element inserted into the table, and let

11-6 Lecture Notes for Chapter 11: Hash Tables

For alliand j, deÞne indicator random variable Xi j =I{h(ki)=h(kj)}.

Simple uniform hashing ⇒ Pr{h(ki)=h(kj)} = 1/m ⇒ E [Xi j] = 1/m (by Lemma 5.1).

Expected number of elements examined in a successful search is E 1 n n i=1 1+ n j=i+1 Xi j = 1 n n i=1 1+ n j=i+1 E [Xi j] (linearity of expectation) = 1 n n i=1 1+ n j=i+1 1 m = 1+ 1 nm n i=1 (n−i) = 1+ 1 nm _n i=1 n− n i=1 i = 1+ 1 nm n2−n(n+1) 2 (equation (A.1)) = 1+n−1 2m = 1+α 2 − α 2n .

Adding in the time for computing the hash function, we get that the expected total time for a successful search is(2+α/2−α/2n)=(1+α).

Alternative analysis, using indicator random variables even more:

For each slot l and for each pair of keys ki andkj, deÞne the indicator random variable Xi jl = I{the search is forxi,h(ki)=l, andh(kj)=l}. Xi jl = 1 when keyski andkj collide at slotland when we are searching forxi.

Simple uniform hashing ⇒ Pr{h(ki)=l} = 1/m and Pr{h(kj)=l} = 1/m. Also have Pr{the search is forxi} = 1/n. These events are all independent ⇒ Pr{Xi jl =1} =1/nm2⇒E [Xi jl]=1/nm2(by Lemma 5.1).

DeÞne, for each elementxj, the indicator random variable

Yj =I{xj appears in a list prior to the element being searched for} .

Yj =1 if and only if there is some slotlthat has both elementsxi andxj in its list, and alsoi < j (so thatxi appears afterxj in the list). Therefore,

Yj = j−1 i=1 m−1 l=0 Xi jl .

OneÞnal random variable: Z, which counts how many elements appear in the list prior to the element being searched for: Z =n_j₌₁Yj. We must count the element

Lecture Notes for Chapter 11: Hash Tables 11-7

being searched for as well as all those preceding it in its list⇒compute E[Z+1]: E [Z +1] = E 1+ n j=1 Yj = 1+E _n j=1 j−1 i=1 m−1 l=0 Xi jl (linearity of expectation) = 1+ n j=1 j−1 i=1 m−1 l=0 E [Xi jl] (linearity of expectation) = 1+ n j=1 j−1 i=1 m−1 l=0 1 nm2 = 1+ n 2 ·m· 1 nm2 = 1+n(n−1) 2 · 1 nm = 1+n−1 2m = 1+ n 2m − 1 2m = 1+α 2 − α 2n .

Adding in the time for computing the hash function, we get that the expected total time for a successful search is(2+α/2−α/2n)=(1+α).

Interpretation: Ifn = O(m), thenα =n/m = O(m)/m = O(1), which means that searching takes constant time on average.

Since insertion takes O(1) worst-case time and deletion takes O(1) worst-case time when the lists are doubly linked, all dictionary operations takeO(1)time on average.

Hash functions

We discuss some issues regarding hash-function design and present schemes for hash function creation.

What makes a good hash function?

• _{Ideally, the hash function satis}_Þ_{es the assumption of simple uniform hashing.} • _{In practice, it’s not possible to satisfy this assumption, since we don’t know in}

advance the probability distribution that keys are drawn from, and the keys may not be drawn independently.

• _{Often use heuristics, based on the domain of the keys, to create a hash function}

11-8 Lecture Notes for Chapter 11: Hash Tables

Keys as natural numbers

• _{Hash functions assume that the keys are natural numbers.} • _{When they’re not, have to interpret them as natural numbers.}

• _Example: _{Interpret a character string as an integer expressed in some radix}

notation. Suppose the string isCLRS:

• _{ASCII values:} _C=_67,_L=_76,_R=_82,_S=_83. • _{There are 128 basic ASCII values.}

• _{So interpret}_CLRS_as(₆₇·₁₂₈3₎₊₍₇₆_·₁₂₈2₎₊₍₈₂_·₁₂₈1₎₊₍₈₃_·₁₂₈0₎₌

141,764,947.

Division method

h(k)=kmodm.

Example: m=20 andk =91⇒h(k)=11.

Advantage: Fast, since requires just one division operation.

Disadvantage: Have to avoid certain values ofm:

• _{Powers of 2 are bad. If}_m=₂p _{for integer} _p_{, then}_h₍_k₎_{is just the least signi}_Þ_- cant pbits ofk.

• _If _k _{is a character string interpreted in radix 2}p _{(as in} _CLRS_{example), then}

m = 2p₋_{1 is bad: permuting characters in a string does not change its hash} value (Exercise 11.3-3).

Good choice for m: A prime not too close to an exact power of 2.

Multiplication method

1. Choose constant Ain the range 0< A<1. 2. Multiply keykby A.

3. Extract the fractional part ofk A. 4. Multiply the fractional part bym. 5. Take theßoor of the result.

Put another way, h(k) = m(k Amod 1), where k Amod 1 = k A− k A =

fractional part ofk A.

Disadvantage: Slower than division method.

Advantage: Value ofmis not critical.

(Relatively) easy implementation: • _Choose_m=₂p_{for some integer} _p_. • _{Let the word size of the machine be}w_bits.

• _{Assume that}_k _Þ_{ts into a single word. (}_k_takesw_bits.) • _Let_s _{be an integer in the range 0}<_s <₂w_{. (}_s_takesw_bits.)

Lecture Notes for Chapter 11: Hash Tables 11-9

• _Restrict _A_{to be of the form}_s/₂w_.

× binary point s= A·2w wbits k r0 r1 h(k) extractpbits • _Multiply_k _by_s_.

• _{Since we’re multiplying two}w_{-bit words, the result is 2}w_bits,_r₁₂w+_r₀_{, where}

r1is the high-order word of the product andr0is the low-order word.

• _r₁ _{holds the integer part of} _{k A}₍ _{k A}_{) and}_r₀ _{holds the fractional part of}_{k A}

(k Amod 1 = k A− k A). Think of the “binary point” (analog of decimal point, but for binary representation) as being betweenr1andr0. Since we don’t

care about the integer part ofk A, we can forget aboutr1and just user0. • _{Since we want} _m(_{k A}_{mod 1})_{, we could get that value by shifting} _r₀_{to the}

left by p =lgmbits and then taking the pbits that were shifted to the left of the binary point.

• _{We don’t need to shift. The} _p_{bits that would have been shifted to the left of the}

binary point are the pmost signiÞcant bits ofr0. So we can just take these bits

after having formedr0by multiplyingkbys.

• _Example: _m = _{8 (implies} _p = _3),w = _5, _k = _{21. Must have 0} < _s <₂5_;

chooses =13⇒ A=13/32.

• _{Using just the formula to compute}_h(_k)_:_{k A}=₂₁·₁₃/₃₂ =₂₇₃/₃₂ =₈17 32

⇒ k Amod 1 = 17/32 ⇒m(k Amod 1) = 8·17/32 = 17/4 = 41₄ ⇒

m(k Amod 1) =4, so thath(k)=4.

• _{Using the implementation:} _ks = ₂₁·₁₃ = ₂₇₃ = ₈·₂5₊₁₇ _⇒_r 1=8,

r0=17. Written inw=5 bits,r0=10001. Take the p=3 most signiÞcant

bits ofr0, get 100 in binary, or 4 in decimal, so thath(k)=4. How to choose A:

• _{The multiplication method works with any legal value of} _A_.

• _{But it works better with some values than with others, depending on the keys}

being hashed.

• _{Knuth suggests using} _A≈(√₅−₁)/_2. Universal hashing

[We just touch on universal hashing in these notes. See the book for a full treat- ment.]

Suppose that a malicious adversary, who gets to choose the keys to be hashed, has seen your hashing program and knows the hash function in advance. Then he could choose keys that all hash to the same slot, giving worst-case behavior.

11-10 Lecture Notes for Chapter 11: Hash Tables

One way to defeat the adversary is to use a different hash function each time. You choose one at random at the beginning of your program. Unless the adversary knows how you’ll be randomly choosing which hash function to use, he cannot intentionally defeat you.

Just because we choose a hash function randomly, that doesn’t mean it’s a good hash function. What we want is to randomly choose a single hash function from a set of good candidates.

Consider aÞnite collectionHof hash functions that map a universeUof keys into the range{0,1, . . . ,m−1}.Hisuniversalif for each pair of keysk,l ∈U, where

k =l, the number of hash functionsh ∈H for whichh(k)=h(l)is≤ |H|/m. Put another way, H is universal if, with a hash function h chosen randomly fromH, the probability of a collision between two different keys is no more than than 1/mchance of just choosing two slots randomly and independently.

Why are universal hash functions good?

• _{They give good hashing behavior:} Theorem

Using chaining and universal hashing on keyk:

• _If_k _{is not in the table, the expected length E[}_n_h₍_k₎_{] of the list that}_k _hashes

to is≤α.

• _If _k _{is in the table, the expected length E[}_n_h₍_k₎_{] of the list that holds} _k _is ≤1+α.

Corollary

Using chaining and universal hashing, the expected time for each SEARCHop- eration is O(1).

• _{They are easy to design.}

[See book for details of behavior and design of a universal class of hash functions.]

Open addressing

An alternative to chaining for handling collisions.

Idea:

• _{Store all keys in the hash table itself.} • _{Each slot contains either a key or}_NIL. • _{To search for key}_k_:

• _Compute_h(_k)_{and examine slot}_h(_k)_{. Examining a slot is known as a}_probe_. • _{If slot}_h(_k)_{contains key}_k_{, the search is successful. If this slot contains}_NIL,

the search is unsuccessful.

• _{There’s a third possibility: slot}_h(_k)_{contains a key that is not}_k_{. We compute}

the index of some other slot, based onk and on which probe (count from 0: 0th, 1st, 2nd, etc.) we’re on.

Lecture Notes for Chapter 11: Hash Tables 11-11

• _{Keep probing until we either}_Þ_{nd key}_k _{(successful search) or we}_Þ_{nd a slot}

holdingNIL(unsuccessful search).

• _{We need the sequence of slots probed to be a permutation of the slot numbers} 0,1, . . . ,m−1(so that we examine all slots if we have to, and so that we don’t examine any slot more than once).

• _{Thus, the hash function is}_h _:_U× {₀,₁, . . . ,_m−₁}

probe number

→ {0,1, . . . ,m−1} slot number

• _{The requirement that the sequence of slots be a permutation of} ₀, ₁, . . . ,

In document Introduction to Algorithms Cormen Solution pdf (Page 145-160)