
Hashing.pdf


Academic year: 2020


(1)

Lecture 10 & 11

HASHING

Course Supervisor: Syeda Nazia Ashraf

(2)

MOTIVATION

Linear Search

• The simplest algorithm for finding a specific target key in a data collection.

• Examines each element in turn, so search time grows linearly with the collection size, O(n).

(3)

MOTIVATION

Binary Search

• Requires the elements to be in sorted order.

• Search time grows with the logarithm of the collection size, O(log n).

(4)

MOTIVATION

Conclusion

• The time taken for a search using each of these methods depends on the size of the collection.

(5)

HASHING

• Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.

• Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.
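As a small illustration of the fixed-length property described above, the sketch below uses Python's standard hashlib module. SHA-256 is only a stand-in chosen here for convenience; the lecture does not name it, and the helper name fixed_length_hash is hypothetical.

```python
import hashlib

def fixed_length_hash(text: str) -> str:
    """Return a hex digest whose length does not depend on the input length."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Inputs of very different lengths produce digests of the same length.
short_digest = fixed_length_hash("abc")
long_digest = fixed_length_hash("a much longer string of characters " * 100)
print(len(short_digest), len(long_digest))
```

The hash functions used for hash tables in the rest of these slides are far simpler; the point here is only that the output size is fixed.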

(6)

HASHING

Hash Tables(Hash Map)

• Simplest data structure.

• Hash Function – Basis of Hash Tables.

Hash Functions

(7)

HASHING

Hashes

• The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

(8)

HASHING

IS THERE ANY PARAMETER FOR A GOOD HASH FUNCTION?

(9)

POPULAR HASH FUNCTIONS

1. Division Method

• A key k (the given element) is mapped into one of m slots using the function

h(k) = k mod m

where m is the size of the table, usually chosen to be a prime number, and k is the key.

Different types of hash functions are used for the mapping of keys into tables.

(10)

1. Division Method

• Choose a number m larger than the number n of keys in K

• The number m is usually chosen to be a prime number or a number without small divisors

• The hash function H is defined as

H(k) = k (mod m) or H(k) = k (mod m) + 1

• Here k (mod m) denotes the remainder when k is divided by m. The first formula gives addresses 0 to m - 1; the second gives addresses 1 to m

(11)

Example:

Elements are: 3205, 7148, 2345

Table addresses: 0 – 99

m = 97 (prime no. close to 99)

H(k) = k (mod m), e.g. 3205 mod 97 = 4

H(3205) = 4, H(7148) = 67, H(2345) = 17

• For the 2nd formula, H(k) = k (mod m) + 1, add 1 to each remainder to obtain:

H(3205) = 5, H(7148) = 68, H(2345) = 18

(12)

DIVISION METHOD Contd…

• h(7148) = 7148 mod 97 = 67

• h(2345) = 2345 mod 97 = 17
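The division method can be sketched in a few lines of Python. The function names are hypothetical, and m = 97 is taken from the worked example above.

```python
def division_hash(k: int, m: int = 97) -> int:
    """Division method: the remainder when k is divided by m (addresses 0..m-1)."""
    return k % m

def division_hash_plus_one(k: int, m: int = 97) -> int:
    """Variant H(k) = k (mod m) + 1, which gives addresses 1..m."""
    return k % m + 1

keys = [3205, 7148, 2345]
print([division_hash(k) for k in keys])           # [4, 67, 17]
print([division_hash_plus_one(k) for k in keys])  # [5, 68, 18]
```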

(13)

POPULAR HASH FUNCTIONS Contd…

2. Folding Method

• The key is partitioned into a number of parts, where each part except possibly the last has the same number of digits as the required address. The parts are then added together, ignoring the last carry. That is,

• h(k) = k1 + k2 + k3 + … + kn

• Sometimes the even-numbered parts (k2, k4, …) are reversed before the addition.

(14)

FOLDING METHOD

Example

• Create a hash table for the Keys 3205, 7148, 2345 by using Folding Method

Solution

• h(3205) = 32 + 05 = 37

• h(7148) = 71 + 48 = 119 → 19 (discard the leading digit 1)

• h(2345) = 23 + 45 = 68

(15)

FOLDING METHOD Contd…

• Alternatively, one may want to reverse the second part before adding.

• h(3205) = 32 + 50 = 82

• h(7148) = 71 + 84 = 155 (Discard 1) = 55

• h(2345) = 23 + 54 = 77
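Both variants of the folding method can be sketched as follows. This is a minimal illustration that assumes decimal keys and two-digit addresses, as in the examples; the function name and parameters are hypothetical.

```python
def folding_hash(k: int, part_len: int = 2, reverse_even: bool = False) -> int:
    """Folding method: split the decimal digits of k into parts of part_len
    digits, optionally reverse the even-numbered parts, add the parts, and
    drop any carry beyond part_len digits."""
    digits = str(k)
    parts = [digits[i:i + part_len] for i in range(0, len(digits), part_len)]
    total = 0
    for position, part in enumerate(parts, start=1):
        if reverse_even and position % 2 == 0:
            part = part[::-1]  # e.g. "05" becomes "50"
        total += int(part)
    return total % (10 ** part_len)  # ignore the last carry

keys = [3205, 7148, 2345]
print([folding_hash(k) for k in keys])                     # [37, 19, 68]
print([folding_hash(k, reverse_even=True) for k in keys])  # [82, 55, 77]
```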

(16)

POPULAR HASH FUNCTIONS Contd…

3. Midsquare Method

• The key is squared, and the hash value is taken from the middle digits of the squared key.

(17)

MIDSQUARE METHOD

Example

• Create a hash table for the Keys 3205, 7148, 2345 by using Midsquare Method

k:      3205      7148      2345
k²:     10272025  51093904  5499025
h(k):   72        93        99

• The 4th and 5th digits of k², counting from the right, are chosen for the hash address.
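The digit selection used in this example can be written with integer arithmetic. The sketch below hard-codes the choice of the 4th and 5th digits from the right; the function name is hypothetical.

```python
def midsquare_hash(k: int) -> int:
    """Square k, then keep the 5th and 4th digits of k*k counting from the right."""
    squared = k * k
    return (squared // 1000) % 100  # drop the last 3 digits, keep the next 2

for key in (3205, 7148, 2345):
    print(key, key * key, midsquare_hash(key))
```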

(18)

Hash Function Examples

Let h(k) = k % 15. Then,

k:      25   129   35   2501   47   36
h(k):   10     9    5     11    2    6

Storing the keys in the array is straightforward:

index:  0   1   2    3   4   5    6    7   8   9     10   11     12   13   14
key:    _   _   47   _   _   35   36   _   _   129   25   2501   _    _    _

(19)

Hash Function

What happens when you try to insert k = 65?

h(65) = 65 % 15 = 5

index:  0   1   2    3   4   5    6    7   8   9     10   11     12   13   14
key:    _   _   47   _   _   35   36   _   _   129   25   2501   _    _    _

Slot 5 is already occupied by 35, so where does 65 go?
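The situation above can be reproduced with a table that has no collision handling at all. This is a minimal sketch; try_insert is a hypothetical helper that simply refuses to overwrite an occupied slot.

```python
TABLE_SIZE = 15

def h(k: int) -> int:
    return k % TABLE_SIZE

def try_insert(table: list, k: int) -> bool:
    """Store k at index h(k) if that slot is free; report a collision otherwise."""
    index = h(k)
    if table[index] is not None:
        return False  # collision: the slot is already occupied
    table[index] = k
    return True

table = [None] * TABLE_SIZE
for key in [25, 129, 35, 2501, 47, 36]:
    try_insert(table, key)

collided = not try_insert(table, 65)  # h(65) = 5, which already holds 35
print(table)
print("collision on 65:", collided)
```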

(20)

If two keys map to the same hash table index, then we have a collision.

As the number of elements in the table increases, the likelihood of a collision increases, so make the table as large as practical.

Collisions may still happen, so we need a collision resolution strategy.

(21)

COLLISION

• When a hash function maps two different keys to the same table address, a collision is said to occur.

• Two elements cannot be stored at the same location in the hash table.

• Two approaches are used to resolve collisions.

Open Hashing: collisions are resolved by storing the colliding object in a separate area.

• Separate Chaining

Closed Hashing (Open Addressing): all keys are stored in the hash table itself.

• Linear Probing

• Quadratic Probing

• Double Hashing

What is Probing?

(22)

CLOSED HASHING METHODS (COLLISION RESOLUTION TECHNIQUES)

1. Linear Probing

• One of the methods for dealing with collisions.

• If a data element hashes to a location in the table which is already occupied, the table is searched consecutively from that location until an empty location is found.

• The key would then be stored in the empty location.

(23)

LINEAR PROBING Contd…

Searching/ lookup

(24)
(25)
(26)

LINEAR PROBING

Exercise Question

h(K) = K mod 7

• Insert keys: 76 93 40 47 10 55
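One way to work through this exercise is the sketch below, which assumes a table of 7 slots, wrap-around at the end of the table, and no deletions. The function names are hypothetical.

```python
def linear_probe_insert(table: list, k: int) -> int:
    """Insert k using h(K) = K mod len(table); on a collision, scan forward
    one slot at a time (wrapping around) until an empty slot is found."""
    m = len(table)
    start = k % m
    for step in range(m):
        index = (start + step) % m
        if table[index] is None:
            table[index] = k
            return index
    raise OverflowError("hash table is full")

def linear_probe_search(table: list, k: int) -> int:
    """Return the index of k, or -1 once an empty slot or a full cycle is reached."""
    m = len(table)
    start = k % m
    for step in range(m):
        index = (start + step) % m
        if table[index] is None:
            return -1
        if table[index] == k:
            return index
    return -1

table = [None] * 7
for key in [76, 93, 40, 47, 10, 55]:
    linear_probe_insert(table, key)
print(table)  # [47, 55, 93, 10, None, 40, 76]
```

Keys 47 and 55 both end up away from their home slots (5 and 6), next to other displaced keys, which is the kind of run that linear probing tends to build.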

Disadvantage

(27)
(28)

CLOSED HASHING METHODS (COLLISION RESOLUTION TECHNIQUES)

2. Quadratic Probing

• Here we place the elements by using the hash function

• h_i(x) = (h(x) + i²) mod TableSize, for i = 0, 1, 2, …

• Searching is generally faster than with linear probing.

(29)

2. Quadratic Probing

• Quadratic probing is a solution to the clustering problem

Linear probing adds 1, 2, 3, etc. to the original hashed key

Quadratic probing adds 1², 2², 3², etc. to the original hashed key

• However, whereas linear probing guarantees that all empty positions will be examined if necessary, quadratic probing does not.

(30)

• If the table size is prime, this will try approximately half the table slots.

• More generally, with quadratic probing, insertion may be impossible if the table is more than half-full!
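The "half the table" claim can be checked directly by counting the distinct slots that the offsets i² can reach. A minimal sketch, with a hypothetical helper name and a few small prime table sizes chosen for illustration:

```python
def reachable_slots(home: int, table_size: int) -> set:
    """Slots visited by (home + i*i) mod table_size for i = 0 .. table_size - 1."""
    return {(home + i * i) % table_size for i in range(table_size)}

for m in (7, 11, 13):
    print(m, len(reachable_slots(0, m)))
```

For a prime table size m the count is (m + 1) / 2, so once those particular slots are occupied an insertion fails even though other cells are still empty.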

(31)

Quadratic Probing

Quadratic probing eliminates the primary clustering problem of linear probing.

• The collision function is quadratic.

• The popular choice is f(i) = i².

• If the hash function evaluates to h and a search in cell h is inconclusive, we try cells h + 1², h + 2², …, h + i².

• That is, it examines the cells 1, 4, 9, and so on away from the original probe.
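The probe sequence can be sketched as below. The table size of 7 and the keys are illustrative assumptions, not taken from the slides, and the function name is hypothetical.

```python
def quadratic_probe_insert(table: list, k: int):
    """Try cells (h + i*i) mod TableSize for i = 0, 1, 2, ... and store k in
    the first empty one. Returns the index used, or None if no probe succeeds."""
    m = len(table)
    home = k % m
    for i in range(m):
        index = (home + i * i) % m
        if table[index] is None:
            table[index] = k
            return index
    return None  # insertion can fail even though empty cells remain

table = [None] * 7
for key in [76, 40, 48, 5, 55]:
    quadratic_probe_insert(table, key)
print(table)  # [48, None, 5, 55, None, 40, 76]
```

Here 48 lands one cell past its home slot 6 (wrapping to 0), while 5 and 55 each need the second probe, four cells away from home.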

(32)

QUADRATIC PROBING Contd…

Example

• h(K) = K mod 7

(33)

A quadratic probing hash table after each insertion. Note that the table size was poorly chosen.

CLOSED HASHING METHODS (COLLISION RESOLUTION TECHNIQUES)

3. Double Hashing

• Uses a secondary hash function h’(k) and places the colliding item in the first available cell of the series h(k), h(k) + h’(k), h(k) + 2h’(k), …

(35)

3. Double Hashing

• 2nd hash function H’ is used to resolve the collision.

• Suppose a record R with key k has the hash addresses H(k) = h and H’(k) = h’ ≠ m

• Then we search the locations with addresses h, h + h’, h + 2h’, h + 3h’, … (each taken mod m)
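A minimal sketch of this probe sequence follows. The slides do not specify H’, so the secondary hash H’(k) = 1 + (k mod (m - 1)) used here is an assumption, as are the table size of 7 and the keys; the function name is hypothetical.

```python
def double_hash_insert(table: list, k: int):
    """Probe h, h + h', h + 2h', ... (mod m) and store k in the first empty cell.
    Returns the index used, or None if every probe hits an occupied cell."""
    m = len(table)
    h = k % m                 # primary hash H(k)
    step = 1 + k % (m - 1)    # assumed secondary hash H'(k), always in 1 .. m-1
    for j in range(m):
        index = (h + j * step) % m
        if table[index] is None:
            table[index] = k
            return index
    return None

table = [None] * 7
for key in [76, 93, 40, 47, 10, 55]:
    double_hash_insert(table, key)
print(table)  # [None, 55, 93, 10, 47, 40, 76]
```

Because the step depends on the key, 47 and 55 jump to different cells after colliding, rather than following the same path as they would under linear probing. With a prime table size, any step from 1 to m - 1 eventually visits every cell.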

(36)
(37)
(38)

Open addressing: store the key/entry in a different position.

Separate Chaining

Chain together several keys/entries in each position.

• Instead of storing the data item directly in the hash table, each hash table entry contains a reference to a data structure, e.g. a linked list.

• In the worst case, all items hash to the same value, so they are all stored in a single data structure.

(39)

• The idea is to keep a list of all elements that hash to the same value.

– The array elements are pointers to the first nodes of the lists.

– A new item is inserted to the front of the list.

Advantages:

– Better space utilization for large items.

– Simple collision handling: searching linked list.

– Overflow: we can store more items than the hash table size.

(40)

Disadvantages of Separate Chaining

Parts of the array might never be used.

As chains get longer, search time increases to O(n) in the worst case.

Constructing new chain nodes is relatively expensive.

Is there a way to use the “unused” space?

(41)

Example

Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81

0: 0
1: 81 → 1
2:
3:
4: 64 → 4
5: 25
6: 36 → 16
7:
8:
9: 49 → 9
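The chains in this example can be rebuilt with the sketch below. It assumes the hash function h(k) = k mod 10, which is not stated on the slide but is consistent with the layout shown, and it uses Python lists as a stand-in for linked lists; the function names are hypothetical.

```python
def build_chained_table(keys, table_size: int = 10):
    """Separate chaining: each slot holds a chain of the keys that hash to it.
    New keys go to the front of the chain, as with a linked list."""
    table = [[] for _ in range(table_size)]
    for k in keys:
        table[k % table_size].insert(0, k)
    return table

def chained_search(table, k: int) -> bool:
    """Only the one chain at index h(k) has to be scanned."""
    return k in table[k % len(table)]

table = build_chained_table([0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
for index, chain in enumerate(table):
    print(index, chain)
```

Slots 2, 3, 7, and 8 stay empty, which illustrates the "parts of the array might never be used" disadvantage.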

(42)

SEPARATE CHAINING

• In our example, we use a linked list:

(43)

Applications of Hashing

• Compilers use hash tables to keep track of declared variables

• A hash table can be used for on-line spelling checkers — if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time

• Game playing programs use hash tables to store seen positions, thereby saving computation time if the position is encountered again

• Hash functions can be used to quickly check for inequality: if two elements hash to different values, they must be different
