Lecture 10 & 11
HASHING
Course Supervisor: Syeda Nazia Ashraf
MOTIVATION
Linear Search
• Simplest algorithm to search for a specific target key in a data collection.
• Examines each element in turn until the target is found or the collection is exhausted, so search time is O(n).
MOTIVATION
Binary Search
• Requires the elements to be in sorted order.
• Search time grows with the logarithm of the collection size: O(log n).
MOTIVATION
Conclusion
• The time taken for a search using each of these methods depends on the size of the collection.
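To make the comparison concrete, here is a minimal Python sketch of both searches that also counts how many elements each one examines. The function names and the sample data are illustrative assumptions, not part of the lecture.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """Return (index, comparisons) for a list already in ascending order."""
    lo, hi = 0, len(sorted_items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] == target:
            return mid, comparisons
        if sorted_items[mid] < target:
            lo = mid + 1      # target can only be in the upper half
        else:
            hi = mid - 1      # target can only be in the lower half
    return -1, comparisons


data = list(range(1, 1025))          # 1024 sorted keys (sample data)
print(linear_search(data, 1024))     # examines every element
print(binary_search(data, 1024))     # examines roughly log2(n) elements
```

Both counts grow with the collection size, which is the limitation hashing tries to avoid.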
HASHING
• Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.
• Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.
HASHING
Hash Tables(Hash Map)
• Simplest data structure.
• Hash Function – Basis of Hash Tables.
Hash Functions
HASHING
Hashes
• The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.
HASHING
IS THERE ANY PARAMETER FOR A GOOD HASH FUNCTION?
POPULAR HASH FUNCTIONS
1. Division Method
• A key (given element) is mapped into one of m slots using the function:
h(k) = k mod m
where k is the key and m is the size of the table, usually chosen to be a prime number.
Different types of hash functions are used for the mapping of keys into tables.
1. Division Method
• Choose a number m larger than the number n of keys in K (the set of keys)
• The number m is usually chosen to be a prime number or a number without small divisors
• The hash function H is defined as
H(k) = k (mod m) or H(k) = k (mod m) + 1
• Here k (mod m) denotes the remainder when k is divided by m
• Example:
Elements are: 3205, 7148, 2345
Table addresses: 0 – 99
m = 97 (prime number close to 99)
H(k) = k (mod m), e.g. 3205 mod 97 = 4
H(3205) = 4, H(7148) = 67, H(2345) = 17
• For the 2nd formula, add 1 to each remainder.
• H(k) = k (mod m) + 1 gives: H(3205) = 5, H(7148) = 68, H(2345) = 18
DIVISION METHOD Contd…
• h(7148) = 7148 mod 97 = 67
• h(2345) = 2345 mod 97 = 17
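The division method can be sketched in a few lines of Python. The function names below are illustrative assumptions; the keys and m = 97 come from the example above.

```python
def division_hash(key, m=97):
    """H(k) = k mod m, giving addresses 0 .. m-1."""
    return key % m


def division_hash_plus_one(key, m=97):
    """H(k) = k mod m + 1, giving addresses 1 .. m."""
    return key % m + 1


keys = [3205, 7148, 2345]
print([division_hash(k) for k in keys])           # [4, 67, 17]
print([division_hash_plus_one(k) for k in keys])  # [5, 68, 18]
```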
POPULAR HASH FUNCTIONS Contd…
2. Folding Method
• The key is partitioned into a number of parts, where each part (except possibly the last) has the same number of digits as the required address. The parts are then added together, ignoring the last carry. That is,
• h(k) = k1 + k2 + k3 + … + kn
• Sometimes the even-numbered parts (k2, k4, …) are reversed before adding.
FOLDING METHOD
Example
• Create a hash table for the Keys 3205, 7148, 2345 by using Folding Method
Solution
• h(3205) = 32 + 05 = 37
• h(7148) = 71 + 48 = 119 (Discard leading digit 1) = 19
• h(2345) = 23 + 45 = 68
FOLDING METHOD Contd…
• Alternatively, one may want to reverse the second part before adding.
• h(3205) = 32 + 50 = 82
• h(7148) = 71 + 84 = 155 (Discard 1) = 55
• h(2345) = 23 + 54 = 77
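The following Python sketch reproduces both folding variants for two-digit addresses. The function name `fold_hash` and its parameters are assumptions made for illustration; the sketch treats the key as a decimal string and drops the carry by taking the sum modulo 10^part_len.

```python
def fold_hash(key, part_len=2, reverse_even=False):
    """Fold a non-negative integer key into a part_len-digit address."""
    digits = str(key)
    # Split into parts of part_len digits; the last part may be shorter.
    parts = [digits[i:i + part_len] for i in range(0, len(digits), part_len)]
    if reverse_even:
        # Reverse the 2nd, 4th, ... parts (indices 1, 3, ... when counting from 0).
        parts = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(parts)]
    total = sum(int(p) for p in parts)
    return total % 10 ** part_len   # ignore any carry beyond part_len digits


keys = [3205, 7148, 2345]
print([fold_hash(k) for k in keys])                     # [37, 19, 68]
print([fold_hash(k, reverse_even=True) for k in keys])  # [82, 55, 77]
```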
POPULAR HASH FUNCTIONS Contd…
3. Midsquare Method
• The key k is squared. The hash function is defined by taking digits from the middle of k².
MIDSQUARE METHOD
Example
• Create a hash table for the Keys 3205, 7148, 2345 by using Midsquare Method
k:     3205       7148       2345
k²:    10272025   51093904   5499025
h(k):  72         93         99
• The 4th and 5th digits of k², counting from the right, are chosen as the hash address.
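A minimal Python sketch of this digit selection follows. The function name is an assumption, and the choice of which middle digits to keep (the 4th and 5th from the right) simply mirrors the example above; other positions could be used as long as the same ones are used for every key.

```python
def midsquare_hash(key):
    """Square the key and keep the 4th and 5th digits from the right."""
    squared = key * key
    # Dropping the 3 rightmost digits leaves the 4th digit in the units place;
    # taking the result modulo 100 keeps the 5th and 4th digits as a pair.
    return (squared // 1000) % 100


for k in [3205, 7148, 2345]:
    print(k, k * k, midsquare_hash(k))
# 3205 10272025 72
# 7148 51093904 93
# 2345 5499025 99
```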
Hash Function Examples
Let h(k) = k % 15. Then:
k:     25   129  35   2501  47   36
h(k):  10   9    5    11    2    6
Storing the keys in the array is straightforward:
Index: 0     1     2     3     4     5     6     7     8     9     10    11    12    13    14
Key:   _     _     47    _     _     35    36    _     _     129   25    2501  _     _     _
Hash Function
What happens when you try to insert k = 65?
k = 65, so h(k) = 65 % 15 = 5
Index: 0     1     2     3     4     5     6     7     8     9     10    11    12    13    14
Key:   _     _     47    _     _     35    36    _     _     129   25    2501  _     _     _
65 (?) hashes to index 5, which already holds 35.
• If two keys map to the same hash table index, then we have a collision.
• As the number of elements in the table increases, the likelihood of a collision increases, so make the table as large as practical.
• Collisions may still happen, so we need a collision resolution strategy.
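The h(k) = k % 15 example can be replayed with a short Python sketch. The helper `try_insert` is a hypothetical name introduced here; it only detects a collision and does not resolve it.

```python
TABLE_SIZE = 15


def h(k):
    return k % TABLE_SIZE


table = [None] * TABLE_SIZE
for key in [25, 129, 35, 2501, 47, 36]:
    table[h(key)] = key          # these six keys all land in different slots


def try_insert(table, key):
    """Store key at h(key) if the slot is free; return False on a collision."""
    slot = h(key)
    if table[slot] is not None and table[slot] != key:
        return False             # slot already holds a different key
    table[slot] = key
    return True


print(table)
print(try_insert(table, 65))     # False: slot 5 already holds 35
```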
COLLISION
• When a hash function maps two different keys to the same table address, a collision is said to occur.
• Two elements can not be stored at the same location in the hash table.
• Two approaches are used to resolve collisions.
• Open Hashing: collisions are resolved by storing the colliding object in a separate area.
– Separate chaining
• Closed Hashing (Open Addressing): all keys are stored in the hash table itself.
– Linear Probing
– Quadratic Probing
– Double Hashing
What is Probing?
CLOSED HASHING METHODS (COLLISION RESOLUTION TECHNIQUES)
1. Linear Probing
• One of the methods for dealing with collisions.
• If a data element hashes to a location in the table which is already occupied, the table is searched consecutively from that location until an empty location is found.
• The key would then be stored in the empty location.
LINEAR PROBING Contd…
Searching/ lookup
LINEAR PROBING
Exercise Question
• h(K) = K mod 7
• Insert keys: 76, 93, 40, 47, 10, 55
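One way to check an answer to this exercise is the Python sketch below. It assumes a table of 7 slots with wrap-around at the end of the array and no deletions; the function names are illustrative.

```python
EMPTY = None


def lp_insert(table, key):
    """Insert key using linear probing; return the slot that was used."""
    m = len(table)
    start = key % m
    for step in range(m):
        slot = (start + step) % m            # wrap around at the end of the table
        if table[slot] is EMPTY or table[slot] == key:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def lp_search(table, key):
    """Return the slot holding key, or -1 if key is not in the table."""
    m = len(table)
    start = key % m
    for step in range(m):
        slot = (start + step) % m
        if table[slot] is EMPTY:
            return -1                        # an empty slot ends the probe run
        if table[slot] == key:
            return slot
    return -1


table = [EMPTY] * 7
for key in [76, 93, 40, 47, 10, 55]:
    lp_insert(table, key)
print(table)                 # [47, 55, 93, 10, None, 40, 76]
print(lp_search(table, 55))  # 1
```

Keys 47 and 55 both end up far from their home slots (5 and 6), because each has to step past a run of occupied cells.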
Disadvantage
CLOSED HASHING METHODS (COLLISION RESOLUTION TECHNIQUES)
2. Quadratic Probing
• Here the i-th probe position is given by the hash function
• h_i(x) = (h(x) + i²) mod TableSize.
• Searching is typically faster than with linear probing, because primary clustering is avoided.
2. Quadratic Probing
• Quadratic probing is a solution to the clustering problem
• Linear probing adds 1, 2, 3, etc. to the original hashed key
• Quadratic probing adds 1², 2², 3², etc. to the original hashed key
• However, whereas linear probing guarantees that all empty positions will be examined if necessary, quadratic probing does not
• If the table size is prime, this will try approximately half the table slots.
• More generally, with quadratic probing, insertion may be impossible if the table is more than half-full!
Quadratic Probing
• Quadratic probing eliminates the primary clustering problem of linear probing.
• The collision function is quadratic.
• The popular choice is f(i) = i².
• If the hash function evaluates to h and a search in cell h is inconclusive, we try cells h + 1², h + 2², …, h + i².
• i.e. it examines cells 1, 4, 9 and so on away from the original probe.
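The probe sequence can be sketched in Python as follows. The function names are illustrative. The table size of 7 and the key set 76, 93, 40, 47, 10, 55 are borrowed from the linear probing exercise as an assumption, since no keys are listed for quadratic probing here.

```python
def quadratic_probe_sequence(home, table_size, max_probes=None):
    """Slots visited by h_i = (home + i*i) mod table_size for i = 0, 1, 2, ..."""
    if max_probes is None:
        max_probes = table_size
    return [(home + i * i) % table_size for i in range(max_probes)]


def qp_insert(table, key):
    """Insert key with quadratic probing; return the slot that was used."""
    m = len(table)
    for slot in quadratic_probe_sequence(key % m, m):
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found along the probe sequence")


print(quadratic_probe_sequence(0, 7))   # [0, 1, 4, 2, 2, 4, 1]

table = [None] * 7
for key in [76, 93, 40, 47, 10, 55]:
    qp_insert(table, key)
print(table)                            # [47, 55, 93, 10, None, 40, 76]
```

The first print shows why a free slot can be missed: from home slot 0 in a table of size 7, only slots 0, 1, 2 and 4 are ever visited, roughly half the table.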
QUADRATIC PROBING Contd…
Example
• h(K) = K mod 7
A quadratic probing hash table after each insertion (note that the table size was poorly chosen
CLOSED HASHING METHODS (COLLISION RESOLUTION TECHNIQUES)
3. Double Hashing
• Uses a secondary hash function h’(k) and places the colliding item in the first available cell of the series (h(k) + j·h’(k)) mod m, for j = 0, 1, 2, …
3. Double Hashing
• A 2nd hash function H’ is used to resolve the collision.
• Suppose a record R with key k has hash addresses H(k) = h and H’(k) = h’ ≠ m
• Then we search the locations with addresses h, h + h’, h + 2h’, h + 3h’, … (each taken mod m)
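A minimal Python sketch of this probe series is given below. The lecture does not specify the secondary function, so h2(k) = 1 + (k mod (m − 1)) is an assumption chosen only because it never yields a step of 0; the table size 7 and the keys are likewise reused from the linear probing exercise for illustration.

```python
def h1(key, m):
    """Primary hash: the home address h."""
    return key % m


def h2(key, m):
    """Assumed secondary hash: a step size h' in 1 .. m-1 (never 0)."""
    return 1 + key % (m - 1)


def dh_insert(table, key):
    """Probe h, h + h', h + 2h', ... (mod m) until a free slot is found."""
    m = len(table)
    home, step = h1(key, m), h2(key, m)
    for j in range(m):
        slot = (home + j * step) % m
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 7
for key in [76, 93, 40, 47, 10, 55]:
    dh_insert(table, key)
print(table)   # [None, 55, 93, 10, 47, 40, 76]
```

Because the table size is prime, every step size from 1 to m − 1 shares no factor with m, so the series eventually visits every slot.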
Open addressing: store the key/entry in a different position.
Separate Chaining
• Chain together several keys/entries in each position.
• Instead of storing the data item directly in the hash table, each hash table entry contains a reference to a data structure, e.g. a linked list.
• In the worst case scenario, all items hash to the same value. Thus we store them in the data structure (
• The idea is to keep a list of all elements that hash to the same value.
– The array elements are pointers to the first nodes of the lists.
– A new item is inserted to the front of the list.
• Advantages:
– Better space utilization for large items.
– Simple collision handling: searching linked list.
– Overflow: we can store more items than the hash table size.
Disadvantages of Separate Chaining
• Parts of the array might never be used.
• As chains get longer, search time increases to O(n) in the worst case.
• Constructing new chain nodes is relatively expensive.
Is there a way to use the “unused” space
Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
Index  Chain
0      0
1      81 → 1
2      _
3      _
4      64 → 4
5      25
6      36 → 16
7      _
8      _
9      49 → 9
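The example with keys 0, 1, 4, …, 81 can be rebuilt with a small Python sketch. The class and method names are illustrative, and the hash function h(k) = k mod 10 is an assumption inferred from the ten table indices 0 to 9.

```python
class Node:
    """One linked-list node in a chain."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.heads = [None] * size     # each entry points to the first node of a chain

    def insert(self, key):
        slot = key % self.size         # assumed hash function: k mod table size
        self.heads[slot] = Node(key, self.heads[slot])   # new node goes to the front

    def contains(self, key):
        node = self.heads[key % self.size]
        while node is not None:        # walk only the one chain the key hashes to
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, slot):
        """Return the keys stored at slot, front to back."""
        keys, node = [], self.heads[slot]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


t = ChainedHashTable(10)
for key in [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]:
    t.insert(key)
print(t.chain(1), t.chain(6), t.chain(9))   # [81, 1] [36, 16] [49, 9]
```

The table holds ten keys even though only six of its ten slots are used, which shows both the overflow advantage and the unused-space disadvantage.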
SEPARATE CHAINING
• In our example, we use a linked list:
Applications of Hashing
• Compilers use hash tables to keep track of declared variables
• A hash table can be used for on-line spelling checkers — if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time
• Game playing programs use hash tables to store seen positions, thereby saving computation time if the position is encountered again
• Hash functions can be used to quickly check for inequality: if two elements hash to different values, they must be different.
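As a small illustration of the spelling-checker application, the sketch below uses Python's built-in `set`, which is implemented as a hash table. The word list and the function name are made-up sample values.

```python
# Sample dictionary; a real checker would load a full word list.
dictionary = {"hash", "table", "collision", "probe", "chain"}


def misspelled(words, dictionary):
    """Return the words that are not in the dictionary (detection only)."""
    # Each membership test is a hash lookup, constant time on average.
    return [w for w in words if w.lower() not in dictionary]


print(misspelled(["Hash", "tabel", "probe"], dictionary))   # ['tabel']
```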