What is a Hash Table ?
The simplest kind of
hash table is an array of records.
This example has 701
records.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
An array of records
. . .
What is a Hash Table?
Each record has a
special field, called its key.
In this example, the key
is a long integer field called Number.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
. . .
What is a Hash Table?
The number might be a
person's identification number, and the rest of the record has
information about the person.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
. . .
[ 700] [ 4 ]
What is a Hash Table ?
When a hash table is in
use, some spots contain valid records, and other spots are "empty".
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
Inserting a New Record
In order to insert a new
record, the key must
somehow be converted to an array index.
The index is called the hash
value of the key.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Inserting a New Record
Typical way create a
hash value:
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685
(Number mod 701)
Inserting a New Record
Typical way to create a
hash value:
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685
(Number mod 701)
Inserting a New Record
The hash value is used for the location of the new record.
Number 580625685
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Inserting a New Record
The hash value is used
for the location of the new record.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Collisions
Here is another new
record to insert, with a hash value of 2.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685
Number 701466868
My hash value is
Collisions
This is called a collision, because there is already
another valid record at [2].
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685
Number 701466868
When a collision occurs,
move forward until you
Collisions
This is called a collision, because there is already
another valid record at [2].
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685
Number 701466868
When a collision occurs,
move forward until you
Collisions
This is called a collision, because there is already
another valid record at [2].
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685
Number 701466868
When a collision occurs,
move forward until you
Collisions
This is called a collision, because there is already
another valid record at [2].
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
The new record goes
Searching for a
Key
The data that's attached to
a key can be found fairly quickly.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Searching for a Key
Calculate the hash value.
Check that location of the
array for the key.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Number 701466868
My hash value is
[2]. Not
Searching for a Key
Keep moving forward until
you find the key, or you reach an empty spot.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Number 701466868
My hash value is
[2]. Not
Searching for a Key
Keep moving forward until
you find the key, or you reach an empty spot.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Number 701466868
My hash value is
[2]. Not
Searching for a Key
Keep moving forward until
you find the key, or you reach an empty spot.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Number 701466868
My hash value is
Searching for a Key
When the item is found, the
information can be copied to the necessary location.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Number 701466868
My hash value is
Deleting a Record
Records may also be deleted from a hash
table.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 506643548 Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868 Please delete
Deleting a Record
Records may also be deleted from a hash table.
But the location must not be left as an ordinary
"empty spot" since that could interfere with searches.
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 233667136
Number 281942902 Number 155778322
. . .
Deleting a Record
[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]
Number 233667136
Number 281942902 Number 155778322
. . .
Number 580625685 Number 701466868
Records may also be deleted from a hash table.
But the location must not be left as an ordinary
"empty spot" since that could interfere with searches.
The location must be marked in some special
An array in which TableNodes
are not stored consecutively
Their place of storage is
calculated using the key and a
hash function
Keys and entries are scattered
throughout the array.
Hashing
key entry
Key hash function
array index
4
10
insert: calculate place of
storage, insert TableNode; (1)
find: calculate place of
storage, retrieve entry; (1)
remove: calculate place of storage, set it to null; (1)
Hashing
key entry
4
10
123
Hashing
We use an array of some fixed size T to hold
the data. T is typically prime.
Each key is mapped into some number in the
range 0 to T-1 using a hash function, which
Example: fruits
Suppose our hash function gave us
the following values:
hashCode("apple") = 5
hashCode("watermelon") = 3 hashCode("grapes") = 8
hashCode("cantaloupe") = 7 hashCode("kiwi") = 0
hashCode("strawberry") = 9 hashCode("mango") = 6
hashCode("banana") = 2
Example
Store data in a table array:
table[5] = "apple"
table[3] = "watermelon" table[8] = "grapes"
table[7] = "cantaloupe" table[0] = "kiwi"
table[9] = "strawberry" table[6] = "mango"
table[2] = "banana"
Example
Associative array:
Example Hash Functions
If the keys are strings the hash function is some
function of the characters in the strings.
One possibility is to simply add the ASCII values of the
Finding the hash function
int hashCode( char* s ) {
int i, sum; sum = 0;
for(i=0; i < strlen(s); i++ )
sum = sum + s[i]; // ascii value
return sum % TABLESIZE;
Example Hash Functions
Another possibility is to convert the string into
some number in some arbitrary base b (b also
might be a prime number):
T
b
b
b
ABC
h
Example
T
b
i
str
str
h
Length -1 i=0 i)%
67
66
(65
)
(
:
%
]
[
)
(
2 10
+
+
Example Hash Functions
If the keys are integers then key % T is
generally a good hash function, unless the data has some undesirable features.
For example, if T = 10 and all keys end in
zeros, then key % T = 0 for all keys.
In general, to avoid situations like this, T should
Collision: more Examples
Suppose our hash function gave us the following values:
hash("apple") = 5
hash("watermelon") = 3 hash("grapes") = 8
hash("cantaloupe") = 7 hash("kiwi") = 0
hash("strawberry") = 9 hash("mango") = 6
hash("banana") = 2
kiwi banana watermelon apple mango cantaloupe grapes strawberry 0 1 2 3 4 5 6 7 8 9
• Now what?
Collision
When two values hash to the same array
location, this is called a collision
Collisions are normally treated as “first come,
first served”—the first value that hashes to the
location gets it
We have to find something to do with the
Solution for Handling collisions
Solution #1: Search from there for an empty location
Can stop searching when we find the value or
an empty location.
Solution for Handling collisions
Solution #2: Use a Second Hash Function
Solution for Handling collisions
Solution #3: Use the array location as the
header of a linked list of values that hash to
Solution 1:
Open Addressing
This approach of handling collisions is called
open addressing; it is also known as closed hashing.
More formally, cells at h0(x), h1(x), h2(x), … are tried in succession where
hi(x) = (hash(x) + f(i)) mod TableSize, with f(0) = 0.
Linear Probing
We use f(i) = i, i.e., f is a linear function of i. Thus
location(x) = (hash(x) + i) mod TableSize
The collision resolution strategy is called linear
Linear Probing:
insert
Suppose we want to add
seagull to this hash table
Also suppose:
hashCode(“seagull”) = 143
table[143] is not empty
table[143] != seagull
table[144] is not empty
table[144] != seagull
table[145] is empty
Therefore, put seagull at
Linear Probing:
insert
Suppose you want to add
hawk to this hash table
Also suppose
hashCode(“hawk”) = 143
table[143] is not empty
table[143] != hawk
table[144] is not empty
table[144] == hawk
hawk is already in the table, so do nothing.
Linear Probing:
insert
Suppose:
You want to add cardinal to this hash table
hashCode(“cardinal”) = 147 The last location is 148
147 and 148 are occupied
Solution:
Treat the table as circular; after 148 comes 0
Hence, cardinal goes in
location 0 (or 1, or 2, or ...)
Linear Probing:
find
Suppose we want to find
hawk in this hash table
We proceed as follows:
hashCode(“hawk”) = 143
table[143] is not empty
table[143] != hawk
table[144] is not empty
table[144] == hawk (found!)
We use the same procedure
Linear Probing and Deletion
If an item is placed in array[hash(key)+4],
then the item just before it is deleted
How will probe determine that the “space/
hole” does not indicate the item is not in the
array?
Have three states for each location
Occupied
Empty (never used)
Clustering
One problem with linear probing technique is
the tendency to form “clusters”.
A cluster is a group of items not containing any open slots
The bigger a cluster gets, the more likely it is
that new values will hash into the cluster, and make it ever bigger.
Quadratic Probing
Quadratic probing uses different formula:
Use F(i) = i2 to resolve collisions
If hash function resolves to H and a search in
cell H is inconclusive, try H + 12, H + 22, H + 32,
…
Probe
array[hash(key)+12], then
array[hash(key)+22], then
array[hash(key)+32], and so on
Collision resolution:
Chaining
Each table position is a
linked list
Add the keys and entries
anywhere in the list (front easiest)
4
10
123
key entry key entry
key entry key entry
key entry
Collision resolution: chaining
Advantages over open
addressing:
Simpler insertion and
removal
Array size is not a
limitation
Disadvantage
Memory overhead is large
if entries are small.
4
10
123
key entry key entry
key entry key entry
Recap
Collision Strategies in Hashing
Linear Probing
Quadratic Probing
Applications of Hashing
Compilers use hash tables to keep track of
declared variables (symbol table).
A symbol table is an important part of
compilation process. Compiler puts variables
inside symbol table during this process. <Read
Applications of Hashing
A hash table can be used for on-line spelling
checkers — if misspelling detection (rather than correction) is important, an entire dictionary can be hashed and words checked in constant time.
Hash table of all English words
Applications of Hashing
Game playing programs use hash tables to
store seen positions for analysis purpose, thereby saving computation time if the
position is encountered again.
Hash functions can be used to quickly check
When is hashing suitable?
Hash tables are very good if there is a need for
many searches in a reasonably stable table.
Example: Dictionary
Hash tables are not so good if there are many
insertions and deletions, or if table traversals
are needed — in this case, AVL trees are better.
Also, hashing is very slow for any operations
which require the entries to be sorted
There may be spaces/holes in the array
Different Hashing Functions
Home Work: Describe the following hash functions in 3 or 5 lines.
Division or mod function
Folding
Mid Square Function
Extraction
Radix Transformation
Important questions
What is the basic concept of Hashing in Data Structures?
What is collision when using a Hash function? Provide few
examples of collision.
What are different methods for the collision resolution?
What are some of the applications of Hashing?
Is hashing suitable for the following scenarios? Justify
your answer.
Searching in a table
Frequent read and write data
Operations which require Sorting