OVERVIEW
What is a hash table?
- a data structure
- stores unordered items by mapping (hashing) each time to a location in an array (vector)
- uses unique key:value pairs to map an index to an item
- each array element is called a bucket
- includes a hash function to compute a bucket index from the item’s key
- support fast search, insert, and remove
How does a hash table operate?
- modulo operator %
- computes the integer remainder when dividing two numbers
- Ex: for a 100 element hash table, a hash function key of % 100 will map keys to bucket indices 0 to 99
- A hash table’s operations of insert, remove, and search each use the hash function to determine an item’s bucket
- Example: Inserting 117 first determines the bucket to be 117 % 5 = 2
- Example: A modulo hash function for 25 entry hash table is: key % 25 – this yields values 0 to 24 (25 values; starts at 0)
- Example: Given a hash table with 50 buckets and modulo hash function, in which bucket will HashInsert(htable, item 22) insert item 222?
- 222 % 50 = 22
- Answer: Bucket Index 22
- Example: Given a has table with 100 buckets and modulo hash function, in which bucket will HashSearch(htable, 767) search for the item?
- 767 % 100 = 67
- Answer: Bucket Index 67
- Search Efficiency
- O(1) is the Big O goal of hash functions
- A modulo hash function will map num_keys/num_buckets keys to each bucket
Hash Table Functions in Python:
When Hash Tables Collide
A collision occurs if a hash table insert function maps to the same bucket as an existing item in the hash table
Q: A hash table has buckets 0 to 9 and uses a hash function of key % 10. If the table is initially empty and the following inserts are applied in the order shown, the insert of which item results in a collision?
- HashInsert(hashTable, item 55)
- HashInsert(hashTable, item 90)
- HashInsert(hashTable, item 95)
A: 3
Collision Resolution Techniques:
Chaining Defined
- each bucket has a list of items
- each list may store multiple items that map to the same bucket
- insert operation first uses the item’s key to determine the bucket then inserts the item in that bucket’s list
- searching and removing first determines the location of the bucket using a given key, then searches the bucket’s list
Chaining with Pseudocode
def HashInsert(hashTable, item):
if (HashSearch(hashTable, item->key) == null): bucketList = hashTable[Hash(item->key)] node = Allocate new linked list node node->next = null node->data = item ListAppend(bucketList, node)
def HashRemove(hashTable, item): bucketList = hashTable[Hash(item->key)] itemNode = ListSearch(bucketList, item->key) if (itemNode is not null): ListRemove(bucketList, itemNode)
def HashSearch(hashTable, key): bucketList = hashTable[Hash(key)] itemNode = ListSearch(bucketList, key) if (itemNode is not null): return itemNode->data else: return null
Open Addressing
looks for empty buckets elsewhere in the table
Q: A hash table’s items will be positive integers, and -1 will represent empty. A 5-bucket hash table is: -1, -1, 72, 93, -1. How many items are in the table?
A: 2
The End
