Hash Tables

OVERVIEW

What is a hash table?

  • a data structure
  • stores unordered items by mapping (hashing) each time to a location in an array (vector)
  • uses unique key:value pairs to map an index to an item
  • each array element is called a bucket
  • includes a hash function to compute a bucket index from the item’s key
  • support fast search, insert, and remove

How does a hash table operate?

  • modulo operator %
    • computes the integer remainder when dividing two numbers
    • Ex: for a 100 element hash table, a hash function key of % 100 will map keys to bucket indices 0 to 99
  • A hash table’s operations of insert, remove, and search each use the hash function to determine an item’s bucket
    • Example: Inserting 117 first determines the bucket to be 117 % 5 = 2
    • Example: A modulo hash function for 25 entry hash table is: key % 25 – this yields values 0 to 24 (25 values; starts at 0)
    • Example: Given a hash table with 50 buckets and modulo hash function, in which bucket will HashInsert(htable, item 22) insert item 222?
      • 222 % 50 = 22
      • Answer: Bucket Index 22
    • Example: Given a has table with 100 buckets and modulo hash function, in which bucket will HashSearch(htable, 767) search for the item?
      • 767 % 100 = 67
      • Answer: Bucket Index 67
  • Search Efficiency
    • O(1) is the Big O goal of hash functions
    • A modulo hash function will map num_keys/num_buckets keys to each bucket

Hash Table Functions in Python:

When Hash Tables Collide

A collision occurs if a hash table insert function maps to the same bucket as an existing item in the hash table

Q: A hash table has buckets 0 to 9 and uses a hash function of key % 10. If the table is initially empty and the following inserts are applied in the order shown, the insert of which item results in a collision?

  1. HashInsert(hashTable, item 55)
  2. HashInsert(hashTable, item 90)
  3. HashInsert(hashTable, item 95)

A: 3

Collision Resolution Techniques:

Chaining Defined
  • each bucket has a list of items
  • each list may store multiple items that map to the same bucket
  • insert operation first uses the item’s key to determine the bucket then inserts the item in that bucket’s list
  • searching and removing  first determines the location of the bucket using a given key, then searches the bucket’s list
Chaining with Pseudocode
def HashInsert(hashTable, item):
   if (HashSearch(hashTable, item->key) == null): bucketList = hashTable[Hash(item->key)] node = Allocate new linked list node node->next = null node->data = item ListAppend(bucketList, node)
def HashRemove(hashTable, item):
   bucketList = hashTable[Hash(item->key)]
   itemNode = ListSearch(bucketList, item->key)
   if (itemNode is not null):
      ListRemove(bucketList, itemNode)
def HashSearch(hashTable, key):
   bucketList = hashTable[Hash(key)]
   itemNode = ListSearch(bucketList, key)
   if (itemNode is not null):
      return itemNode->data
   else:
      return null
Open Addressing

looks for empty buckets elsewhere in the table

Q: A hash table’s items will be positive integers, and -1 will represent empty. A 5-bucket hash table is: -1, -1, 72, 93, -1. How many items are in the table?

A: 2

 The End