☕ Java Q62 / 162

How does HashMap handle collisions?

AI-Powered Answer ✓ Answered

In Java's `HashMap`, collisions occur when two different keys produce the same hash code, or map to the same bucket index. Efficiently managing these collisions is crucial for `HashMap`'s performance and correctness.

HashMap Fundamentals

A HashMap stores key-value pairs in an array of buckets. When a key-value pair is inserted, the hashCode() method of the key object is used to calculate an integer hash value. This hash value is then processed (typically using a bitwise AND operation with (table.length - 1)) to determine the index of the bucket where the entry should be stored.

The contract between hashCode() and equals() is fundamental: if two objects are equal according to the equals() method, their hashCode() methods must produce the same integer value. Conversely, if two objects produce the same hash code, they are not necessarily equal.

What is a Collision?

A collision occurs when two distinct keys, key1 and key2, generate the same bucket index. This can happen if key1.hashCode() and key2.hashCode() are identical, or if they are different but map to the same array index after internal hashing. When a collision occurs, both entries attempt to occupy the same slot in the HashMap's internal array.

Collision Resolution: Separate Chaining

The primary mechanism for resolving collisions in Java's HashMap (prior to Java 8, and still the default for small collision chains) is Separate Chaining. Instead of storing just one entry per bucket, each bucket in the HashMap's array holds a reference to a linked list (or, more generally, a chain) of Map.Entry objects. Each Map.Entry object contains the key, value, and a reference to the next entry in the chain.

When put(key, value) is called:

  • The hashCode() of the key determines the bucket index.
  • If the bucket is empty, the new entry is placed there.
  • If the bucket already contains entries (a collision has occurred), the HashMap iterates through the linked list at that bucket.
  • For each entry in the list, it compares the new key with the existing key using the equals() method. If a match is found, the value for that key is updated.
  • If no matching key is found after traversing the entire list, the new entry is appended to the end of the list.

When get(key) is called:

  • The hashCode() of the key determines the bucket index.
  • The HashMap navigates to the linked list at that bucket.
  • It then traverses the list, comparing the target key with each entry's key using the equals() method.
  • Once a matching key is found, its associated value is returned. If the end of the list is reached without a match, null is returned.

Java 8+ Enhancement: Treeification

To mitigate the worst-case performance scenario of a very long linked list (where lookups degrade to O(n) complexity), Java 8 introduced an enhancement: treeification. If the number of entries in a single bucket's linked list exceeds a certain threshold (default TREEIFY_THRESHOLD of 8), the linked list is converted into a balanced Red-Black Tree.

This change significantly improves the performance of operations on highly-colliding buckets from O(n) to O(log n). The tree structure uses the compareTo() method (if keys implement Comparable) or a derived order to maintain order and allow for efficient searching.

Conversely, if the number of nodes in a tree bucket falls below a certain UNTREEIFY_THRESHOLD (default 6), it's converted back into a linked list to avoid the overhead of tree structures for small numbers of elements.

Impact of `hashCode()` and `equals()` Quality

The effectiveness of HashMap's collision handling, and thus its overall performance, heavily depends on the quality of the hashCode() implementation for its keys. A poorly implemented hashCode() that consistently produces the same or similar hash codes for different objects will lead to many collisions, resulting in long chains (or trees) in buckets and degrading HashMap operations towards O(n) in the worst case.

A well-designed hashCode() distributes keys uniformly across buckets, minimizing collisions and keeping chain lengths short, ensuring that HashMap operations remain close to their ideal O(1) average time complexity. The equals() method is equally critical for correctly identifying the unique key within a collision chain.