🗄️ SQL Q53 / 127

What is hash index?

AI-Powered Answer ✓ Answered

A hash index is a data structure used in database systems to speed up data retrieval by mapping index keys directly to their physical storage location using a hash function. It excels at equality-based lookups but has significant limitations for other types of queries.

What is a Hash Index?

A hash index employs a hash function to compute a hash value for each key value in the indexed column. This hash value then points directly to the row's physical address (or a bucket containing pointers to rows). When a query requests a specific key, the hash function is applied to the search key, and the resulting hash value is used to quickly locate the corresponding data row. This direct mapping makes equality lookups extremely fast, often approaching O(1) complexity in the best case, as it avoids traversing a tree structure.

How it Works

  • Hash Function: A mathematical algorithm takes an input (the column value) and returns a fixed-size string of characters, which is the hash value.
  • Hash Table/Array: These hash values are stored in a hash table or array, along with pointers to the actual data rows.
  • Lookup: When searching for a value, the database computes its hash value and goes directly to the corresponding entry in the hash table to retrieve the row pointer.

Advantages

  • Extremely Fast Equality Lookups: Ideal for queries using = (equals operator) as it provides near-constant time retrieval once the hash is computed.
  • Efficient for Exact Matches: Performs exceptionally well when you need to find specific unique values or frequently accessed exact values.
  • Minimal I/O Operations: For exact matches, it can often retrieve data with fewer disk I/O operations compared to other index types.

Disadvantages

  • No Support for Range Queries: Cannot be used for BETWEEN, >, <, >= or <= operations because the hash function scrambles the order of keys. The hash values themselves are not ordered.
  • No Support for Sorting: Cannot be used for ORDER BY clauses since the index does not store data in any particular sorted order.
  • No Support for Prefix Matching: Cannot be used for LIKE 'prefix%' queries because changing even a single character in the input key results in a completely different hash value.
  • Hash Collisions: Different input keys might produce the same hash value (a collision). The database must have a strategy to handle these collisions, which can slightly degrade performance.
  • Higher Memory Consumption: Can consume more memory than B-tree indexes, especially if the hash table becomes sparse or collision handling is complex.
  • Not Universal: Not all database systems fully support hash indexes, and their implementation details and limitations vary (e.g., PostgreSQL only supports them for equality comparisons and they are not crash-safe by default, requiring a rebuild after a crash).

Common Use Cases

  • Primary Key Lookups: Especially useful for unique identifiers like UUIDs or serial numbers where only exact matches are expected.
  • Exact Value Filtering: When a column is frequently queried for specific, distinct values and range queries are rare.
  • Joins: Can sometimes accelerate hash joins when one of the join keys has a hash index.

Comparison with B-tree Indexes

FeatureHash IndexB-tree Index
Primary GoalFast equality lookupsFast range, equality, and sort operations
Order of KeysNot preservedPreserved (sorted)
Equality Queries (`=`)Excellent (O(1) average)Good (O(log n))
Range Queries (`>`, `<`, `BETWEEN`)Not supportedExcellent
Sorting (`ORDER BY`)Not supportedExcellent
Prefix Matching (`LIKE 'prefix%'`)Not supportedSupported
Memory UsagePotentially higherGenerally lower/optimized
Collision HandlingRequiredNot applicable