
How to implement caching in Python?


Caching is a technique to store the results of expensive operations (like I/O, computations, or network requests) and return the cached result when the same inputs occur again, rather than re-computing or re-fetching. This can significantly improve the performance and responsiveness of Python applications.

1. Using `functools.lru_cache` (Least Recently Used Cache)

functools.lru_cache is a built-in decorator in Python 3 that provides a simple way to cache function calls. It uses a Least Recently Used (LRU) algorithm to evict items when the cache reaches its maximum size. It's ideal for memoizing pure functions (functions that produce the same output for the same input and have no side effects).

```python
import functools
import time

@functools.lru_cache(maxsize=None)  # maxsize=None means an unbounded cache; pass an integer for a bounded one
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print("Calculating fibonacci(30) with a cold cache (first run):")
start_time = time.time()
print(f"Result: {fibonacci(30)}")
print(f"Time taken: {time.time() - start_time:.4f} seconds\n")

print("Calculating fibonacci(30) with a warm cache (second run):")
start_time = time.time()
print(f"Result: {fibonacci(30)}")
print(f"Time taken: {time.time() - start_time:.4f} seconds")

# Clearing the cache
fibonacci.cache_clear()
print("\nCache cleared. Recalculating fibonacci(30):")
start_time = time.time()
print(f"Result: {fibonacci(30)}")
print(f"Time taken: {time.time() - start_time:.4f} seconds")
```

`lru_cache` is a high-performance solution for most in-memory caching needs, especially recursive functions and other expensive pure computations. You can inspect hit/miss statistics with `fibonacci.cache_info()`, and since Python 3.9 `functools.cache` is a shorthand for an unbounded `lru_cache`.
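As a quick illustration of those statistics (the `square` function here is just a toy example), `cache_info()` reports hits, misses, the configured maximum size, and the current size:

```python
import functools

@functools.lru_cache(maxsize=128)
def square(n):
    return n * n

square(2)  # miss: computed and cached
square(2)  # hit: served from the cache
square(3)  # miss: new argument

info = square.cache_info()
print(info)  # CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
```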

2. Manual Caching with Dictionaries

For simpler scenarios or when you need more explicit control, you can implement caching manually using a Python dictionary. This approach gives you full control over when to store, retrieve, or invalidate cached items, but requires more boilerplate code.

```python
import time

_cache = {}

def expensive_operation(key):
    if key in _cache:
        return _cache[key]  # Return from cache

    # Simulate an expensive computation
    time.sleep(1)
    result = f"Data for {key} (computed at {time.time()})"

    _cache[key] = result  # Store in cache
    return result

print("First call for 'item_a':")
start_time = time.time()
print(expensive_operation('item_a'))
print(f"Time taken: {time.time() - start_time:.4f} seconds\n")

print("Second call for 'item_a' (cached):")
start_time = time.time()
print(expensive_operation('item_a'))
print(f"Time taken: {time.time() - start_time:.4f} seconds\n")

print("First call for 'item_b':")
start_time = time.time()
print(expensive_operation('item_b'))
print(f"Time taken: {time.time() - start_time:.4f} seconds")
```

This manual approach has no automatic eviction strategy (such as LRU), so the dictionary can grow indefinitely and consume excessive memory if not managed carefully. It's suitable for small, fixed sets of keys, or when you implement your own eviction logic.

3. Using the `cachetools` Library

For more advanced caching needs, such as time-based expiry (TTLCache) or alternative eviction policies (LRUCache, LFUCache, FIFOCache), the cachetools library is an excellent third-party option. You'll need to install it first: pip install cachetools.

```python
from cachetools import cached, TTLCache
import time

# Create a cache that expires items after 5 seconds and holds at most 100 items
my_ttl_cache = TTLCache(maxsize=100, ttl=5)

@cached(cache=my_ttl_cache)
def fetch_external_data(item_id):
    print(f"Fetching data for ID {item_id} from external source...")
    time.sleep(2)  # Simulate network delay
    return f"Data from API for ID {item_id} (fetched at {time.time()})"

print("First fetch for ID 1:")
start_time = time.time()
print(fetch_external_data(1))
print(f"Time taken: {time.time() - start_time:.4f} seconds\n")

print("Second fetch for ID 1 (cached):")
start_time = time.time()
print(fetch_external_data(1))
print(f"Time taken: {time.time() - start_time:.4f} seconds\n")

print("Waiting for cache to expire...")
time.sleep(6)  # Wait longer than the 5-second TTL

print("Third fetch for ID 1 (after expiration):")
start_time = time.time()
print(fetch_external_data(1))
print(f"Time taken: {time.time() - start_time:.4f} seconds")
```

cachetools offers a rich set of cache classes, making it suitable for caching policies beyond what functools.lru_cache provides out of the box. Each cache can be used via the @cached decorator, as above, or directly as a bounded mapping.
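For instance, a cachetools LRUCache can be used directly as a dictionary-like object that evicts on its own (this sketch assumes cachetools is installed):

```python
from cachetools import LRUCache

cache = LRUCache(maxsize=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]  # touch "a" so it becomes the most recently used entry
cache["c"] = 3  # evicts "b", the least recently used entry

print(sorted(cache.keys()))  # ['a', 'c']
```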

Choosing the Right Caching Strategy

  • functools.lru_cache: Best for memoizing pure functions in memory, simple, and built-in.
  • Manual Dictionary: Good for simple, explicit control over a small, fixed set of cached items.
  • cachetools: Use for more advanced in-memory caching requirements like time-based expiration, different eviction policies (FIFO, LFU), or when you need more fine-grained control than lru_cache.
  • External Caching (e.g., Redis, Memcached): For distributed systems, larger datasets, or persistent caching across application restarts, consider dedicated caching servers.
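With an external cache, the usual shape is the "cache-aside" pattern: check the cache, fall back to the data source on a miss, then write the result back with a TTL. Below is a minimal sketch of that flow; `FakeRedis` is an in-memory stand-in for a real client (redis-py's `get`/`setex` have a similar shape), and `get_user_profile` is a hypothetical example function:

```python
import time

class FakeRedis:
    """In-memory stand-in for a Redis client, for illustration only."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        value, expires_at = self._store.get(key, (None, 0.0))
        if value is not None and time.time() < expires_at:
            return value
        return None  # missing or expired

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.time() + ttl_seconds)

client = FakeRedis()

def get_user_profile(user_id):
    key = f"user:{user_id}"
    cached = client.get(key)
    if cached is not None:
        return cached                        # cache hit
    profile = f"profile for user {user_id}"  # pretend this is a DB query
    client.setex(key, 60, profile)           # cache the result for 60 seconds
    return profile

print(get_user_profile(42))  # miss: computed and stored
print(get_user_profile(42))  # hit: served from the stub cache
```

With a real Redis or Memcached backend, the same pattern lets multiple processes or machines share one cache that survives application restarts.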