
Difference between threading and multiprocessing?


Python's `threading` and `multiprocessing` modules both let you run tasks concurrently, but they do so in fundamentally different ways. The trade-offs between them are shaped largely by Python's Global Interpreter Lock (GIL).

Core Concepts

Concurrency in Python can be achieved through both threads and processes. While threads operate within a single process and share memory, processes are independent units with their own memory space.

Understanding the Global Interpreter Lock (GIL)

The Global Interpreter Lock (GIL) is a mutex in CPython, the reference implementation of Python. It protects access to Python objects by preventing multiple native threads from executing Python bytecode at once. As a result, even on multi-core processors, only one thread executes Python bytecode at any given time. This is the primary reason Python threads are not suitable for CPU-bound tasks: they cannot achieve true parallelism in such scenarios. (Python 3.13 added an optional, experimental free-threaded build without the GIL, but the standard build still has it.)

Threading (`threading`)

Python's `threading` module lets you create and manage threads. Threads run within the same process and share its memory space. This makes data sharing between threads easy, but the GIL prevents them from running Python bytecode in parallel, so they gain little or nothing on CPU-bound tasks. Threads are best suited to I/O-bound tasks, where a thread spends most of its time waiting on external resources (e.g., network requests, disk I/O). The GIL is released while a thread blocks on I/O, so other threads can run in the meantime.
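As a minimal sketch of the I/O-bound case, the snippet below uses `time.sleep` as a stand-in for a blocking network or disk call (an assumption for illustration; a real program would call a socket, HTTP, or file API instead). The helper names `fake_io_call` and `fetch_all` are hypothetical. Because a sleeping thread does not hold the GIL, the waits overlap instead of adding up.

```python
import threading
import time

def fake_io_call(delay, results, index):
    """Hypothetical stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    results[index] = f"response {index}"

def fetch_all(n_requests, delay):
    """Run n_requests simulated I/O calls concurrently; return (results, elapsed seconds)."""
    results = [None] * n_requests
    threads = [
        threading.Thread(target=fake_io_call, args=(delay, results, i))
        for i in range(n_requests)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every simulated request to finish
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = fetch_all(n_requests=4, delay=0.5)
    print(results)
    print(f"4 overlapping waits of 0.5 s each took {elapsed:.2f} s")
```

Run sequentially, the four waits would take about 2 seconds in total; with threads the elapsed time should be close to a single wait, though the exact figure depends on the machine.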

Multiprocessing (`multiprocessing`)

The multiprocessing module allows you to spawn multiple processes, each with its own Python interpreter and memory space. Because each process has its own GIL, they can run truly in parallel on multi-core machines, making multiprocessing ideal for CPU-bound tasks. Data sharing between processes requires explicit mechanisms like pipes or queues, as they don't share memory by default.
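The explicit data transfer described above might look like the following sketch, which sends each child's partial result back to the parent through a `multiprocessing.Queue`. The function names (`sum_of_squares`, `worker`, `parallel_sum_of_squares`) and the chunking scheme are illustrative assumptions, not part of any library.

```python
import multiprocessing

def sum_of_squares(numbers):
    """Plain CPU-bound helper: sum of x*x over the given numbers."""
    return sum(x * x for x in numbers)

def worker(numbers, queue):
    """Runs in a child process and sends its partial result back via the queue."""
    queue.put(sum_of_squares(numbers))

def parallel_sum_of_squares(chunks):
    """Start one process per chunk and combine the partial results in the parent."""
    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker, args=(chunk, queue))
        for chunk in chunks
    ]
    for p in processes:
        p.start()
    # Read every result before joining, so no child is left blocked on the queue
    partials = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    return sum(partials)

if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import
    chunks = [range(0, 500), range(500, 1000)]
    print(parallel_sum_of_squares(chunks))
```

Arguments and results cross the process boundary by being pickled, so whatever is passed to `Process` or put on the queue must be picklable.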

Key Differences Summarized

Feature	Threading	Multiprocessing
Execution Model	Multiple threads within one process	Multiple independent processes
Memory	Shared memory space (requires careful synchronization)	Separate memory space per process (data transfer via IPC)
GIL Impact	Significant (limits true parallelism for CPU-bound tasks due to single GIL)	Each process has its own GIL, allowing true parallelism on multi-core systems
Overhead	Lower startup and memory overhead	Higher startup and memory overhead
Data Sharing	Easier (shared memory, but prone to race conditions if not locked)	More complex (Inter-Process Communication like pipes, queues, shared memory segments)
Error Isolation	An unhandled exception ends only that thread, but a hard crash (e.g., a segfault in a C extension) or corrupted shared state affects the whole process	Processes are isolated; an error in one typically doesn't crash others
Best Use Case	I/O-bound tasks (network operations, file I/O, database queries)	CPU-bound tasks (heavy computations, data processing, numerical algorithms)
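To make the "careful synchronization" row concrete, here is a small sketch of protecting shared state with `threading.Lock`. The `Counter` class and `run_increments` helper are hypothetical names used only for illustration.

```python
import threading

class Counter:
    """Shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic, so guard it with the lock
        with self._lock:
            self.value += 1

def run_increments(n_threads, n_increments):
    """Increment one shared Counter from several threads; return the final value."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(run_increments(n_threads=4, n_increments=10_000))
```

With the lock in place the final value always equals `n_threads * n_increments`. Without it, updates can be lost when two threads interleave their read and write steps.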

When to Use Which

Use Threading for I/O-bound tasks: If your program spends most of its time waiting for external events (e.g., reading from a network socket, disk I/O, database queries), threading can improve responsiveness by allowing other threads to run while one thread is waiting.
Use Multiprocessing for CPU-bound tasks: If your program performs heavy computations that utilize the CPU continuously (e.g., complex calculations, image processing, data analysis), multiprocessing will leverage multiple CPU cores to achieve true parallelism and significantly speed up execution.
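One way to act on this guidance is through the standard library's `concurrent.futures` module, which offers thread and process pools behind the same interface. The sketch below is an illustration under assumptions: `count_up` and `run_all` are hypothetical helpers, and the worker count of 2 is arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_up(n):
    """Small CPU-bound function used as the task in both pools."""
    total = 0
    for _ in range(n):
        total += 1
    return total

def run_all(executor_cls, fn, items, max_workers=2):
    """Apply fn to every item using the given executor class; results keep input order."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(fn, items))

if __name__ == "__main__":
    # Threads: a reasonable default when the tasks mostly wait on I/O
    print(run_all(ThreadPoolExecutor, count_up, [1_000, 2_000]))
    # Processes: a reasonable default when the tasks keep the CPU busy
    print(run_all(ProcessPoolExecutor, count_up, [5 * 10**6, 5 * 10**6]))
```

Because only the executor class changes, switching between the two models is a one-line decision once the task function is picklable and free of shared mutable state.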

Simple Example: CPU-bound Task

Let's illustrate with a simple CPU-bound task (a busy loop) to highlight the difference in potential speedup on multi-core systems.

Threading Example

python

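import threading
import time

def cpu_bound_task(n):
    """Pure-Python busy loop; the running thread holds the GIL while it computes."""
    result = 0
    for _ in range(n):
        result += 1
    return result

print("\n--- Threading Example ---")
start_time = time.perf_counter()  # monotonic clock intended for measuring intervals

# Two threads, each doing half of the total work (5 * 10**7 iterations)
thread1 = threading.Thread(target=cpu_bound_task, args=(5 * 10**7,))
thread2 = threading.Thread(target=cpu_bound_task, args=(5 * 10**7,))

thread1.start()
thread2.start()

# Wait for both threads to finish before stopping the timer
thread1.join()
thread2.join()
end_time = time.perf_counter()
print(f"Threading with 2 threads took {end_time - start_time:.4f} seconds")

# For comparison: the same total work (10 * 10**7 iterations) in a single thread
start_time_single = time.perf_counter()
cpu_bound_task(10 * 10**7)
end_time_single = time.perf_counter()
print(f"Single-threaded took {end_time_single - start_time_single:.4f} seconds")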

Multiprocessing Example

python

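import multiprocessing
import time

def cpu_bound_task(n):
    """Pure-Python busy loop; each process runs it under its own GIL."""
    result = 0
    for _ in range(n):
        result += 1
    return result

# The guard is required where child processes are started with "spawn"
# (the default on Windows and macOS); without it each child would re-run this script.
if __name__ == "__main__":
    print("\n--- Multiprocessing Example ---")
    start_time = time.perf_counter()

    # Two processes, each doing half of the total work (5 * 10**7 iterations)
    process1 = multiprocessing.Process(target=cpu_bound_task, args=(5 * 10**7,))
    process2 = multiprocessing.Process(target=cpu_bound_task, args=(5 * 10**7,))

    process1.start()
    process2.start()

    # Wait for both processes to finish before stopping the timer
    process1.join()
    process2.join()
    end_time = time.perf_counter()
    print(f"Multiprocessing with 2 processes took {end_time - start_time:.4f} seconds")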

Both examples perform the same total work: 10 * 10**7 loop iterations. On a multi-core machine, the multiprocessing version typically completes in close to half the single-threaded time when using two processes, minus a small process-startup cost. The threading version, in contrast, is usually no faster than the single-threaded run, and can be slightly slower because the two threads contend for the GIL. Exact timings depend on the machine and Python version, but the gap between the two approaches illustrates the impact of the GIL on CPU-bound work.
