What is the multiprocessing module in Python?
The `multiprocessing` module in Python is a package that supports spawning processes using an API similar to the `threading` module. It allows the programmer to fully leverage multiple processors on a given machine, bypassing the Global Interpreter Lock (GIL) limitations that affect multi-threaded programs.
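
To illustrate how closely the two APIs mirror each other, here is a minimal sketch that runs the same function once in a thread and once in a separate process. The `report` helper and its labels are hypothetical names chosen for this illustration; `Thread` and `Process` both accept `target` and `args` and both expose `start()` and `join()`.

```python
import multiprocessing
import os
import threading


def report(label):
    """Print and return the ID of the process this call runs in."""
    pid = os.getpid()
    print(f"{label}: running in process {pid}")
    return pid


if __name__ == "__main__":
    print(f"main: running in process {os.getpid()}")

    # Both classes take the same constructor arguments and share start()/join().
    thread = threading.Thread(target=report, args=("thread",))
    process = multiprocessing.Process(target=report, args=("process",))

    for task in (thread, process):
        task.start()
        task.join()
```

Run as a script, the thread should report the same process ID as the main program, while the process should report a different one.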
What is the `multiprocessing` module?
The `multiprocessing` module is designed for parallel execution of code in Python. Unlike `threading`, which creates multiple threads within a single process, `multiprocessing` creates separate processes. Each process has its own Python interpreter and memory space, and therefore its own Global Interpreter Lock (GIL), so one process never waits on another's GIL and the processes can run on multiple CPU cores simultaneously for true parallelism.
Key Features and Components
- `Process` class: The fundamental building block for creating new processes.
- `Pool` class: Provides a convenient way to parallelize the execution of a function across multiple input values, distributing the inputs among worker processes.
- `Queue` and `Pipe`: Used for inter-process communication (IPC) to exchange data between processes.
- `Lock`, `Semaphore`, `Event`: Synchronization primitives similar to those in the `threading` module, adapted for inter-process synchronization.
- `Value` and `Array`: Shared memory objects that allow processes to share simple data types or arrays.
- `Manager`: Provides a way to create shared objects that can be accessed by different processes, allowing them to share more complex data structures like dictionaries or lists.
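
As a rough sketch of how a few of these components fit together, the example below has two `Process` workers send results back through a `Queue` and count their work in a shared `Value`. The `square` and `worker` functions and the input chunks are hypothetical, chosen only for illustration.

```python
import multiprocessing


def square(n):
    return n * n


def worker(numbers, results, processed):
    """Square each number, send it back, and update the shared counter."""
    for n in numbers:
        results.put((n, square(n)))   # send a result to the parent via the Queue
        with processed.get_lock():    # a Value carries its own lock by default
            processed.value += 1


if __name__ == "__main__":
    results = multiprocessing.Queue()
    processed = multiprocessing.Value("i", 0)  # shared C int, initially 0
    chunks = [[1, 2, 3], [4, 5, 6]]

    workers = [
        multiprocessing.Process(target=worker, args=(chunk, results, processed))
        for chunk in chunks
    ]
    for w in workers:
        w.start()

    # Drain the queue before joining so no worker blocks on a full pipe.
    total_items = sum(len(chunk) for chunk in chunks)
    collected = dict(results.get() for _ in range(total_items))

    for w in workers:
        w.join()

    print(sorted(collected.items()))
    print(f"items processed: {processed.value}")
```

The increment is wrapped in `get_lock()` because `processed.value += 1` is a read followed by a write, which is not atomic across processes.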
Why use `multiprocessing`?
- True Parallelism: Bypasses the GIL, allowing CPU-bound tasks to run on multiple CPU cores concurrently.
- Improved Performance: Can significantly speed up computations that can be divided into independent parts.
- Isolation: Each process has its own memory space, leading to greater stability as errors in one process are less likely to affect others.
- Resource Utilization: Effectively uses multi-core processors, which are standard in modern computers.
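
For a CPU-bound job that splits into independent parts, a `Pool` is often the shortest route. The sketch below assumes a hypothetical, deliberately slow `count_primes` function and arbitrary input sizes; the actual speedup depends on the number of cores and on process start-up cost.

```python
import multiprocessing


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]

    # Each input is handed to a worker process; results keep the input order.
    with multiprocessing.Pool() as pool:
        results = pool.map(count_primes, limits)

    for limit, found in zip(limits, results):
        print(f"primes below {limit}: {found}")
```

Each call to `count_primes` is independent of the others, which is what makes the work easy to distribute across worker processes.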
Basic Example
Here's a simple example demonstrating how to create and run processes using the `Process` class.
```python
import multiprocessing
import os


def worker_function(name):
    print(f"Worker {name}: My process ID is {os.getpid()}")


if __name__ == '__main__':
    print(f"Main process ID: {os.getpid()}")

    # Create two Process objects
    p1 = multiprocessing.Process(target=worker_function, args=('Alice',))
    p2 = multiprocessing.Process(target=worker_function, args=('Bob',))

    # Start the processes
    p1.start()
    p2.start()

    # Wait for both processes to complete
    p1.join()
    p2.join()

    print("All worker processes finished.")
```
Multiprocessing vs. Multithreading
| Feature | `multiprocessing` | `threading` |
|---|---|---|
| Parallelism | True parallelism (bypasses GIL) | Concurrency, but not true parallelism for CPU-bound tasks (limited by GIL) |
| Memory | Each process has its own memory space | Threads share the same memory space |
| Overhead | Higher overhead (process creation, IPC) | Lower overhead (thread creation, context switching) |
| IPC | Requires explicit IPC mechanisms (Queues, Pipes, Managers) | Easier data sharing, but requires careful synchronization |
| Best for | CPU-bound tasks (e.g., heavy computations, data processing) | I/O-bound tasks (e.g., network requests, file operations) |
| Robustness | More robust; failure in one process typically doesn't affect others | Less isolated; a fatal error in one thread (e.g., a crash in native code) takes down the entire process, and a bug can corrupt state shared with other threads |
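
The robustness row can be illustrated with a small sketch in which one child process fails and the parent carries on. The `parse_positive` and `worker` functions are hypothetical; `exitcode` is the attribute a `Process` exposes once it has finished.

```python
import multiprocessing


def parse_positive(text):
    """Convert text to a positive int, raising ValueError otherwise."""
    value = int(text)
    if value <= 0:
        raise ValueError(f"expected a positive number, got {value}")
    return value


def worker(text):
    print(f"parsed {parse_positive(text)}")


if __name__ == "__main__":
    good = multiprocessing.Process(target=worker, args=("42",))
    bad = multiprocessing.Process(target=worker, args=("not a number",))

    for p in (good, bad):
        p.start()
    for p in (good, bad):
        p.join()

    # The failing child prints a traceback, but the parent keeps running.
    print(f"good exit code: {good.exitcode}")  # 0 on success
    print(f"bad exit code: {bad.exitcode}")    # 1 after an unhandled exception
    print("parent is still alive")
```

Checking `exitcode` after `join()` is one simple way for the parent to notice that a child failed and decide whether to retry or report the error.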