Explain shallow vs deep copy internally.
In Python, when working with mutable objects like lists, dictionaries, or custom classes, simply assigning one variable to another creates a reference, leading to aliasing. This means both variables point to the same object in memory. To create independent copies, Python offers two primary methods: shallow copy and deep copy, each with distinct internal mechanisms and implications for nested objects.
The Problem: Aliasing
When you assign an object to a new variable, you are not creating a new object; you are merely creating a new reference to the *same* object. This means changes made through one variable will be visible through the other, especially problematic for mutable container objects.
original_list = [1, 2, [3, 4]]
aliased_list = original_list
aliased_list[0] = 99
aliased_list[2].append(5)
# original_list is now [99, 2, [3, 4, 5]]
# Both variables refer to the exact same list in memory.
Shallow Copy
A shallow copy creates a new compound object, but then (to populate the new object) it inserts *references* into it to the objects found in the original. This means that if the original object contains references to other objects (e.g., a list containing other lists), only the top-level container is copied; the nested objects are still shared between the original and the copy.
Methods for shallow copying include list.copy(), dict.copy(), set.copy(), slicing ([:]), and copy.copy() from the copy module.
import copy
original = [1, 2, [3, 4]]
shallow_copied = copy.copy(original) # or original.copy(), original[:]
print(f"Original: {original}, ID: {id(original)}")
print(f"Shallow Copy: {shallow_copied}, ID: {id(shallow_copied)}")
print(f"Inner list Original ID: {id(original[2])}")
print(f"Inner list Shallow Copy ID: {id(shallow_copied[2])}") # Same ID
# Modifying a top-level immutable element in shallow_copied affects only shallow_copied
shallow_copied[0] = 99
print(f"\nAfter modifying top-level element:")
print(f"Original: {original}") # [1, 2, [3, 4]]
print(f"Shallow Copy: {shallow_copied}") # [99, 2, [3, 4]]
# Modifying a nested mutable object in shallow_copied affects both
shallow_copied[2].append(5)
print(f"\nAfter modifying nested object:")
print(f"Original: {original}") # [1, 2, [3, 4, 5]]
print(f"Shallow Copy: {shallow_copied}") # [99, 2, [3, 4, 5]]
Deep Copy
A deep copy, in contrast, creates a new compound object and then, recursively, inserts *copies* of the objects found in the original into the new object. This means that a deep copy creates completely independent duplicates of all objects, including nested ones. Changes to the deep copy (or original) will not affect the other, ensuring full independence.
The only standard way to perform a deep copy is using copy.deepcopy() from the copy module.
import copy
original = [1, 2, [3, 4]]
deep_copied = copy.deepcopy(original)
print(f"Original: {original}, ID: {id(original)}")
print(f"Deep Copy: {deep_copied}, ID: {id(deep_copied)}")
print(f"Inner list Original ID: {id(original[2])}")
print(f"Inner list Deep Copy ID: {id(deep_copied[2])}") # Different ID
# Modifying a top-level element
deep_copied[0] = 99
print(f"\nAfter modifying top-level element:")
print(f"Original: {original}") # [1, 2, [3, 4]]
print(f"Deep Copy: {deep_copied}") # [99, 2, [3, 4]]
# Modifying a nested mutable object in deep_copied affects only deep_copied
deep_copied[2].append(5)
print(f"\nAfter modifying nested object:")
print(f"Original: {original}") # [1, 2, [3, 4]]
print(f"Deep Copy: {deep_copied}") # [99, 2, [3, 4, 5]]
Internal Mechanism Summary
Shallow Copy Internals
- A new top-level container object is created.
- For elements within the original object, their *references* are copied to the new container. This means the new container points to the same inner objects as the original.
- If an element is an immutable type (e.g., int, string, tuple), copying its reference is effectively the same as copying its value because the value itself cannot change. Both the original and the copy will refer to the same immutable object.
- If an element is a mutable container (e.g., a nested list or dictionary), both the original and the shallow copy will contain references to the *exact same* nested mutable object. Changes to this nested object will be reflected in both.
Deep Copy Internals
- A new top-level container object is created.
- The
deepcopy()function recursively traverses the entire original object graph. - For each object encountered during traversal, it creates a *new independent copy* of that object.
- This process continues until all nested mutable objects have been duplicated, ensuring no shared references remain between the original and the deep copy at any level.
deepcopy()also internally manages a memoization dictionary to handle circular references (where an object refers back to itself or an ancestor), preventing infinite recursion and ensuring only one copy of each unique object is made.
In essence, a shallow copy is 'one level deep', creating a new container but sharing nested contents. A deep copy is 'all levels deep', meticulously duplicating the entire object structure to ensure complete independence.