What is a dirty read?
A dirty read, also known as an uncommitted dependency, occurs in database management systems when a transaction reads data that has been modified by another transaction but has not yet been committed. If the modifying transaction later rolls back, the data read by the first transaction becomes invalid or 'dirty'.
Understanding Dirty Reads
Dirty reads are a common concurrency problem that can arise in transactional systems, particularly when the database's isolation level is set too low (e.g., READ UNCOMMITTED). They represent a violation of transaction isolation properties, specifically the 'Isolation' (I) in ACID principles, as one transaction can observe intermediate, uncommitted states of another.
The core issue is that the reading transaction sees a temporary state of the data. If the transaction that made the changes fails or decides to undo its changes (rollback), the data that was read will no longer reflect the final, consistent state of the database. This can lead to an application acting upon incorrect information.
Example Scenario
Consider a simple Accounts table with an account_id and balance. Two concurrent transactions, T1 and T2, are operating on the same account.
1. Transaction T1 starts: Updates balance for account_id = 123 from 1000 to 900. This change is not yet committed.
BEGIN TRANSACTION;
UPDATE Accounts SET balance = 900 WHERE account_id = 123;
2. Transaction T2 starts: Reads the balance for account_id = 123. Because of a low isolation level (like READ UNCOMMITTED), T2 sees the uncommitted change made by T1.
BEGIN TRANSACTION;
SELECT balance FROM Accounts WHERE account_id = 123;
-- T2 reads 900 (the uncommitted change)
3. Transaction T1 rolls back: Due to an error or business logic, T1 decides to undo its change. The balance for account_id = 123 reverts to its original value of 1000.
ROLLBACK;
-- balance for account_id = 123 is now 1000
In this scenario, T2 performed a dirty read. It read the balance as 900, but because T1 rolled back, the actual balance in the database remained 1000. T2 based its subsequent operations or decisions on inaccurate data (900), leading to potential inconsistencies.
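The interleaving above can be sketched in a few lines of Python. This is a minimal simulation, not a real DBMS: it models committed storage plus a transaction's uncommitted write buffer to show how READ UNCOMMITTED exposes T1's in-flight change to T2, while READ COMMITTED does not. The `read` helper and the dictionaries are illustrative names invented for this sketch.

```python
# Minimal sketch (not a real database engine).
committed = {123: 1000}   # durable, committed balances
uncommitted = {}          # T1's pending writes: {account_id: new_value}

def read(account_id, isolation="READ UNCOMMITTED"):
    """Return the balance a reader would see under the given isolation level."""
    if isolation == "READ UNCOMMITTED" and account_id in uncommitted:
        return uncommitted[account_id]   # dirty read: sees the in-flight value
    return committed[account_id]         # sees only committed data

# T1 updates the balance but has not committed yet.
uncommitted[123] = 900

dirty = read(123)                              # T2 at READ UNCOMMITTED sees 900
safe = read(123, isolation="READ COMMITTED")   # T2 at READ COMMITTED sees 1000

# T1 rolls back: pending writes are discarded, committed state is untouched.
uncommitted.clear()

print(dirty, safe, read(123))   # 900 1000 1000
```

After the rollback, every reader again sees 1000; the value 900 that T2 read under READ UNCOMMITTED never existed in any committed state.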
Impact and Consequences
- Inaccurate Data: Applications may process or display data that never truly existed in the committed state of the database.
- Application Errors: Subsequent logic based on dirty data can lead to incorrect calculations, invalid state transitions, or even system crashes.
- Loss of Data Integrity: If not managed, dirty reads can compromise the overall consistency and reliability of the database.
Preventing Dirty Reads
The primary method to prevent dirty reads is by setting an appropriate transaction isolation level. Database systems offer various isolation levels, each providing different degrees of protection against concurrency anomalies.
- READ COMMITTED: This is a common default isolation level for many databases. It prevents dirty reads by ensuring that a transaction can only read data that has been committed by another transaction.
- REPEATABLE READ: Stronger than READ COMMITTED, it prevents both dirty reads and non-repeatable reads.
- SERIALIZABLE: The highest isolation level; it prevents all standard concurrency anomalies, including dirty reads, non-repeatable reads, and phantom reads. It achieves full transactional isolation but can incur higher performance overhead.
-- Example of setting isolation level for a session or transaction
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Or for a specific transaction (syntax may vary by DB)
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
Summary
A dirty read occurs when a transaction reads data that has been written by another transaction but not yet committed. If the writing transaction then rolls back, the data read becomes invalid. This is a critical data consistency issue typically prevented by configuring the database's transaction isolation level to READ COMMITTED or higher, ensuring that only committed data is visible to other transactions.