What is database replication?
Database replication is the process of creating and maintaining multiple copies of a database, typically on different servers. Changes made to one copy are propagated to the others, which keeps the copies consistent with one another and keeps the data available if a single server fails.
What is Database Replication?
At its core, replication involves copying data from a primary database (often called the 'master' or 'source') to one or more secondary databases (referred to as 'replicas' or 'slaves'). This process helps maintain identical or near-identical datasets across multiple locations or servers, allowing for improved system resilience and performance.
Why is Replication Used?
Database replication serves several critical purposes in modern data management:
- High Availability (HA): If the primary server fails, a replica can quickly take over, minimizing downtime.
- Disaster Recovery (DR): Replicas can be geographically dispersed, providing data redundancy and protection against site-specific disasters.
- Scalability: Read-heavy applications can distribute their read queries across multiple replicas, offloading the primary server and improving overall performance.
- Performance: By allowing local access to data or distributing workloads, replication can reduce latency and improve response times for users in different regions.
- Reporting and Analytics: Replicas can be used for running analytical queries or generating reports without impacting the performance of the primary production database.
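To make the scalability point concrete, here is a minimal sketch of read/write splitting. The `ReadWriteRouter` class and the server names are hypothetical, and classifying statements by their `SELECT` prefix is a deliberate simplification; real drivers and proxies use more careful rules.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        # Fall back to the primary when no replica is configured.
        self._read_cycle = itertools.cycle(self.replicas or [primary])

    def route(self, statement):
        # Naive classification (assumption): only plain SELECTs count as reads.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("select count(*) from orders"),
    router.route("SELECT 1"),
]
print(targets)  # ['replica-1', 'primary-db', 'replica-2', 'replica-1']
```

Because replicas may trail the primary, a router like this can return stale results for a read issued right after a write; applications that need read-your-writes behavior usually send those reads to the primary.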
Common Replication Architectures
- Master-Slave (Primary-Replica): A single primary database handles all write operations, and changes are replicated to one or more secondary replicas that primarily handle read operations. This is the most common setup.
- Master-Master (Multi-Primary): Two or more databases can accept write operations, and changes are replicated bidirectionally between them. This architecture is more complex to manage, especially regarding conflict resolution.
- Group Replication: A cluster of servers forms a 'group' where all servers are kept in sync. This provides high availability and fault tolerance, often with a primary server coordinating writes and distributing them to the group.
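The conflict-resolution problem in multi-primary setups can be illustrated with a last-writer-wins rule, one of several possible strategies. This is a sketch under stated assumptions: the `Version` type, the `merge` helper, and the comparable timestamps are all hypothetical, and the rule silently discards the losing write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: int  # time of the write; assumed comparable across nodes
    node_id: str    # tie-breaker so every node picks the same winner


def resolve_lww(a, b):
    """Return the version every node should keep under last-writer-wins."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def merge(local, remote):
    """Merge a remote node's key/value state into a copy of the local state."""
    merged = dict(local)
    for key, remote_version in remote.items():
        if key in merged:
            merged[key] = resolve_lww(merged[key], remote_version)
        else:
            merged[key] = remote_version
    return merged


node_a = {"email": Version("a@example.com", 10, "A")}
node_b = {"email": Version("b@example.com", 12, "B"), "name": Version("Bo", 5, "B")}

# Merging in either direction converges to the same state.
assert merge(node_a, node_b) == merge(node_b, node_a)
print(merge(node_a, node_b)["email"].value)  # b@example.com
```

The node identifier in the comparison key matters: without a deterministic tie-breaker, two nodes seeing equal timestamps could each keep their own write and never converge.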
Replication Methods
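- Synchronous Replication: A transaction is considered complete only after it has been committed on both the primary and all replicas. This gives the strongest consistency guarantee, but every write waits on at least one network round trip to the replicas, so write latency increases and an unreachable replica can stall writes.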
- Asynchronous Replication: A transaction is committed on the primary first, and the changes are propagated to replicas afterwards. This offers lower write latency, but replicas can trail the primary (replication lag), so a read from a replica may return stale data, and changes not yet propagated can be lost if the primary fails.
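- Semi-Synchronous Replication: A hybrid approach where the primary waits for at least one replica to acknowledge receipt of the transaction before the commit is confirmed to the client. The acknowledgement confirms that the change was received, not necessarily that it has been applied. This balances consistency and performance.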
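The difference between the three methods comes down to how many replica acknowledgements the primary waits for. The sketch below models only that decision; the `commit` function and its `replica_acks` parameter are hypothetical, and real systems add timeouts, retries, and fallback behavior that are omitted here.

```python
def commit(mode, replica_acks):
    """Decide whether the primary may report a commit as successful.

    replica_acks holds one boolean per replica: True if that replica
    acknowledged the transaction before the primary stopped waiting.
    """
    if mode == "synchronous":
        return all(replica_acks)   # every replica must confirm
    if mode == "semi-synchronous":
        return any(replica_acks)   # one acknowledgement is enough
    if mode == "asynchronous":
        return True                # never waits for replicas
    raise ValueError(f"unknown mode: {mode}")


acks = [True, False, False]  # one of three replicas responded in time
modes = ("synchronous", "semi-synchronous", "asynchronous")
results = {mode: commit(mode, acks) for mode in modes}
print(results)
# {'synchronous': False, 'semi-synchronous': True, 'asynchronous': True}
```

With one slow or unreachable replica, the synchronous mode blocks the write while the other two proceed, which is the trade-off between consistency and latency described above.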
Key Concepts
- Primary/Master: The database that accepts all write operations.
- Replica/Slave: A copy of the primary database that receives replicated data and typically handles read-only queries.
- Replication Lag: The delay between a transaction being committed on the primary and its application on a replica.
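- Binary Log (binlog)/Write-Ahead Log (WAL): An ordered record of data modifications in the database, which replicas read and replay to apply changes. MySQL replication is based on the binary log, and PostgreSQL replication is based on the write-ahead log.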
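These concepts fit together in a small in-memory model: the primary appends each change to a log, and a replica replays entries from the position it last applied. The `Primary` and `Replica` classes are hypothetical toys, not any database's API, and lag is measured here in log entries rather than seconds.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # append-only list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def pull(self, primary, max_entries):
        """Apply up to max_entries unapplied changes from the primary's log."""
        pending = primary.log[self.applied:self.applied + max_entries]
        for key, value in pending:
            self.data[key] = value
            self.applied += 1

    def lag(self, primary):
        """Replication lag, counted in log entries not yet applied."""
        return len(primary.log) - self.applied


primary, replica = Primary(), Replica()
for i in range(5):
    primary.write(f"k{i}", i)

replica.pull(primary, max_entries=3)
lag_after_partial = replica.lag(primary)
print(lag_after_partial)  # 2

replica.pull(primary, max_entries=10)
print(replica.lag(primary), replica.data == primary.data)  # 0 True
```

Tracking a position in the log is what lets a replica resume after a disconnect: it asks for everything after the last entry it applied instead of copying the whole dataset again.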
In summary, database replication is a fundamental technique for building robust, scalable, and highly available database systems, crucial for modern applications that demand continuous operation and efficient data access.